Details
-
Sub-task
-
Status: Patch Available
-
Blocker
-
Resolution: Unresolved
-
3.0.0-alpha1
-
None
-
None
Description
MapReduceClient's primitives.h attempts to provide optimised versions of standard library memory copy and comparison functions. It has been the subject of several portability-related bugs:
- HADOOP-11505 hadoop-mapreduce-client-nativetask uses bswap where be32toh is needed, doesn't work on non-x86
HADOOP-11665Provide and unify cross platform byteorder support in native code- MAPREDUCE-6397 MAPREDUCE makes many endian-dependent assumptions
HADOOP-11484hadoop-mapreduce-client-nativetask fails to build on ARM AARCH64 due to x86 asm statements
At present it only works on x86 and ARM64 as it lacks definitions for bswap and bswap64 for any platforms other than those.
However it has even more serious problems on non-x86 architectures, for example on SPARC simple_memcpy simply doesn't work at all:
$ cat bang.cc #include <string.h> #define SIMPLE_MEMCPY #include "primitives.h" int main(int argc, char **argv) { char b1[9]; char b2[9]; simple_memcpy(b2, b1, sizeof(b1)); } $ gcc -o bang bang.cc && ./bang Bus Error (core dumped)
That's because simple_memcpy does pointer fiddling that results in misaligned accesses, which are illegal on SPARC.
fmemcmp is also broken. Even if a definition of bswap is provided, on big-endian architectures the result is simply wrong because of its unconditional use of bswap:
$ cat thud.cc #include <stdio.h> #include <strings.h> #include "primitives.h" int main(int argc, char **argv) { char a[] = { 0,1,2,0 }; char b[] = { 0,2,1,0 }; printf("%lld %d\n", fmemcmp(a, b, sizeof(a), memcmp(a, b, sizeof(a)))); } $ g++ -o thud thud.cc && ./thud 65280 -1
And in addition fmemcmp suffers from the same misalignment issues as simple_memcpy and coredumps on SPARC when asked to compare odd-sized buffers.
primitives.h provides the following functions:
- bswap - used in 12 files in MRC but as HADOOP-11505 points out, mostly incorrectly as it takes no account of platform endianness
- bswap64 - used in 4 files in MRC, same comments as per bswap apply
- simple_memcpy - used in 3 files in MRC, should be replaced with the standard memcpy
- fmemcmp - used in 1 file, should be replaced with the standard memcmp
- fmemeq - used in 1 file, should be replaced with the standard memcmp
- frmemeq - not used at all, should just be removed
Summary: primitives.h should simply be deleted and replaced with the standard memory copy & compare functions, or with thin wrappers around them where the APIs are different.
Attachments
Attachments
Issue Links
- is related to
-
MAPREDUCE-6397 MAPREDUCE makes many endian-dependent assumptions
- Open
-
HADOOP-11505 Various native parts use bswap incorrectly and unportably
- Open
-
HADOOP-11665 Provide and unify cross platform byteorder support in native code
- Resolved
-
HADOOP-11484 hadoop-mapreduce-client-nativetask fails to build on ARM AARCH64 due to x86 asm statements
- Resolved