Details
Type: Bug
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 1.1.0, 1.1.1, 1.2.0, 1.1.3, 1.2.1, 1.2.2
Fix Version/s: None
Component/s: None
Description
The custom RecordReader class in MultiFileWordCount (MultiFileLineRecordReader) has been replaced in newer examples with a better implementation that uses CombineFileInputFormat and does not have this bug. However, the bug still exists in the 1.x versions of MultiFileWordCount, which rely on the mapred API.
The older MultiFileWordCount implementation defines getPos() as follows:
long currentOffset = currentStream == null ? 0 : currentStream.getPos();
...
This is meant to prevent errors when the underlying stream is null, but it is not guaranteed to work: RawLocalFileSystem, for example, currently closes the underlying file stream once it has been consumed. In that case currentStream itself is non-null, so the null check passes, and the call to currentStream.getPos() throws a NullPointerException when it tries to access the null stream underneath.
This is only seen in a context where the MapTask class, which is relevant only to the mapred.* API, calls getPos() twice for each record: once before and once after reading it.
This custom record reader should be guarded, or else eliminated, since it assumes something that is not in the FileSystem contract: that getPos() will always return an integral value.
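One possible shape for such a guard is sketched below. It is not the actual Hadoop code: PositionedStream and FakeStream are hypothetical stand-ins for the real stream classes so that the example runs without Hadoop on the classpath, and the assumption that the reader adds a per-file offset to the stream position is mine, since the original getPos() body is elided above. The idea is simply to remember the last position that was read successfully and fall back to it when the stream can no longer report one.

```java
import java.io.IOException;

public class GuardedGetPosSketch {

    /** Hypothetical stand-in for a stream that can report its position. */
    interface PositionedStream {
        long getPos() throws IOException;
    }

    /** Fake stream that fails with an NPE once its underlying handle is released. */
    static class FakeStream implements PositionedStream {
        private long pos;
        private boolean released;

        void advance(long n) { pos += n; }

        void release() { released = true; }

        @Override
        public long getPos() throws IOException {
            if (released) {
                // Mimics the failure described in this report.
                throw new NullPointerException("underlying stream released");
            }
            return pos;
        }
    }

    /** Reader whose getPos() never propagates a failure from the stream. */
    static class GuardedReader {
        private final PositionedStream currentStream;
        private final long offset;   // assumed: bytes consumed from earlier files
        private long lastKnownPos;   // last position the stream reported successfully

        GuardedReader(PositionedStream currentStream, long offset) {
            this.currentStream = currentStream;
            this.offset = offset;
        }

        long getPos() {
            if (currentStream != null) {
                try {
                    lastKnownPos = currentStream.getPos();
                } catch (IOException | NullPointerException e) {
                    // Stream is closed or consumed: keep the last known position.
                }
            }
            return offset + lastKnownPos;
        }
    }

    public static void main(String[] args) {
        FakeStream stream = new FakeStream();
        stream.advance(42);
        GuardedReader reader = new GuardedReader(stream, 100L);

        System.out.println(reader.getPos()); // position before the stream is released
        stream.release();
        System.out.println(reader.getPos()); // same position, no exception
    }
}
```

Catching NullPointerException is a blunt instrument; a reader that counts the bytes it has consumed itself would avoid depending on the stream's getPos() at all, which is closer to the "eliminate" option suggested above.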