Details
- Bug
- Status: Closed
- Major
- Resolution: Fixed
- Reviewed
Description
I tried to use the InputSampler on a SequenceFile<Text,Text> and found that it produces duplicate keys in the sample. I tracked the problem down to the fact that the Text object returned from the reader is essentially a wrapper pointing to a byte array, and the contents of that array change as the sequence file reader progresses. Because the sampler stores references to that same object rather than copies, every stored sample ends up showing whatever key was read last. There was also a second bug: the reader was not initialized before use. I am attaching a patch that fixes both issues. --Alex K
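To illustrate the first issue, here is a minimal sketch of the object-reuse problem. It is not the attached patch and deliberately avoids any Hadoop dependency. `MutableKey` and `ReusingReader` are hypothetical stand-ins for a reusable key object such as Text and for a reader that hands back the same instance on every record. The second issue (reader initialization) is not modeled.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SampleReuseDemo {

    /** Hypothetical stand-in for a reusable key wrapper around a byte array. */
    static final class MutableKey {
        private byte[] bytes = new byte[0];

        void set(String value) {
            bytes = value.getBytes(StandardCharsets.UTF_8);
        }

        /** Returns an independent copy that later set() calls cannot affect. */
        MutableKey copy() {
            MutableKey k = new MutableKey();
            k.bytes = bytes.clone();
            return k;
        }

        @Override
        public String toString() {
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    /** Hypothetical reader that reuses one key instance for every record. */
    static final class ReusingReader {
        private final String[] records;
        private final MutableKey current = new MutableKey();
        private int pos = -1;

        ReusingReader(String[] records) {
            this.records = records;
        }

        boolean next() {
            if (++pos >= records.length) {
                return false;
            }
            current.set(records[pos]); // overwrites the shared key in place
            return true;
        }

        MutableKey getCurrentKey() {
            return current;
        }
    }

    /** Buggy sampling: stores the shared reference, so all samples alias one object. */
    static List<String> sampleWithoutCopy(String[] records) {
        List<MutableKey> samples = new ArrayList<>();
        ReusingReader reader = new ReusingReader(records);
        while (reader.next()) {
            samples.add(reader.getCurrentKey());
        }
        return toStrings(samples);
    }

    /** Fixed sampling: copies each key before storing it. */
    static List<String> sampleWithCopy(String[] records) {
        List<MutableKey> samples = new ArrayList<>();
        ReusingReader reader = new ReusingReader(records);
        while (reader.next()) {
            samples.add(reader.getCurrentKey().copy());
        }
        return toStrings(samples);
    }

    private static List<String> toStrings(List<MutableKey> keys) {
        List<String> out = new ArrayList<>();
        for (MutableKey k : keys) {
            out.add(k.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        String[] records = {"a", "b", "c"};
        System.out.println(sampleWithoutCopy(records)); // prints [c, c, c]
        System.out.println(sampleWithCopy(records));    // prints [a, b, c]
    }
}
```

In this sketch the only difference between the two methods is the `copy()` call. Without it, every list entry is the same object, which is how duplicate keys would appear in a sample.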
Attachments
Issue Links
- is related to
  - MAPREDUCE-366 Change org.apache.hadoop.mapred.lib.TotalOrderPartitioner to use new api (Resolved)
  - MAPREDUCE-5225 SplitSampler in mapreduce.lib should use a SPLIT_STEP to jump around splits (Patch Available)