Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 2.4.0
Fix Version/s: None
Component/s: None
Labels: avro, serialization
Description
Found this while debugging Nutch.
MapTask serializes keys and values to the same stream, in pairs:
keySerializer.serialize(key);
// .....
valSerializer.serialize(value);
// .....
bb.write(b0, 0, 0); // zero-length write that marks the record as complete
AvroSerializer does not flush its buffer at the end of each serialize() call. So when it is used as valSerializer, a value may be only partially written, or not written at all, to the output stream by the time the record is marked as complete (the last line above).
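To make the failure mode concrete, here is a minimal, self-contained sketch that uses only the JDK, not Hadoop or Avro. `SimpleSerializer` is a simplified, hypothetical stand-in for Hadoop's `Serializer` interface (open/serialize/close); `BufferingSerializer` and `bytesVisibleAfterSerialize` are also hypothetical names. As an assumption, the buffering on the Avro side is modeled with `java.io.BufferedOutputStream`, and a `ByteArrayOutputStream` plays the role of the stream that MapTask shares between the key and value serializers.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class BufferedSerializerDemo {

    /** Simplified, hypothetical stand-in for Hadoop's Serializer (open/serialize/close). */
    interface SimpleSerializer<T> {
        void open(OutputStream out) throws IOException;
        void serialize(T t) throws IOException;
        void close() throws IOException;
    }

    /** Hypothetical serializer that buffers output and, like the reported bug, never flushes in serialize(). */
    static class BufferingSerializer implements SimpleSerializer<String> {
        private OutputStream buffered;

        public void open(OutputStream out) {
            // Assumption: an 8 KB BufferedOutputStream models the serializer's private buffer.
            buffered = new BufferedOutputStream(out, 8192);
        }

        public void serialize(String s) throws IOException {
            // Small writes stay in the buffer; nothing reaches 'out' yet.
            buffered.write(s.getBytes(StandardCharsets.UTF_8));
        }

        public void close() throws IOException {
            buffered.close(); // flushes the buffer, then closes the underlying stream
        }
    }

    /** Returns how many bytes reached the shared stream right after serialize() returned. */
    static int bytesVisibleAfterSerialize(String value) {
        ByteArrayOutputStream shared = new ByteArrayOutputStream();
        BufferingSerializer valSerializer = new BufferingSerializer();
        try {
            valSerializer.open(shared);
            valSerializer.serialize(value);
            // MapTask would now mark the record complete with bb.write(b0, 0, 0),
            // but the value bytes are still sitting in the serializer's buffer.
            int visible = shared.size();
            valSerializer.close(); // only now are the buffered bytes pushed out
            return visible;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("bytes visible after serialize: " + bytesVisibleAfterSerialize("hello"));
    }
}
```

Running `main` prints `bytes visible after serialize: 0`: the value only reaches the shared stream when the serializer is closed, which is too late for a record boundary that has already been recorded. A value large enough to trigger a flush partway through would instead be only partially visible at that point.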
EDIT: Added HADOOP-10699_all.patch. This is a less intrusive fix because it does not try to flush the MapTask stream. Instead, we write serialized values directly to the MapTask stream and avoid buffering on the Avro side.
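The approach described for the patch, writing serialized values directly to the MapTask stream instead of buffering them, can be modeled the same way. This is a hypothetical, JDK-only sketch, not the patch itself: `DirectSerializer` and `recordAfterSerialize` are illustrative names, and a `ByteArrayOutputStream` stands in for the MapTask stream. Because the serializer keeps no private buffer, both halves of a key/value pair are on the shared stream before the record boundary would be written.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class DirectSerializerDemo {

    /** Hypothetical serializer that writes each value straight to the stream it was opened on. */
    static class DirectSerializer {
        private OutputStream out;

        void open(OutputStream out) {
            this.out = out;
        }

        void serialize(String s) throws IOException {
            // Every byte is handed to the caller's stream before serialize() returns.
            out.write(s.getBytes(StandardCharsets.UTF_8));
        }

        void close() {
            // Nothing is buffered here, so there is nothing to flush; the caller owns 'out'.
        }
    }

    /**
     * Serializes a key/value pair to one shared stream, in pairs, and returns the
     * stream contents at the point where MapTask would mark the record complete.
     */
    static String recordAfterSerialize(String key, String value) {
        ByteArrayOutputStream shared = new ByteArrayOutputStream();
        DirectSerializer keySerializer = new DirectSerializer();
        DirectSerializer valSerializer = new DirectSerializer();
        try {
            keySerializer.open(shared);
            valSerializer.open(shared);
            keySerializer.serialize(key);
            valSerializer.serialize(value);
            // Here MapTask would perform its zero-length write to mark the record complete.
            return shared.toString(StandardCharsets.UTF_8.name());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(recordAfterSerialize("k1", "v1")); // prints "k1v1"
    }
}
```

`recordAfterSerialize("k1", "v1")` returns `k1v1`. The design choice mirrors the patch description: since the serializer never holds bytes back, no flush is needed, and MapTask's own stream handling can stay unchanged.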
Attachments
Issue Links
- duplicates: HADOOP-11678 AvroSerializer buffers output in violation of contract for Serializer (Patch Available)