Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
-
None
Description
like textinputformat - looking for a concrete implementation to read binary records from a flat file (that may be compressed).
it's assumed that hadoop can't split such a file. so the inputformat can set splittable to false.
tricky aspects are:
- how to know what class the file contains (has to be in a configuration somewhere).
- how to determine EOF (would be nice if hadoop can determine EOF and not have the deserializer throw an exception (which is hard to distinguish from a exception due to corruptions?)). this is easy for non-compressed streams - for compressed streams - DecompressorStream has a useful looking getAvailable() call - except the class is marked package private.
Attachments
Attachments
Issue Links
- blocks
-
HIVE-20 Hive should be able to create tables over binary flat files
- Open
- is blocked by
-
HADOOP-4192 Class <? extends T> Deserializer.getRealClass() method to return the actual class of the objects from a deserializer
- Closed
-
MAPREDUCE-376 Add serialization for Thrift
- Resolved