Details
- Wish
- Status: Closed
- Major
- Resolution: Won't Fix
- 0.2.0
- None
- None
- None
Description
If data blocks started and ended on data record boundaries, rather than at arbitrary offsets within a file, it would give some important advantages:
- it would be possible to avoid "fishing" for the beginning of the first record in a split (see SequenceFile.Reader.sync()).
- recovering from DFS errors would be easier and much more successful: in most cases missing blocks could simply be skipped and the remaining parts combined.
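To make the two points above concrete, here is a minimal sketch of record-aligned blocks. It uses a toy length-prefixed record format, not the actual SequenceFile layout or any DFS API; the names `encode`, `pack_aligned`, and `read_block` are hypothetical and exist only for illustration.

```python
import struct

def encode(record: bytes) -> bytes:
    # Toy format (an assumption): 4-byte big-endian length, then the payload.
    return struct.pack(">I", len(record)) + record

def pack_aligned(records, max_block_size):
    # Start a new block whenever the next record would not fit, so no
    # record ever straddles a block boundary. A record larger than
    # max_block_size simply gets an oversized block of its own.
    blocks, current = [], b""
    for rec in records:
        enc = encode(rec)
        if current and len(current) + len(enc) > max_block_size:
            blocks.append(current)
            current = b""
        current += enc
    if current:
        blocks.append(current)
    return blocks

def read_block(block: bytes):
    # Every block begins at a record boundary, so parsing starts at
    # offset 0 with no scan for a sync marker.
    records, pos = [], 0
    while pos < len(block):
        (length,) = struct.unpack_from(">I", block, pos)
        pos += 4
        records.append(block[pos:pos + length])
        pos += length
    return records

records = [f"record-{i}".encode() for i in range(10)]
blocks = pack_aligned(records, max_block_size=40)

# Simulate a lost block: skip it and combine what remains.
surviving = blocks[:1] + blocks[2:]
recovered = [r for b in surviving for r in read_block(b)]
```

In this sketch each surviving block is parsed independently, so losing one block costs only the records stored in it. The trade-off is that blocks become variable-sized, which a real file system would have to accommodate.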
Issue Links
- relates to HADOOP-7404 Data Blocks Spliting should be record oriented or provided option for give the spliting locations (offsets) as input file (Open)