[HADOOP-106] Data blocks should be record-oriented. - ASF JIRA

Details

Type: Wish
Status: Closed
Priority: Major
Resolution: Won't Fix
Affects Version/s: 0.2.0
Fix Version/s: None
Component/s: None
Labels: None

Description

If data blocks started and ended on record boundaries, rather than at arbitrary offsets within a file, there would be two important advantages:

- Readers would no longer need to "fish" for the beginning of the first record in a split (see SequenceFile.Reader.sync()).

- Recovering from DFS errors would be easier and more often successful: in most cases, missing blocks could simply be skipped and the remaining parts combined.
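
To illustrate the "fishing" step mentioned above, here is a minimal sketch in Python. It is not Hadoop's SequenceFile implementation: the 4-byte SYNC marker, the length-prefixed record layout, and the helper names write_records and read_split are all assumptions made for this example. The point is only that a reader starting at an arbitrary split offset must scan forward for a marker before it can parse records.

```python
# Hypothetical record format for illustration only (not Hadoop's on-disk format):
#   [SYNC marker every few records][4-byte big-endian length][payload]
SYNC = b"\xde\xad\xbe\xef"  # assumed marker; chosen so it never occurs in the sample data


def write_records(records, sync_every=2):
    """Serialize records, emitting a sync marker before every `sync_every`-th record."""
    out = bytearray()
    for i, rec in enumerate(records):
        if i % sync_every == 0:
            out += SYNC
        out += len(rec).to_bytes(4, "big") + rec
    return bytes(out)


def read_split(data, split_start, split_end):
    """Return the records owned by the byte range [split_start, split_end).

    A split owns every sync group whose marker starts inside its range.
    """
    # The "fishing" step: the split offset is arbitrary, so scan forward
    # until the next sync marker before trusting any length field.
    pos = data.find(SYNC, split_start)
    records = []
    while 0 <= pos < len(data):
        if data.startswith(SYNC, pos):
            if pos >= split_end:
                break  # this sync group belongs to the next split
            pos += len(SYNC)
        length = int.from_bytes(data[pos:pos + 4], "big")
        records.append(data[pos + 4:pos + 4 + length])
        pos += 4 + length
    return records


records = [b"alpha", b"beta", b"gamma", b"delta", b"epsilon"]
data = write_records(records)
cut = 30  # an arbitrary split point that falls in the middle of a record
first = read_split(data, 0, cut)
second = read_split(data, cut, len(data))
```

Under these assumptions, the two splits together return every record exactly once, but only because the second reader scans past the partial record at its start. If blocks were aligned to record boundaries, as this issue proposes, that scan would be unnecessary.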

Issue Links

relates to

HADOOP-7404 Data Blocks Spliting should be record oriented or provided option for give the spliting locations (offsets) as input file

Open

People

Assignee: Sameer Paranjpye

Reporter: Andrzej Bialecki

Votes: 2

Watchers: 1

Dates

Created: 26/Mar/06 04:21

Updated: 19/Jun/11 19:48

Resolved: 23/Oct/07 22:13