Hadoop Common
HADOOP-54

SequenceFile should compress blocks, not individual entries

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.2.0
    • Fix Version/s: 0.6.0
    • Component/s: io
    • Labels:
      None

      Description

      SequenceFile will optionally compress individual values. But both compression and performance would be much better if sequences of keys and values are compressed together. Sync marks should only be placed between blocks. This will require some changes to MapFile too, so that all file positions stored there are the positions of blocks, not entries within blocks. Probably this can be accomplished by adding a getBlockStartPosition() method to SequenceFile.Writer.

      Attachments

      1. enum-54.patch
        85 kB
        Owen O'Malley
      2. SequenceFile.20060821.patch
        85 kB
        Arun C Murthy
      3. SequenceFile.20060821.perfomance.txt
        4 kB
        Arun C Murthy
      4. SequenceFile.20060822.tgz
        17 kB
        Arun C Murthy
      5. SequenceFile.updated.final.patch
        84 kB
        Arun C Murthy
      6. SequenceFile20060824.tgz
        17 kB
        Arun C Murthy
      7. SequenceFiles.final.patch
        80 kB
        Arun C Murthy
      8. SequenceFiles.patch
        67 kB
        Arun C Murthy
      9. SequenceFilesII.patch
        77 kB
        Arun C Murthy
      10. VIntCompressionResults.txt
        6 kB
        Arun C Murthy


          Activity

          eric baldeschwieler added a comment -

          I've been thinking about this. I think we should use a very simple scheme like the current sequence file. The writer would take a configurable buffer, maybe 10 meg by default, fill that with key/values and then compress them. The result would be stored with start markers like the current sequence file, so the partitioning logic could remain unchanged.

          Within a block, I think we should compress the keys together in a block and the values in a following block. Both blocks' lengths should be stored, so that one can quickly scan the keys and then scan only as far as needed in the value blocks. This would allow very efficient sub-sampling when you have large data blocks, which could be a life saver in some of our typical apps. You'll also get better compression, since you'll be compressing like items together.

          Thoughts?

          eric baldeschwieler added a comment -

          Also, we should be able to specify a custom compressor for keys or values.

          Doug Cutting added a comment -

          I like this proposal. Long-term, custom compressors make good sense, but for the first version let's just use a fixed algorithm and add that as a separate pass, ok? That could be a separate issue in Jira.

          Arun C Murthy added a comment -

          Here are some thoughts on how to go about this... inputs are much appreciated!

          <thoughts>

          The key idea is to compress blocks and not individual 'values' as is done (optionally) today in a SequenceFile.

          The plan is to have a configurable buffer (say default of 1MB? or 10MB?), fill it up with key/value pairs and then compress them. When the buffer is (almost) full we compress the keys together into a block and values into the following block and then write them out to file along with necessary headers and markers. The point of compressing keys and values separately (as Eric points out) is:
          a) (hopefully) better compression since 'like' items are compressed together.
          b) if need be, iterating over 'keys' itself is faster since we don't need to uncompress 'values'.

          We could also write out 'sync' markers every time the whole key-value compressed buffer is written out to dfs, or we can combine the sync and end-of-block markers so that the sync markers also double up as end-of-compressed-block markers. Of course the 'sync' marker is similar to the one used today in SequenceFiles. (thoughts?)

          E.g.
          a) configured buffer size - 4096b
          b) key/values
          keyA - 32b, valueA - 1024b
          keyB - 64b, valueB - 2048b
          c) compressedSize(keyA+keyB) - 75b
          d) compressedSize(valueA+valueB) - 2500b

          On disk (one block):

          sync-marker
          2                         (no. of k/v pairs)
          32 64                     (key sizes)
          75                        (c)
          1024 2048                 (value sizes)
          2500                      (d)
          compressedKeys (blob)
          compressedValues (blob)
          sync-marker

          Non-graphical version:
          sync,2,32,64,75,1024,2048,2500,compressedKeys,compressedValuesBlob,sync.

          Clarification: All lengths above are uncompressed and as is on disk. (write/read via writeInt/readInt apis).

          Points to ponder:
          -----------------
          a) Since we need to store keys and values in separate (compressed) blocks on disk does it make sense to:
          i) use 2 buffers (one for each) to store them before compressing i.e. before we hit the configured (1/10/* MB) limit (both buffers combined) - fairly simple to implement!
          ii) interleave them in the buffer and then make 2 passes to compress and write to disk.
          b) Strategy for buffer-size vis-a-vis dfs-block-size? Should we pad out after compressing and before writing to disk? Better to ignore this for now and let dfs handle it better in future?
          c) Thoughts by Eric/Doug w.r.t custom compressors for keys/values.

          </thoughts>

          thanks,
          Arun

          Bryan Pendleton added a comment -

          Another feature that might be useful: include an (optional) un-compressed copy of the last key in a given compression spill. Why? Because, for sorted SequenceFiles (specifically, for the data file of a MapFile), when seeking through to find a given key/value, knowing the last key in a given chunk allows skipping of decompressing the entire key array.

          It should, of course, be optional, both because keys can potentially be large, and because SequenceFiles aren't all sorted. But, in the MapFile case, it could reduce the cost of finding a hit during lookups.

          Might also be useful to try to do some DFS block-aligning, to avoid block requests and CRC calculations for data that's not really going to be used. That sounds like it might be tricky to get, though, because default GFS blocks are so large, and we're probably talking about much smaller compression chunks. Does DFS have variable-length block writing yet?

          Arun C Murthy added a comment -

          A couple of implementation thoughts/refinements (thanks to Eric) I wanted to bring to everyone's attention:

          a) We could use BytesWritables for keys/values (which maintain their own sizes), thus the header would then only be:
          sync,no.of k/v pairs, compressed-keys-size, compressed-values-size, compressedKeysBlob, compressedValuesBlob, sync

          b) Use the 'zero-compressed-int-length' that Milind is exporting from recordio for the 'integer' values in the header to save some bytes.

          Thoughts? Other refinements?

          thanks,
          Arun

          Owen O'Malley added a comment -

          Ok, I talked with Eric about this.

          The thought about BytesWritable was just about trying to get rid of the redundant lengths in the SequenceFile format. (Both the Writable object and the SequenceFile encode the object's length.) This redundancy is caused because Writable.readFields is not given the length and yet the SequenceFile wants to be able to know the length so that it can copy/skip keys and values during copy and sorting. Unfortunately, to remove the redundancy would require extending the Writable interface, which would break a lot of application code. (We could make an optional extension that provides a mechanism for getting the length out of the serialized form or copying the raw bytes of the proper length and allow Writable types to implement the sub-interface if they want to have smaller files.)

          For now, I'd suggest mixing in the length in a zero compressed format.

          So inside the compressed key block would look like:
          <key1-length><key1-bytes><key2-length><key2-bytes><key3-length><key3-bytes>

          And the inside of the compressed value block would look like:
          <value1-length><value1-bytes><value2-length><value2-bytes><value3-length><value3-bytes>

          The blocks would look like:
          <sync><num-records><compressed-key-bytes><compressed-value-bytes><compressed-keys-data><compressed-values-data>

          With the blocks padded up to io.sequencefile.pad.percent to align to the DFS block boundary.

          Does that sound reasonable?
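
          For illustration, a minimal writer-side sketch of flushing one block in this layout, assuming hypothetical helpers (writeVInt for the zero-compressed int encoding, compress for the configured codec); the actual SequenceFile internals may differ:

          // Hypothetical sketch only; helper names are illustrative.
          void writeBlock(DataOutputStream out) throws IOException {
            out.write(SYNC_MARKER);              // <sync>
            writeVInt(out, numRecords);          // <num-records>
            byte[] keys = compress(keyBuffer);   // keys serialized as <len><bytes>... then compressed
            byte[] vals = compress(valBuffer);   // values serialized as <len><bytes>... then compressed
            writeVInt(out, keys.length);         // <compressed-key-bytes>
            writeVInt(out, vals.length);         // <compressed-value-bytes>
            out.write(keys);                     // <compressed-keys-data>
            out.write(vals);                     // <compressed-values-data>
          }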

          Doug Cutting added a comment -

          SequenceFile already has an API for reading and writing raw keys and values:

          http://lucene.apache.org/hadoop/docs/api/org/apache/hadoop/io/SequenceFile.Reader.html#next(org.apache.hadoop.io.DataOutputBuffer)
          http://lucene.apache.org/hadoop/docs/api/org/apache/hadoop/io/SequenceFile.Writer.html#append(byte[],%20int,%20int,%20int)

          So it's easy to write an InputFormat or OutputFormat for entries that do not encode their own length.

          Writable implementations must encode their length in order to be easily nestable: if a key contains a struct with string, int and long values, then the lengths of the nested strings must be explicitly written.

          If we're really worried about redundant lengths, we could add a RawWritable sub-interface that adds something like rawLength(), rawWrite(DataOutput), and rawRead(DataInput, int length). (This is roughly what Owen referred to.)

          I don't think we should worry about padding to DFS block boundaries in the first implementation, but rather leave that as a subsequent optimization.

          I'm +1 for Owen's proposed format.
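
          For reference, a sketch of the kind of RawWritable sub-interface described above; the method names come from this comment, but the exact shape is hypothetical:

          // Hypothetical sketch; not an existing Hadoop interface at this point.
          public interface RawWritable extends Writable {
            int rawLength();                                            // length of the serialized form
            void rawWrite(DataOutput out) throws IOException;           // write raw bytes, no length prefix
            void rawRead(DataInput in, int length) throws IOException;  // read exactly 'length' bytes
          }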

          Arun C Murthy added a comment -

          +1 for Owen's proposal.

          An unrelated issue: the 'append' method in SequenceFile.Writer is passed 2 Writables: key and value. The Writable interface doesn't have a 'getLength' interface. This means one would have to write out the key/value to a temporary buffer to actually figure out its 'length'. The lengths are particularly relevant here to ensure that the key/value pair can be put into the keyBuffer/valueBuffer without violating the 'configured' maxBufferSize...

          To get around this issue: how about making the 'configured' bufferSize the 'lower_bound' instead of the 'upper_bound'? This will ensure we can write out the key/value and then check the buffer size, and if need be go ahead and compress etc. This will save the construction of the temporary buffer for getting the key/value lengths. Related gain: it's far simpler with this scheme to deal with outlier/rogue keys/values which are larger than bufferSize itself.

          Logical next step: make this 'bufferSize' configurable per SequenceFile, this will let applications control it depending on the sizes of their keys/values. I propose to introduce a new constructor with this as an argument for SequenceFile.Writer. This will then be written out as a part of the file-header (along with compression details) and the SequenceFile.Reader can pick this up and read accordingly. (Of course there will be a system-wide default if unspecified per file).

          Thoughts?

          thanks,
          Arun

          Arun C Murthy added a comment -

          Actually we have an 'oops' here... afaics there isn't a way to even construct
          <key1-length><key1-bytes><key2-length><key2-bytes><key3-length><key3-bytes>
          <value1-length><value1-bytes><value2-length><value2-bytes><value3-length><value3-bytes>
          without constructing a temporary object from the Writable key/value, since there isn't a way to figure out the key/value length at all.

          I feel constructing a temporary object will be a huge overhead, i.e. it introduces an extra copy from Writable to temp-buffer and then from Writable/tempBuffer to keyBuffer/valueBuffer for compression...

          Any way to do this without an extra copy?

          -

          One alternative I can think of to avoid the extra copy is slightly convoluted, though this still won't be able to construct Owen's proposal as-is. (Will look very similar to my older proposal)

          Maintain 2 auxiliary arrays which keep track of actual key/value lengths. The way to maintain lengths is to compute the difference in size of keyBuffer/valBuffer before and after insertion of each key/value and then store that difference.

          E.g.

          int oldKeyBufferLength = keyBuffer.length();
          key.write(keyBuffer);
          int newKeyBufferLength = keyBuffer.length();
          keySizes.insert(newKeyBufferLength - oldKeyBufferLength); // save the last key size

          // same for 'val'

          if ((newKeyBufferLength + newValueBufferLength) > minBufferSize) {
            // lower_bound instead of 'higher_bound'
            // Compress both keyBuffer and valueBuffer
            // Write out keySizes array to disk in zero-compressed format
            // Write out valueSizes array to disk in zero-compressed format
            // Write out compressedKeyBufferSize in zero-compressed format
            // Write out compressed keyBuffer
            // Write out compressedValueBufferSize in zero-compressed format
            // Write out compressed valueBuffer
            // Reset keyBuffer, valueBuffer, keySizes and valueSizes
          } else {
            // Done
            return;
          }

          -

          Appreciate any inputs/alternatives/refinements...

          thanks,
          Arun

          Doug Cutting added a comment -

          > Any way to do this without an extra copy?

          There's no way to length-prefix things without knowing the length, and the length is only known after the value is serialized. I doubt the extra copy will significantly affect overall write performance.

          Note that folks who use the raw API, directly passing bytes rather than Writables, should not suffer the extra-copy penalty when adding items to a file.

          eric baldeschwieler added a comment -

          I think this is enough of an argument to return to a format with key lengths segregated though. We proposed interleaved because we thought it would be simpler to code. Since it clearly will not be, we might as well segregate lengths. This has the extra advantage that it will compress better (because we will be grouping like data together).

          I think this brings us full circle back to Arun's original proposal (but all wiser). So I'm now proposing:

          sync-marker
          numPairs (delta compressed int)
          KeyLenBlockCompressedLen (delta compressed int)
          KeyBlockCompressedLen (delta compressed int)
          ValueLenBlockCompressedLen (delta compressed int)
          ValueBlockCompressedLen (delta compressed int)
          keyLengths... (gzipped & delta compressed)
          keys... (gzipped or custom compressed)
          valueLengths... (gzipped & delta compressed)
          values... (gzipped or custom compressed)

          This will allow one to scan the keys and skip the values if desired. It will yield pretty good compression. It's not that hard to understand if well documented...

          I like Arun's suggestion of considering the target block size the minimum. That will keep things simpler.

          I suggest we proceed this way, unless anyone objects?
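
          As an illustration of how a reader might walk this layout, here is a sketch of parsing one block header, assuming a readVInt-style helper for the zero-compressed ints (all names hypothetical):

          // Hypothetical sketch only; readVInt stands in for the zero-compressed int decoding.
          void readBlockHeader(DataInput in) throws IOException {
            in.readFully(syncCheck);                   // sync-marker
            int numPairs        = readVInt(in);        // numPairs
            int keyLenBlockCLen = readVInt(in);        // KeyLenBlockCompressedLen
            int keyBlockCLen    = readVInt(in);        // KeyBlockCompressedLen
            int valLenBlockCLen = readVInt(in);        // ValueLenBlockCompressedLen
            int valBlockCLen    = readVInt(in);        // ValueBlockCompressedLen
            // To scan keys only: decompress the next keyLenBlockCLen + keyBlockCLen bytes,
            // then skip valLenBlockCLen + valBlockCLen bytes of value data.
          }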

          Doug Cutting added a comment -

          Re documentation: perhaps we should add file-format documentation to the javadoc as a part of this change. This could look something like Lucene's file-formats documentation (except simpler). That mixes a pseudo-formal BNF-like syntax with commentary:

          http://lucene.apache.org/java/docs/fileformats.html

          This should link to other javadoc to describe things like the zero-compressed int format.

          Eric's proposal looks good to me. I assume zero-compressed ints are meant most places he says 'delta compressed'. Is that right? It may make sense to delta compress the lists of key and value lengths, but it probably does not make sense to delta compress numPairs and the block lengths.

          Arun C Murthy added a comment -

          Here's a test run analysing compression of VInts (exported from recordio).

          Essentially we get almost 50% savings (either with zlib/gzip) of compressed-VInts v/s uncompressed raw integers on disk (4bytes per int) ...

          Sounds like a good reason to go ahead with Eric's proposal to compress not only keys/values but also keyLengthsBlock/valueLengthsBlock? I'll go ahead with this for now unless anyone objects...

          thanks,
          Arun

          Doug Cutting added a comment -

          Just to be clear, delta compression means something different to me than zero-compression. The former represents a list of integers with their differences, the latter elides leading zeros in integers. They're not exclusive. A sorted list of integers is smaller when delta compressed and zero-compressed. A random list of integers is probably not helped by delta compression, but is helped by zero compression. If values are in a narrow range, then delta compression may help. Thus it may be useful for lists of key lengths and value lengths.

          You provide some benchmarks showing the advantage of zero compression for random sequences. Eric said delta compression, but I think he meant zero compression. I agree that we should use zero compression everywhere. The only question is if we should also use delta compression anywhere.
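
          A small illustrative example of the distinction (numbers invented):

          // Sorted offsets: delta compression shrinks the magnitudes, and
          // zero compression (VInt-style) then stores the small deltas in one byte each.
          int[] sortedOffsets = {1000, 1004, 1009};
          int[] deltas        = {1000, 4, 5};      // delta compressed
          // Random lengths: deltas do not help, but zero compression still
          // drops the leading zero bytes of each 4-byte int.
          int[] randomLengths = {1000, 7, 523};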

          eric baldeschwieler added a comment -

          I meant zero compressed. So we're all on the same page.

          (Of course recording key & value lengths rather than offsets is delta compression, so we are actually doing both)

          Arun C Murthy added a comment -

          Sounds good... looks like we're all on the same page; I'll get going on this.

          Appreciate both of you spending time... I'll also keep in mind Doug's thoughts on documenting the file-format for SequenceFile.

          thanks!
          Arun

          Arun C Murthy added a comment -

          Issues which I came across while implementing the above proposal...

          1. The implementation of the public interface

          SequenceFile.Writer.append(byte[] data, int start, int length, int keyLength)

          as it exists today, does not honour 'deflateValues', i.e. it does not compress 'values' at all. I feel this is contrary to users' expectations, since the other 'append' interface does compress values, and it breaks the abstraction of 'compressed' sequence files. I propose we remedy this now and add the necessary support here too. (I understand that it might be a break with existing behaviour, but I feel we should correct this right away... we need to fix it some time or the other.)

          I will also go ahead and add a 'rawAppend' public interface if the existing functionality (just write data to disk without heeding 'deflateValues') is deemed necessary.

          2. I propose we add a public interface:

          void flush() throws IOException

          to SequenceFile.Writer to let the user explicitly compress and flush existing data in key/value buffers.

          This api will also be used from existing 'close' (flush remaining data in buffers) and 'append' (flush buffers to dfs after they exceed the configured size) apis internally... the only point of contention is whether I should make this api 'public'.

          3. Afaik I can't see a way to 'configure' the default 'minimum buffer size' since the SequenceFile class, as it exists, does not seem to have any access to a 'Configuration' object...
          (... in my previous life Owen pointed out that making a 'comparator' class implement the 'Configurable' interface ensured that its 'configure' api would be called by the framework; will this trick work again?!)

          I don't want to hardcode any values for 'minimum buffer size' nor does the idea of adding a new constructor with a 'Configuration' object as one of the params look very appealing...

          -

          Thoughts?

          Owen O'Malley added a comment -

          The "raw" append/next interface for SequenceFile is intended to get the raw bytes from the file. Its intended use was for doing things like merging and sorting where the values don't need to be instantiated. So the lack of decompression was deliberate. However, in the switch to block compression, that doesn't make sense. In the new block compression reader and writer, just treat them as a key or value that has already been serialized for you.

          Owen O'Malley added a comment -

          One more thing, the "append" isn't appending bytes, but a preserialized key/value pair. The interface is a little unfortunate, because it forces the key and value to be in the same buffer. A more general interface would be something like:

          append(byte[] key, int keyOffset, int keyLength, byte[] value, int valueOffset, int valueLength)

          The advantage of that interface is that if the application has the key and value in different buffers they don't need to be copied into a single buffer before being copied to the SequenceFile.Writer's buffer.
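
          For illustration, a caller with pre-serialized keys and values in separate DataOutputBuffer-style buffers could then append without the intermediate copy (keyOut and valueOut are hypothetical buffer names):

          // Illustrative use of the proposed interface; keyOut/valueOut are assumed
          // to already hold the serialized key and value bytes.
          writer.append(keyOut.getData(), 0, keyOut.getLength(),
                        valueOut.getData(), 0, valueOut.getLength());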

          eric baldeschwieler added a comment -

          Sounds like we better be careful here.

          This raw interface is presumably used mainly by the framework? So we can probably change it without breaking the universe?

          I think we should probably change it to deal with "serializedRef" objects or some other new type that points to the buffer and keeps the info on if the data is compressed (and with which class).

          Otherwise application code is going to need to deal with tracking the state of each object and finding the right compress/decompress calls to make. A frequent scenario will be moving things from a perhaps block compressed file to an item compressed format for sorting. That should be efficient if possible.

          I like this approach because it makes what is going on very explicit, vs the current interface, which is obviously confusing. The alternative seems to be copious documentation in the classes and all use cases and frequent discussions on the list...

          eric baldeschwieler added a comment -

          Arun, what does flush do exactly? Does it create a block boundary? I'd vote for not expanding the interface until we have a compelling use case. Although I can see how this could arise.

          PS. I'm all for hardcoding a value of 2MB or something else sane for the block size. Keeping things simple is often much more important than making everything configurable. As long as the API lets you set the buffer size on creation, I'm happy. My experience with such systems is that at some point the number of configuration options becomes a weakness, not a strength. Happy to be out-voted on this point. (In this case)

          Owen O'Malley added a comment -

          -1 on adding flush to the public api.

          I just checked and the only users of SequenceFile.Writer.append(byte[], ...) in both Hadoop and Nutch are in SequenceFile itself. Once I have my RawSequenceInputFormat, it will also be a user of this interface.

          Arun C Murthy added a comment -

          Rebuttals:

          1. append

          I like Owen's idea about generalising the interface to:

          append(byte[] key, int keyOffset, int keyLength, byte[] value, int valueOffset, int valueLength)

          with its associated 'clarity' for the user and the advantage that it precludes an extra copy into a single buffer... couple of +1s and I'll take this path.

          @Owen: I understand that this interface currently appends a 'preserialized' key/value pair, but as you point out with 'compressed blocks' this gets worrisome in the long run (things like 'custom compression' will require 'serializedRef' like objects soon enough) ...

          How about letting the user pass in the preserialized key/value and then we still go ahead and honour 'deflateValues' in the append? Honouring 'compress' directives will ensure consistent behaviour with the rest of the apis (read: append(Writable key, Writable value)), and uncompressing in the SequenceFile.Reader.next call will ensure the what-you-store-is-what-you-get contract.

          Otherwise a true 'rawAppend' will mean (especially in 'compressed blocks' context) that I will need to create a 'block' with a single key/value pair and write out to disk ...

          Summarising: We can switch to the 'general' append interface and honour 'compress' directives in it... ensuring consistency & clarity. (I also volunteer to fix 'older' append calls in SequenceFile.java; Owen can then

          2. flush

          I should have worded things more carefully... I was looking to see if there is a compelling use case for this 'already'.
          Looks like there isn't... I'll drop this.

          (Adding a public 'flush' later is infinitely easier than adding now and removing later... )

          @Eric: Yep, the 'flush' does create a block boundary, it's used internally in 2 cases for now: (a) sizeof(keyBuffer+valueBuffer) exceeds minBlockSize (b) When the 'close' api is called.

          3. configuration

          I concur with need to keep things simple... I'll just hardcode a 'sane' value for now.
          (Yes, there is a way via the constructor to set the buffersize on creation.)

          (PS: I do hear bells in my head when I see, as it exists, SequenceFile.Reader gets a 'Configuration' object via the constructor and the 'symmetric' SequenceFile.Writer doesn't... but that's topic for another discussion.)

          Doug Cutting added a comment -

          I suggest adding the binary append API suggested by Owen and deprecating the old binary append API, but making it work back-compatibly. Thus it should accept pre-compressed (if compression is enabled) values, de-compress them, then call the new append method. We should update all existing binary appends in Hadoop and prepare a patch for Nutch to do the same. Then we should file a bug to remove the deprecated method in the next release.

          We unfortunately lose the ability to move individual compressed values around. If a mapper does not touch values, it would be best to only decompress values on reduce nodes, rather than decompress and recompress them on map nodes, since compression can be computationally expensive. But I don't see how to avoid this if we want to compress multiple values together. I think this argues that we might still permit the existing single-value compression, since that might be most efficient for large-valued files that are not touched during maps.

          Also, please add a public SequenceFile.Writer() constructor that accepts a Configuration. We should probably also deprecate the unconfigured constructor and remove it in the next release. I agree with Eric that things can be over-configurable, but it's easier to make them configurable in the code from the start, and only add them to hadoop-default.xml as needed, so that folks who have not read the code can tweak them.

          I also agree that flush should not be public.

          Arun C Murthy added a comment -

          > I suggest adding the binary append API suggested by Owen and deprecating the old binary append API, but making it work back-compatibly. Thus it should accept pre-compressed (if compression is enabled) values, de-compress them, then call the new append method.

          My hunch is that we do not need to worry about 'pre-compressed' values since as of today both the 'raw' apis do not honour it... is this true?

          In fact we could take the route in which 'append' compresses whatever data is passed along, thus possibly compressing data twice. With the 'symmetric' call to next (which decompresses) we give back the data which the user passed along in the first place... I did have a chat with Owen about this and we both felt this could work.

          > We unfortunately lose the ability to move individual compressed values around. If a mapper does not touch values, it would be best to only decompress values on reduce nodes, rather than decompress and recompress them on map nodes, since compression can be computationally expensive.

          I can see a way around this if it really will make a difference...

          We can take the path that values are decompressed only 'on demand' i.e. a series of calls to SequenceFile.Reader.next(Writable key) does not need to decompress 'valBuffer' (or even valLengthsBuffer). Hence when we read a compressed 'block' we need not decompress values till we see a call to either SequenceFile.Reader.next(Writable key, Writable value) or SequenceFile.Reader.next(DataOutputBuffer buffer).

          Implementing this 'lazy decompression' of values is slightly more complex... worth it?
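
          A rough sketch of what such lazy decompression could look like in the reader, with all helper names hypothetical (and glossing over the bookkeeping needed to keep the value stream in step with the key stream, which is where the extra complexity lies):

          // Hypothetical sketch only.
          public boolean next(Writable key) throws IOException {
            if (keysLeftInBlock == 0) {
              if (!readNextBlockHeaderAndKeys()) return false; // decompress keys, leave values compressed
              valuesDecompressed = false;
            }
            key.readFields(keyIn);
            keysLeftInBlock--;
            return true;
          }

          public boolean next(Writable key, Writable val) throws IOException {
            if (!next(key)) return false;
            if (!valuesDecompressed) {        // decompress valBuffer only on first demand
              decompressValues();
              valuesDecompressed = true;
            }
            val.readFields(valIn);
            return true;
          }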

          -

          PS:

          1. SequenceFile.Reader.next(DataOutputBuffer buffer) should be changed to
          SequenceFile.Reader.next(DataOutputBuffer keyBuffer, DataOutputBuffer valBuffer) similar to the 'raw' append api?

          2. Does it make sense to have compression configurable for both keys and values separately? i.e. let the user specify (during creation) whether he prefers to compress 'keys' or 'values' or both? Overkill for now? Maybe it makes sense once we move to custom compressors for each?

          eric baldeschwieler added a comment -

          Folks...

          I'm happy with everything on this thread now, except for the raw API as discussed. Could folks please consider my suggestion in:

          http://issues.apache.org/jira/browse/HADOOP-54#action_12422716

          I think this addresses all the concerns about sometimes-compressed data and avoids the loss of current functionality etc. I also think that it removes a very dangerous ambiguity that the current and Owen's versions of the raw API permit (what is raw? who can use it...).

          Please let me know what you think of this....

          Owen O'Malley added a comment -

          Eric, I don't see how to implement both block compression, which is a huge win, and access to a pre-decompression representation. Especially if what you want to do with the pre-decompression representation is sorting or merging. Therefore, I was (and am) proposing that the "raw" access is a little less raw and that the byte[] representation is always decompressed. Am I missing something? This is a semantic change to the "raw" SequenceFile API, but I think it is required to get block-level compression.

          On a slight tangent, I think that the SequenceFile.Reader should not decompress the entire block but just enough to get the next key/value pair.

          eric baldeschwieler added a comment -

          I completely agree that you should incrementally decompress. The right answer might be just enough for the next entry, or a small buffer; we should performance-test that.

          My point on raw is that you can return a reference tuple in an object:

          <raw bytes, is-compressed flag, compressor class> used in a reference

          Then you read the bytes, decompressed if they come from a block compressed or an uncompressed file, compressed if they come from an item compressed file.

          Then you pass this reference to the target sequence file's raw write method. The target then compresses or decompresses as needed.

          Since you package all of this up behind an API, folks will not get confused into using this essentially internal API to do the wrong thing, and it will efficiently pass item-compressed objects from one such stream to another if given the chance.

          This may be worth considering, since sorts and merges may often operate on item compressed values and this will avoid a lot of spurious decompression/compression.

          PS we probably should only bother doing this for values.
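          Purely as illustration, the reference could be a small class like the following; every name here is an assumption, not anything from the attached patches:

          // Illustrative only: a value reference that may still be in its on-disk form.
          public class ValueRef {
            private final byte[] bytes;           // raw value bytes as read from the source file
            private final boolean compressed;     // true if 'bytes' are item-compressed
            private final Class compressorClass;  // codec that produced 'bytes', if compressed

            public ValueRef(byte[] bytes, boolean compressed, Class compressorClass) {
              this.bytes = bytes;
              this.compressed = compressed;
              this.compressorClass = compressorClass;
            }

            public byte[] getBytes()          { return bytes; }
            public boolean isCompressed()     { return compressed; }
            public Class getCompressorClass() { return compressorClass; }
          }

          The target file's raw write method would then look at isCompressed() and getCompressorClass(): if the target is item-compressed with the same codec it copies the bytes through untouched, otherwise it decompresses or recompresses as needed.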

          Owen O'Malley added a comment -

          My point is that the raw bytes are useless except in their original context.

          Say my value is compressed as the byte stream: 12, 34, 56, 78
          If I'm merging 100 files, I can't write 12, 34, 56, 78 to the output file and expect it to work, because naturally the compressed bytes depend on the state of the compressor.

          So your reference tuple, would need to be:

          <raw bytes, compressor class, compressor state>

          where the compressor state is some compressor-specific data. In the case of gzip, it is the last 32k of decompressed bytes or whatever.

          And that assumes that no one ever tries to use a compression algorithm that uses partial bytes.

          It looks to me like you'd add a lot of complexity for very little gain. You'd only win if you had large compressed values that you didn't really need to look at or use for anything. (For example, if you wanted to take a table that was url -> html document and generate the number of urls in each domain.)

          eric baldeschwieler added a comment -

          The point is that you can carry object compressed data through the system compressed. Block compressed data clearly needs to be uncompressed.

          In a lot of situations this makes block compression undesirable. We don't want to lose an important current performance optimization to add block compression if we can avoid it.

          Arun C Murthy added a comment -

          Regarding 'incremental decompress' in SequenceFile.Reader:

          Maybe I'm missing something here - but isn't one decompression (of the whole block) followed by n reads of keys (or values) going to lead to the same amortized cost as m (where m < n) decompressions and n reads? In that case I don't think the complexity of managing this complicated beast (tracking how much has been decompressed, possibly having to decompress multiple times to get large values, etc.) is worth it...

          Owen O'Malley added a comment -

          Sorry for being dense. I missed the fact that you wanted to preserve key-value pair compression as an option.

          I'd propose splitting the classes like:

          SequenceFile.Writer // uncompressed
          SequenceFile.RecordCompressWriter extends Writer
          SequenceFile.BlockCompressWriter extends Writer

          They would have the current interface, with the following new functions:

          void append(byte[] key, int keyOffset, int keyLength, byte[] value, int valueOffset, int valueLength);
          boolean canAppendCompressed();
          void appendCompressed(byte[] key, int keyOffset, int keyLength,
          byte[] value, int valueOffset, int valueLength);

          when we add custom compressors, we can add the compressor to the constructors.

          The Reader should have methods like:

          boolean next(DataOutputStream key, DataOutputStream value);
          boolean canReadCompressed();
          void readCompressed(DataOutputStream key, DataOutputStream value);

          when we add custom compressors, we can add a getter for them like:
          StreamCompressor getCompressor();

          As an implementation, I'd consider having SequenceFile.Reader be a bridge to the class that is doing the reading based on the how it is compressed.

          Thoughts?

          Owen O'Malley added a comment -

          Ok, after talking it over with Eric, here is what is hopefully a last pass at this.

          All in SequenceFile:

          public static class Writer {
            ... current stuff ...

            /**
             * Append an uncompressed representation of the key and a raw representation
             * of the value as the next record.
             */
            public void appendRaw(byte[] key, int keyOffset, int keyLength, RawValue value);
          }

          public static class RecordCompressWriter extends Writer {
            ... constructor and some overriding methods ...
          }

          public static class BlockCompressWriter extends Writer {
            ... constructor and some overriding methods ...
          }

          public static class Reader {
            ... current stuff ...

            /**
             * Read the next key into the key buffer and return the value as a RawValue.
             * @param key a buffer to store the uncompressed serialized key in as a sequence of bytes
             * @returns NULL if there are no more key/value pairs in the file
             */
            public RawValue readRaw(DataOutputStream key);
          }

          public static interface RawValue {
            // writes the uncompressed bytes to the outStream
            public void writeUncompressedBytes(DataOutputStream outStream);

            // is this raw value compressed (using zip)?
            public boolean canWriteCompressed();

            // write the (zip) compressed bytes. note that it will NOT compress the bytes
            // if they are not already compressed
            // throws IllegalArgumentException if the value is not already compressed
            public void writeCompressedBytes(DataOutputStream outStream);

            // when we add custom compressors, we would add:
            public boolean canWriteCompressed(Class compressorClass);
            public void writeCompressedBytes(Class compressorClass, DataOutputStream outStream);
          }
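          To illustrate how this would be driven, here is a hedged sketch of a copy/merge loop against the proposed API; 'reader' and 'writer' stand for the Reader and Writer above, and none of this is committed code:

          DataOutputBuffer keyBuf = new DataOutputBuffer();
          while (true) {
            RawValue value = reader.readRaw(keyBuf);   // key comes back uncompressed in keyBuf
            if (value == null) {
              break;                                   // no more key/value pairs
            }
            // an item-compressed value can travel through untouched; the target writer decides
            // whether to call writeCompressedBytes or writeUncompressedBytes on it
            writer.appendRaw(keyBuf.getData(), 0, keyBuf.getLength(), value);
            keyBuf.reset();
          }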
          Doug Cutting added a comment -

          I mostly have minor naming quibbles.

          The Writer method should be named just 'next', not 'nextRaw'.

          The new Writer subclasses should not be public, but rather should be created by a factory method.

          The RawValue class might better be named 'ValueBytes', and its methods can simply be called writeCompressed(), writeUncompressed(), etc.

          Finally, a substantive remark: we should not allocate a new RawValue for each key read. So the new Reader methods should be:

          public ValueBytes createValueBytes();
          public void next(DataOutputStream key, ValueBytes value);
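          For example, a sort pass could reuse one key buffer and one ValueBytes for the whole run; this is only a sketch, and the loop bound (recordCount) is an assumption made for illustration:

          DataOutputBuffer keyBuf = new DataOutputBuffer();
          ValueBytes value = reader.createValueBytes();   // allocated once, reused for every record
          for (int i = 0; i < recordCount; i++) {         // recordCount assumed known to the caller
            keyBuf.reset();
            reader.next(keyBuf, value);                   // fills both in place, no new objects
            // ... compare/sort on keyBuf, later write the value out via ValueBytes ...
          }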

          Owen O'Malley added a comment -

          I'm not set on the name nextRaw, but I think the semantics are different enough that it deserves a different name; whatever people want, though.

          I think the 3 Writer classes should be public, because they are actually doing different things and will likely end up with different parameter lists on the constructors. I'd hate to make the factory take the superset of all parameters that any of the Writers want.

          ValueBytes is fine.

          I wasn't really intending to create a RawValue/ValueBytes for each iteration, and your interface makes that clearer, so I like that.

          Arun C Murthy added a comment -

          Addendum:

          I spoke to Owen to confirm that it makes sense to implement 'lazy decompression' of 'values' in block compressed files i.e. a series of calls to:
          SequenceFile.Reader.next(Writable key)
          will not decompress 'value' blocks until a call to either
          SequenceFile.Reader.next(Writable key, Writable val) or
          SequenceFile.Reader.getCurrentValue(Writable val)

          {explained below}

          Going along the same trajectory, it makes sense to add a 'getCurrentValue' API to the Reader, enabling the user to look at the 'key' and only then decide whether to fetch the 'value' (lazy decompression of the 'value' holds here too, with the associated performance benefit).
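          For example, a consumer that filters on keys could look roughly like this; isInteresting and process are hypothetical application callbacks, and the Reader methods are the ones proposed above:

          Text key = new Text();
          Text val = new Text();
          while (reader.next(key)) {          // cheap: no value block is decompressed here
            if (isInteresting(key)) {         // hypothetical application-side filter
              reader.getCurrentValue(val);    // value block inflated lazily, only now
              process(key, val);              // hypothetical application callback
            }
          }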

          Thoughts?

          eric baldeschwieler added a comment -

          +1

          p sutter added a comment -

          +1

          Doug Cutting added a comment -

          +1

          Owen O'Malley added a comment -

          +1

          Hairong is doing a patch today that will add next(key) and getCurrentValue(value) to Reader because she needed it for a patch she is working on. (She hasn't filed the jira yet, or I'd go ahead and link it.)

          Arun C Murthy added a comment -

          An implementation detail which I would like to bring to everyone's attention:

          With the 'raw' values now being concrete objects (implementing the 'ValueBytes' interface), the 'base' sort in SequenceFile.Sorter could potentially have to hold millions of 'rawValue' objects in memory (assuming a decent-sized SequenceFile with 'small' records).

          As it exists today, the 'sort' implementation in SequenceFile.Sorter uses 'io.sort.mb' bytes to 'segment' the input for sorting (and later merges the segments).

          Owen suggested we also use a 'no. of records' limit to augment the above, in order to prevent situations where we might have to store millions of 'ValueBytes' objects in memory... thus we have a limit on the number of records (i.e. ValueBytes objects) per segment, used along with 'io.sort.mb' to segment the input for sorting; a sketch of the combined condition is below.
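          A minimal sketch of the combined condition, assuming a hypothetical property name for the record limit ('io.sort.mb' is the existing byte limit):

          private final int memoryLimit = conf.getInt("io.sort.mb", 100) * 1024 * 1024;
          private final int recordLimit = conf.getInt("io.sort.record.limit", 1000000); // hypothetical knob

          private boolean spillNeeded(long bytesBuffered, int recordsBuffered) {
            // spill whichever limit is hit first, so at most recordLimit ValueBytes
            // objects are ever held in memory for one sort segment
            return bytesBuffered >= memoryLimit || recordsBuffered >= recordLimit;
          }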

          Thoughts?

          Arun C Murthy added a comment -

          Here's the patch for block compressed SequenceFiles.

          Main attractions:
          a) Three different writers for SequenceFiles: Writer, RecordCompressWriter, BlockCompressWriter (with a factory to create the different writers).
          b) The 'raw' apis are significantly different (as per previous discussions here)
          c) Fixed SequenceFile.Sorter to take advantage of new 'raw' apis and some minor tweak/enhancements there.

          As per Owen's suggestion, I've only uploaded the patch... once I get some feedback, I'll go ahead and change the 'status' to 'Patch Available'.

          Owen O'Malley added a comment -

          I have a couple of comments. I had more, but jira ate them.

          (It really sucks that jira throws your comment away if your login times out. sigh)

          1. The Writer class should have the unneeded compression stuff taken out.
          2. The Writer.compressed and blockCompressed fields should be taken out and replaced with methods.
          3. The sync bytes in the block compressed writer should be written before the block rather than after (except for the first block). The goal is to get them between blocks, you don't really want one at the end (or beginning) of the file.
          4. The CompressedBytes class should be private.
          5. The private Writer constructor on line 307 is not used.
          6. The static field VERSION_4 should be renamed to BLOCK_COMPRESS_VERSION and it should be marked final.
          7. I'd rename the "byte[] version" to "syncBlock" and make a new field "byte version" that will contain just the last byte, which is the file version.
          8. We really need to move to the Text class instead of UTF8. This has a couple of changes:
          A. in writeFileHeader, the "new UTF8(...).write(out);" should be "Text.writeString(out, ...);"
          B. in init, reading the class names strings is the reverse: keyClass = Text.readString(in);
          C. we have to support the UTF8 string encodings for old file versions, so you'll need to switch behavior based on the version we are reading (see the sketch after this list).
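          For 8C, something along these lines; this is a sketch under the assumption that 'version' holds the file's format byte and BLOCK_COMPRESS_VERSION is the first Text-based version, not the committed code:

          String keyClassName;
          String valClassName;
          if (version < BLOCK_COMPRESS_VERSION) {
            keyClassName = UTF8.readString(in);    // old files wrote class names with UTF8
            valClassName = UTF8.readString(in);
          } else {
            keyClassName = Text.readString(in);    // newer files write them with Text
            valClassName = Text.readString(in);
          }
          keyClass = Class.forName(keyClassName);
          valClass = Class.forName(valClassName);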

          Arun C Murthy added a comment -

          Owen, appreciate your time...

          Responses inline:

          > 1. The Writer class should have the unneeded compression stuff taken out.
          > 3. The sync bytes in the block compressed writer should be written before the block rather than after (except for the first block). The goal is to get them between blocks, you don't really want one at the end (or beginning) of the file.
          > 4. The CompressedBytes class should be private.
          > 5. The private Writer constructor on line 307 is not used.
          > 6. The static field VERSION_4 should be renamed to BLOCK_COMPRESS_VERSION and it should be marked final.
          > 8. We really need to move to the Text class instead of UTF8. This has a couple of changes:
          A. in writeFileHeader, the "new UTF8(...).write(out);" should be "Text.writeString(out, ...);"
          B. in init, reading the class names strings is the reverse: keyClass = Text.readString(in);
          C. we have to support the UTF8 string encodings for old file versions, so you'll need to switch behavior based on the version we are reading.

          All done and incorporated into latest patch.

          > 2. The Writer.compressed and blockCompressed fields should be taken out and replaced with methods.

          Not clear - let's discuss this.

          > 7. I'd rename the "byte[] version" to "syncBlock" and make a new field "byte version" that will contain just the last byte, which is the file version.

          I don't agree about renaming it to 'syncBlock' since it isn't a sync block. I don't mind doing the "byte version" field, but the advantages aren't very clear.

          _

          I've attached a new patch (SequenceFilesII.patch) which incorporates Owen's suggestions and also fixes the fallout of the latest SequenceFile in other parts of MR universe.

          Also, I've a first cut of the SequenceFile Formats' documentation up:
          http://wiki.apache.org/lucene-hadoop/SequenceFile

          Arun C Murthy added a comment -

          Oops...

          > 1. The Writer class should have the unneeded compression stuff taken out.

          I've kept this around for this version (deprecated) to ensure existing applications don't break; I plan to get rid of it one Hadoop release later.

          Arun C Murthy added a comment -

          Please find SequenceFile.final.patch which incorporates all of Owen's feedback...

          Doug Cutting added a comment -

          I am not convinced we want to encourage folks to add new SequenceFile.Writer subclasses. Thus we should change the (new) protected fields to package private. This will also simplify the javadoc. I'd also prefer that these subclasses not be public either, making everyone use the factory. This would expose far less of the implementation and further simplify the javadoc. Owen has disputed this earlier. Owen?

          I note that when you updated uses of SequenceFile.Writer you consistently used the factory in favor of explicitly constructing the subclasses. Thanks! That makes hiding the subclasses easy. We can always make them public later, if needed, but, once they're public, it is hard to remove them. If we think the factory method has too many confusing parameters, then we could use a typesafe enumeration, e.g. something like:

          writer = SequenceFile.createWriter(fs, conf, path, Compress.BLOCK);
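          For reference, one shape such a typesafe enumeration could take (a pre-1.5 sketch; the class name CompressionType and its toString/valueOf methods are assumptions here, while Doug's example above calls it Compress):

          public static class CompressionType {
            public static final CompressionType NONE   = new CompressionType("NONE");
            public static final CompressionType RECORD = new CompressionType("RECORD");
            public static final CompressionType BLOCK  = new CompressionType("BLOCK");

            private final String name;
            private CompressionType(String name) { this.name = name; }

            public String toString() { return name; }            // handy for debugging

            public static CompressionType valueOf(String name) { // 1.5-style lookup by name
              if (NONE.name.equals(name))   return NONE;
              if (RECORD.name.equals(name)) return RECORD;
              if (BLOCK.name.equals(name))  return BLOCK;
              throw new IllegalArgumentException("Unknown compression type: " + name);
            }
          }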

          Sameer Paranjpye added a comment -

          +1 on using an enumeration to represent the compression method. It's not necessary but makes code much more readable.

          Arun C Murthy added a comment -

          Doug - here's the latest patch.

          It incorporates all your comments (remove protected, type-safe enum and non-public subclasses). I agree we can make the subclasses 'public' at a later date if need be, hence I've made them package private.

          Doug Cutting added a comment -

          I think this is nearly ready.

          A minor improvement: the typesafe enumeration instances should probably have a toString() method, to facilitate debugging.

          Running the TestSequenceFile unit test caused my 515MB Ubuntu box to swap horribly and it didn't complete. I grabbed a stack trace and saw:

          [junit] at java.util.zip.Inflater.init(Native Method)
          [junit] at java.util.zip.Inflater.<init>(Inflater.java:75)
          [junit] at java.util.zip.Inflater.<init>(Inflater.java:82)
          [junit] at org.apache.hadoop.io.SequenceFile$CompressedBytes.<init>(SequenceFile.java:231)
          [junit] at org.apache.hadoop.io.SequenceFile$CompressedBytes.<init>(SequenceFile.java:227)
          [junit] at org.apache.hadoop.io.SequenceFile$Reader.createValueBytes(SequenceFile.java:1195)
          [junit] at org.apache.hadoop.io.SequenceFile$Sorter$SortPass.run(SequenceFile.java:1459)
          [junit] at org.apache.hadoop.io.SequenceFile$Sorter.sortPass(SequenceFile.java:1413)
          [junit] at org.apache.hadoop.io.SequenceFile$Sorter.sort(SequenceFile.java:1386)
          [junit] at org.apache.hadoop.io.SequenceFile$Sorter.sort(SequenceFile.java:1406)
          [junit] at org.apache.hadoop.io.TestSequenceFile.sortTest(TestSequenceFile.java:178)

          Since sorting should not do any inflating, the Inflater should probably not be created in this case. So maybe we should lazily initialize this field?
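          A minimal sketch of that lazy initialization inside CompressedBytes, using java.util.zip.Inflater (field and accessor names are illustrative):

          private Inflater inflater;          // no longer created in the constructor

          private Inflater getInflater() {
            if (inflater == null) {
              inflater = new Inflater();      // created only on the first real decompression
            }
            return inflater;
          }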

          More generally, before we commit this we should ensure that performance is comparable to what it was before. Creating a new ValueBytes wrapper per entry processed when sorting looks expensive to me, but this may in fact be insignificant. If it is significant, then we might replace the ValueBytes API with a compressor API, where the bytes to be compressed are passed explicitly.

          Arun C Murthy added a comment -

          Doug - here's another patch incorporating the fix (lazy initialization of the inflater) plus your other suggestions.

          I've also attached results of a quick test run of SequenceFile-v3 vs. SequenceFile-v4 (2 runs of 10k and 100k records respectively). Within that framework it seems reasonable to assume that creating one ValueBytes object per record doesn't show up blatantly (of course we can iterate on this further); in any case there is a safety net for extreme corner cases (http://issues.apache.org/jira/browse/HADOOP-54#action_12425953).

          PS: The existing TestSequenceFile.java in trunk doesn't pass along the 'compress' flag to mergeTest - I ran the tests after fixing it; the patch reflects that too. Just fyi.

          Owen O'Malley added a comment -

          It would also be good to see the number of objects (and bytes) allocated during the test. Is there an easy way to get that?

          Doug Cutting added a comment -

          Benchmarks should be run w/o the -check option and with the -fast option, no?

          Arun C Murthy added a comment -

          Here's a patch with a minor tweak to the Sorter (to reuse ValueBytes objects across spills).

          I've also attached results of another set of benchmark runs (w/o -check and with -fast), this time also run against instrumented code to show the number of ValueBytes objects and the total bytes allocated for raw values (note: this doesn't imply all of them are allocated at the same time; it's just the total of bytes allocated in the CompressedBytes.reset/UncompressedBytes.reset calls over the lifetime of each test case).

          Owen O'Malley added a comment -

          A minor quibble is that when you are basically implementing an Enum, we should probably use the name "valueOf(String)" instead of "getCompressionType(String)" to be forward-compatible with the Java 1.5 signature for that functionality.

          I'd also like to see some performance numbers for straight reads and writes of the seq3 and seq4 block compressed files.

          Arun C Murthy added a comment -

          Here's the latest patch incorporating Owen's quibble about the enum; I've also added a -rwonly option to TestSequenceFile.java and attached results of read/write-only tests on seq3 and seq4.

          Owen O'Malley added a comment -

          I'm ok with the final patch, but I think that we should just go ahead and use the 1.5 enums. I made the trivial edit to do that and re-rolled the patch.

          Arun C Murthy added a comment -

          Great... thanks Owen!

          PS: We should probably send a separate mail to hadoop-dev@ announcing the patch to build.xml to spare everyone a surprise...

          Doug Cutting added a comment -

          I just committed this.

          Owen, when you converted things to use the Java 1.5 enum you also threw out most of the javadoc on that class. I restored & improved this before I committed.


            People

            • Assignee:
              Arun C Murthy
              Reporter:
              Doug Cutting
            • Votes:
              0
              Watchers:
              2
