Hadoop Common / HADOOP-54

SequenceFile should compress blocks, not individual entries

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.2.0
    • Fix Version/s: 0.6.0
    • Component/s: io
    • Labels: None

      Description

      SequenceFile can optionally compress individual values. Both the compression ratio and performance would be much better if sequences of keys and values were compressed together as blocks. Sync marks should then be placed only between blocks. This will also require some changes to MapFile, so that every file position stored there is the position of a block, not of an entry within a block. This can probably be accomplished by adding a getBlockStartPosition() method to SequenceFile.Writer.
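
      The size argument can be illustrated with a minimal Java sketch that uses java.util.zip.Deflater directly rather than Hadoop's own code. The class name BlockCompressionSketch, its helper methods, and the sample values are assumptions made for illustration only; none of them are part of SequenceFile. The sketch deflates each value separately, then deflates the same values concatenated into a single block, and reports both sizes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.Deflater;

// Hypothetical illustration class; not part of Hadoop.
class BlockCompressionSketch {

    /** Deflates one buffer and returns the compressed size in bytes. */
    static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] buf = new byte[4096];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buf);
        }
        deflater.end();
        return total;
    }

    /** Per-entry compression: every value is deflated on its own. */
    static int perRecordSize(List<byte[]> values) {
        int total = 0;
        for (byte[] v : values) {
            total += deflatedSize(v);
        }
        return total;
    }

    /** Block compression: values are concatenated, then deflated once. */
    static int blockSize(List<byte[]> values) {
        ByteArrayOutputStream block = new ByteArrayOutputStream();
        for (byte[] v : values) {
            block.write(v, 0, v.length);
        }
        return deflatedSize(block.toByteArray());
    }

    /** Builds small, similar values (made-up sample data). */
    static List<byte[]> sampleValues(int n) {
        List<byte[]> values = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            String s = "http://example.org/page/" + i + "\tstatus=OK";
            values.add(s.getBytes(StandardCharsets.UTF_8));
        }
        return values;
    }

    public static void main(String[] args) {
        List<byte[]> values = sampleValues(1000);
        System.out.println("per-record bytes: " + perRecordSize(values));
        System.out.println("block bytes:      " + blockSize(values));
    }
}
```

      With many short, similar values, the per-record total carries a fixed stream overhead for every entry and cannot exploit redundancy across entries, whereas the block is deflated with shared history. The same reasoning is why a reader can only seek to block boundaries, which is what motivates storing block positions in MapFile.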

        Attachments

        1. enum-54.patch
          85 kB
          Owen O'Malley
        2. SequenceFile20060824.tgz
          17 kB
          Arun C Murthy
        3. SequenceFile.20060822.tgz
          17 kB
          Arun C Murthy
        4. SequenceFile.20060821.perfomance.txt
          4 kB
          Arun C Murthy
        5. SequenceFile.20060821.patch
          85 kB
          Arun C Murthy
        6. SequenceFile.updated.final.patch
          84 kB
          Arun C Murthy
        7. SequenceFiles.final.patch
          80 kB
          Arun C Murthy
        8. SequenceFilesII.patch
          77 kB
          Arun C Murthy
        9. SequenceFiles.patch
          67 kB
          Arun C Murthy
        10. VIntCompressionResults.txt
          6 kB
          Arun C Murthy

              People

              • Assignee: Arun C Murthy (acmurthy)
              • Reporter: Doug Cutting (cutting)
              • Votes: 0
              • Watchers: 2
