Hadoop Common / HADOOP-54

SequenceFile should compress blocks, not individual entries

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.2.0
    • Fix Version/s: 0.6.0
    • Component/s: io
    • Labels: None

    Description

      SequenceFile can optionally compress individual values. However, both the compression ratio and performance would be much better if sequences of keys and values were compressed together as blocks. Sync marks should then be placed only between blocks. This will also require some changes to MapFile, so that all file positions stored there are the positions of blocks, not of entries within blocks. This can probably be accomplished by adding a getBlockStartPosition() method to SequenceFile.Writer.
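
      The difference between per-value and per-block compression can be illustrated with a minimal sketch. It does not use the Hadoop API: the class BlockCompressionSketch, its methods, and the sample data are hypothetical, and java.util.zip.Deflater stands in for whatever codec SequenceFile actually uses. The sketch also shows how an index could record only the offsets at which compressed blocks begin, which is the role the description suggests for getBlockStartPosition(); sync marks are left out for brevity.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.Deflater;

// Hypothetical illustration only; not the SequenceFile implementation.
class BlockCompressionSketch {

    /** Deflates a byte array and returns the compressed bytes. */
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Total compressed size when every value is compressed on its own. */
    static int perRecordSize(List<byte[]> values) {
        int total = 0;
        for (byte[] v : values) {
            total += deflate(v).length;
        }
        return total;
    }

    /** Groups values into blocks of recordsPerBlock and compresses each block once. */
    static List<byte[]> compressBlocks(List<byte[]> values, int recordsPerBlock) {
        List<byte[]> blocks = new ArrayList<>();
        ByteArrayOutputStream block = new ByteArrayOutputStream();
        int inBlock = 0;
        for (byte[] v : values) {
            block.write(v, 0, v.length);
            if (++inBlock == recordsPerBlock) {
                blocks.add(deflate(block.toByteArray()));
                block.reset();
                inBlock = 0;
            }
        }
        if (inBlock > 0) {
            blocks.add(deflate(block.toByteArray())); // final partial block
        }
        return blocks;
    }

    /** Total compressed size when values are compressed block by block. */
    static int perBlockSize(List<byte[]> values, int recordsPerBlock) {
        int total = 0;
        for (byte[] b : compressBlocks(values, recordsPerBlock)) {
            total += b.length;
        }
        return total;
    }

    /** Offsets at which each compressed block would start if written back to back. */
    static List<Long> blockStartPositions(List<byte[]> values, int recordsPerBlock) {
        List<Long> starts = new ArrayList<>();
        long position = 0;
        for (byte[] b : compressBlocks(values, recordsPerBlock)) {
            starts.add(position); // an index would store this, not a per-entry offset
            position += b.length;
        }
        return starts;
    }

    /** Made-up, repetitive sample values, similar in spirit to log records. */
    static List<byte[]> sampleValues(int count) {
        List<byte[]> values = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            String line = "user=" + (i % 50) + " action=click page=/index.html\n";
            values.add(line.getBytes(StandardCharsets.UTF_8));
        }
        return values;
    }

    public static void main(String[] args) {
        List<byte[]> values = sampleValues(1000);
        System.out.println("per-record bytes: " + perRecordSize(values));
        System.out.println("per-block bytes:  " + perBlockSize(values, 100));
        System.out.println("block starts:     " + blockStartPositions(values, 100));
    }
}
```

      With short, similar values, each per-value stream pays its own header and cannot share redundancy with its neighbours, so the block variant should come out noticeably smaller. The trade-off is that a reader can only seek to a block boundary and must decompress the block to reach an entry inside it.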

      Attachments

        1. VIntCompressionResults.txt
          6 kB
          Arun Murthy
        2. SequenceFilesII.patch
          77 kB
          Arun Murthy
        3. SequenceFiles.patch
          67 kB
          Arun Murthy
        4. SequenceFiles.final.patch
          80 kB
          Arun Murthy
        5. SequenceFile20060824.tgz
          17 kB
          Arun Murthy
        6. SequenceFile.updated.final.patch
          84 kB
          Arun Murthy
        7. SequenceFile.20060822.tgz
          17 kB
          Arun Murthy
        8. SequenceFile.20060821.perfomance.txt
          4 kB
          Arun Murthy
        9. SequenceFile.20060821.patch
          85 kB
          Arun Murthy
        10. enum-54.patch
          85 kB
          Owen O'Malley

            People

              Assignee: Arun Murthy (acmurthy)
              Reporter: Doug Cutting (cutting)
              Votes: 0
              Watchers: 2
