Hadoop Common / HADOOP-54

SequenceFile should compress blocks, not individual entries

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.2.0
    • Fix Version/s: 0.6.0
    • Component/s: io
    • Labels: None

      Description

      SequenceFile can optionally compress individual values. Both the compression ratio and performance would be much better if sequences of keys and values were compressed together as blocks. Sync marks should then be placed only between blocks. This will also require some changes to MapFile, so that every file position stored there is the position of a block, not of an entry within a block. This can probably be accomplished by adding a getBlockStartPosition() method to SequenceFile.Writer.
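
      The idea in the description can be illustrated with a minimal, self-contained sketch. It is not the SequenceFile implementation or any of the attached patches: the class BlockCompressionSketch, the nested BlockWriter, the tab/newline record encoding, and the use of java.util.zip.Deflater are all assumptions made for illustration. Only the method name getBlockStartPosition() comes from the description. Sync marks and block length prefixes are left out for brevity.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.Deflater;

public class BlockCompressionSketch {

    /** Compress a byte array with DEFLATE and return the compressed bytes. */
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Total size when every record is compressed on its own. */
    static int perRecordCompressedSize(List<byte[]> records) {
        int total = 0;
        for (byte[] r : records) {
            total += deflate(r).length;
        }
        return total;
    }

    /** Total size when all records are concatenated and compressed as one block. */
    static int blockCompressedSize(List<byte[]> records) {
        ByteArrayOutputStream all = new ByteArrayOutputStream();
        for (byte[] r : records) {
            all.write(r, 0, r.length);
        }
        return deflate(all.toByteArray()).length;
    }

    /** Hypothetical writer: buffers records and compresses them one block at a time. */
    static class BlockWriter {
        private final ByteArrayOutputStream file = new ByteArrayOutputStream();
        private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
        private final int recordsPerBlock;
        private int pendingRecords = 0;
        private long blockStart = 0;

        BlockWriter(int recordsPerBlock) {
            this.recordsPerBlock = recordsPerBlock;
        }

        /** File position at which the block that will hold the next record starts. */
        long getBlockStartPosition() {
            return blockStart;
        }

        void append(String key, String value) {
            byte[] record = (key + "\t" + value + "\n").getBytes(StandardCharsets.UTF_8);
            pending.write(record, 0, record.length);
            pendingRecords++;
            if (pendingRecords == recordsPerBlock) {
                flush();
            }
        }

        /** Compress the buffered records as one block and append it to the file. */
        void flush() {
            if (pendingRecords == 0) {
                return;
            }
            byte[] block = deflate(pending.toByteArray());
            file.write(block, 0, block.length);
            pending.reset();
            pendingRecords = 0;
            blockStart = file.size();
        }

        long length() {
            return file.size();
        }
    }

    public static void main(String[] args) {
        List<byte[]> records = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            String rec = "key-" + i + "\tvalue-for-record-" + i + "\n";
            records.add(rec.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("per-record bytes: " + perRecordCompressedSize(records));
        System.out.println("block bytes:      " + blockCompressedSize(records));
    }
}
```

      An index such as MapFile would, under this sketch, call getBlockStartPosition() just before each append() and store that value, so that every stored position points at the start of a block rather than at an entry inside one.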

      1. enum-54.patch
        85 kB
        Owen O'Malley
      2. SequenceFile.20060821.patch
        85 kB
        Arun C Murthy
      3. SequenceFile.20060821.perfomance.txt
        4 kB
        Arun C Murthy
      4. SequenceFile.20060822.tgz
        17 kB
        Arun C Murthy
      5. SequenceFile.updated.final.patch
        84 kB
        Arun C Murthy
      6. SequenceFile20060824.tgz
        17 kB
        Arun C Murthy
      7. SequenceFiles.final.patch
        80 kB
        Arun C Murthy
      8. SequenceFiles.patch
        67 kB
        Arun C Murthy
      9. SequenceFilesII.patch
        77 kB
        Arun C Murthy
      10. VIntCompressionResults.txt
        6 kB
        Arun C Murthy

        Issue Links

          Activity

          Gavin made changes -
          Link This issue is depended upon by HADOOP-441 [ HADOOP-441 ]
          Gavin made changes -
          Link This issue blocks HADOOP-441 [ HADOOP-441 ]
          Doug Cutting made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Doug Cutting made changes -
          Resolution Fixed [ 1 ]
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Owen O'Malley made changes -
          Attachment enum-54.patch [ 12339610 ]
          Arun C Murthy made changes -
          Attachment SequenceFile20060824.tgz [ 12339470 ]
          Arun C Murthy made changes -
          Attachment SequenceFile.20060822.tgz [ 12339301 ]
          Arun C Murthy made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Arun C Murthy made changes -
          Attachment SequenceFile.20060821.perfomance.txt [ 12339258 ]
          Arun C Murthy made changes -
          Attachment SequenceFile.20060821.patch [ 12339256 ]
          Doug Cutting made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Arun C Murthy made changes -
          Attachment SequenceFile.updated.final.patch [ 12339083 ]
          Arun C Murthy made changes -
          Link This issue blocks HADOOP-441 [ HADOOP-441 ]
          Arun C Murthy made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Arun C Murthy made changes -
          Attachment SequenceFiles.final.patch [ 12339015 ]
          Arun C Murthy made changes -
          Attachment SequenceFilesII.patch [ 12338946 ]
          Arun C Murthy made changes -
          Attachment SequenceFiles.patch [ 12338351 ]
          Doug Cutting made changes -
          Fix Version/s 0.5.0 [ 12311939 ]
          Fix Version/s 0.6.0 [ 12312025 ]
          Doug Cutting made changes -
          Workflow no-reopen-closed [ 12373223 ] no-reopen-closed, patch-avail [ 12377426 ]
          Arun C Murthy made changes -
          Attachment VIntCompressionResults.txt [ 12337231 ]
          Arun C Murthy made changes -
          Assignee Michel Tourn [ michel_tourn ] Arun C Murthy [ acmurthy ]
          Doug Cutting made changes -
          Fix Version/s 0.4.0 [ 12311021 ]
          Fix Version/s 0.5.0 [ 12311939 ]
          Doug Cutting made changes -
          Workflow no reopen closed [ 12372891 ] no-reopen-closed [ 12373223 ]
          Doug Cutting made changes -
          Workflow jira [ 12348581 ] no reopen closed [ 12372891 ]
          eric baldeschwieler made changes -
          Fix Version/s 0.3 [ 12310930 ]
          Fix Version/s 0.4 [ 12311021 ]
          Doug Cutting made changes -
          Field Original Value New Value
          Fix Version/s 0.3 [ 12310930 ]
          Fix Version/s 0.2 [ 12310813 ]
          Doug Cutting created issue -

            People

            • Assignee: Arun C Murthy
            • Reporter: Doug Cutting
            • Votes: 0
            • Watchers: 2

              Dates

              • Created:
                Updated:
                Resolved:
