Hadoop Common / HADOOP-732

SequenceFile's header should allow storing metadata in the form of key/value pairs

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.11.0
    • Component/s: io
    • Labels: None

      Description

      The sequence file currently stores a fixed list of metadata attributes, such as the key/value class names
      and the compression method. To make sequence files more self-describing, the header should also allow storing
      a list of arbitrary key/value pairs. One attribute of particular interest would indicate whether the key/value
      classes are actually Hadoop record classes and, if so, store the DDLs for those records. This way, we could
      create tools that extract the DDL from a sequence file and then generate the necessary classes. It would also
      make it possible to provide an interpretive version of Hadoop records: even when Hadoop or the application
      does not have the necessary classes, a sequence file of Hadoop records could be read and deserialized "interpretively".

      1. seqFileMetadata.patch
        32 kB
        Runping Qi
      2. seqFileMetadata.patch.2
        24 kB
        Runping Qi

        Activity

        Transition              | Time In Source Status | Execution Times | Last Executer | Last Execution Date
        Open -> Patch Available | 68d 7h 44m            | 1               | Runping Qi    | 24/Jan/07 08:05
        Patch Available -> Open | 1d 14h 37m            | 1               | Doug Cutting  | 25/Jan/07 22:42
        Open -> Resolved        | 23h 29m               | 1               | Doug Cutting  | 26/Jan/07 22:12
        Resolved -> Closed      | 7d 5h 10m             | 1               | Doug Cutting  | 03/Feb/07 03:23
        Doug Cutting made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Doug Cutting made changes -
        Resolution Fixed [ 1 ]
        Status Open [ 1 ] Resolved [ 5 ]
        Fix Version/s 0.11.0 [ 12312257 ]
        Doug Cutting added a comment -

        I just committed this. Thanks, Runping!

        Runping Qi made changes -
        Attachment seqFileMetadata.patch.2 [ 12349665 ]
        Runping Qi added a comment -

        Knocked off a few createWriter methods.
        seqFileMetadata.patch.2 contains the new patch.

        Doug Cutting made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Doug Cutting added a comment -

        This looks good. My only concern is that it adds yet more SequenceFile.createWriter() signatures. Until HADOOP-938 is resolved, I'd prefer this only add a single new createWriter() signature, one that includes all possible options, including this new option.

        Hadoop QA added a comment -

        +1, because http://issues.apache.org/jira/secure/attachment/12349495/seqFileMetadata.patch applied and successfully tested against trunk revision r499156.

        Runping Qi made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Runping Qi made changes -
        Attachment seqFileMetadata.patch [ 12349495 ]
        Runping Qi added a comment -

        Reattached the patch.

        Runping Qi made changes -
        Attachment seqFileMetadata.patch [ 12349494 ]
        Runping Qi made changes -
        Attachment seqFileMetadata.patch [ 12349494 ]
        Runping Qi added a comment -

        Attached is a patch for this issue.

        SequenceFile's header has a new field: a TreeMap<Text, Text> object wrapped in a class, Metadata, that implements the Writable interface. To accommodate this, the version number is bumped up to 6.

        The Reader class has a new member variable for the metadata, and a method is added that returns the metadata object. The new code can still read files written in older versions.

        New constructors that take a metadata object as their last parameter are added to the various Writer classes. New static createWriter functions with metadata as the last parameter are also introduced. These additions are all backward compatible. A new unit test is added to TestSequenceFile to test writing and reading sequence files with metadata.
        All unit tests passed.
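
        A minimal sketch of the header field described in this comment, assuming plain Java Strings in place of Hadoop's Text and DataOutput.writeUTF in place of Text serialization. The class names, the METADATA_VERSION constant, the example key, and the exact layout (a pair count followed by the pairs) are illustrative assumptions, not the committed patch. It shows the two behaviors the comment describes: metadata kept as a sorted map and serialized through Writable-style write/readFields methods, and a reader that only looks for the field in headers of version 6 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

public class SequenceFileMetadataSketch {

  // Assumption: first header version that carries metadata (the patch bumps the version to 6).
  static final int METADATA_VERSION = 6;

  /** Hypothetical stand-in for Metadata: sorted String pairs with Writable-style methods. */
  static class Metadata {
    private final TreeMap<String, String> entries = new TreeMap<>();

    void set(String key, String value) { entries.put(key, value); }

    String get(String key) { return entries.get(key); }

    Map<String, String> asMap() { return entries; }

    // Assumed layout for illustration: pair count, then each key followed by its value.
    void write(DataOutput out) throws IOException {
      out.writeInt(entries.size());
      for (Map.Entry<String, String> e : entries.entrySet()) {
        out.writeUTF(e.getKey());
        out.writeUTF(e.getValue());
      }
    }

    void readFields(DataInput in) throws IOException {
      int n = in.readInt();
      if (n < 0) {
        throw new IOException("Invalid metadata size: " + n);
      }
      entries.clear();
      for (int i = 0; i < n; i++) {
        String key = in.readUTF();
        String value = in.readUTF();
        entries.put(key, value);
      }
    }
  }

  /** Headers older than METADATA_VERSION have no metadata field, so return an empty object. */
  static Metadata readMetadata(int version, DataInput in) throws IOException {
    Metadata m = new Metadata();
    if (version >= METADATA_VERSION) {
      m.readFields(in);
    }
    return m;
  }

  /** Serializes the metadata and reads it back as a version-6 header would be read. */
  static Metadata roundTrip(Metadata m) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      m.write(new DataOutputStream(bytes));
      DataInput in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
      return readMetadata(METADATA_VERSION, in);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    Metadata m = new Metadata();
    // Hypothetical key and value, standing in for a record DDL as suggested in the description.
    m.set("record.ddl", "hypothetical DDL text");
    m.set("author", "demo");
    System.out.println(roundTrip(m).asMap());
  }
}
```

        Keeping the pairs in a TreeMap means they are always written in key order, so the same metadata serializes to the same bytes regardless of insertion order.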

        Runping Qi made changes -
        Assignee Owen O'Malley [ owen.omalley ] Runping Qi [ runping ]
        Doug Cutting made changes -
        Assignee Owen O'Malley [ owen.omalley ]
        Doug Cutting made changes -
        Field Original Value New Value
        Component/s io [ 12310687 ]
        Runping Qi added a comment -

        Strings should be fine.

        Owen O'Malley added a comment -

        Would the meta-data key/value pairs be required to be strings? That would simplify everything. It isn't clear what the right API for this would be.

        Runping Qi created issue -

          People

          • Assignee: Runping Qi
          • Reporter: Runping Qi
          • Votes: 0
          • Watchers: 0
