Avro / AVRO-682

Expose the DataFile's metadata entirely

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.4.1
    • Fix Version/s: 1.5.0
    • Component/s: java
    • Labels:
      None
    • Environment:

      Linux, Java 1.6

    • Hadoop Flags:
      Reviewed
    • Release Note:
      Provide a way to list all metadata found in the Data Files.
    • Tags:
      java, data file, metadata

      Description

      Right now, the DataFileReader (DataFileStream, actually) only allows one to query a file's metadata by key. A user who does not know what metadata is stored in the file has no way to find out, because there is no list or map of everything present. Perhaps we should provide a way for the user to retrieve the full set of metadata so that it can be queried for values.

      Attached a patch (initial) that simply exposes the HashMap that contains the metadata after initialization of the data file reader.
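A minimal sketch of the gap described above, using only the Java standard library. The class `MetaReaderSketch` and its methods `getMeta` and `getMetaMap` are hypothetical stand-ins for illustration and are not Avro's actual code; the sketch only assumes that metadata is held as a string-keyed map of byte arrays.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a data file reader; not Avro's actual class.
class MetaReaderSketch {
    // Assumption: metadata is a map from string keys to raw byte values.
    private final Map<String, byte[]> meta = new HashMap<>();

    MetaReaderSketch(Map<String, String> headerEntries) {
        for (Map.Entry<String, String> e : headerEntries.entrySet()) {
            meta.put(e.getKey(), e.getValue().getBytes(StandardCharsets.UTF_8));
        }
    }

    // Key-based lookup: only helps a caller who already knows the key.
    byte[] getMeta(String key) {
        return meta.get(key);
    }

    // Exposes the whole map so a caller can discover which keys exist.
    // Note: returning the internal map directly lets callers modify it.
    Map<String, byte[]> getMetaMap() {
        return meta;
    }
}
```

With only `getMeta`, a caller who guesses a wrong key just gets `null` back; with `getMetaMap`, the caller can iterate over `keySet()` first and then look up the values of interest.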

      1. avro.metadata.datafile.r1.diff
        0.7 kB
        Harsh J
      2. avro.metadata.datafile.r1.diff
        0.7 kB
        Harsh J
      3. avro.metadata.datafile.r2.diff
        1 kB
        Harsh J
      4. avro.metadata.datafile.r3.diff
        2 kB
        Harsh J

        Activity

        Harsh J created issue -
        Harsh J added a comment -

        Patch that adds a method such that the entire metadata map may be recovered via DataFileReader.

        Harsh J made changes -
        Field Original Value New Value
        Attachment avro.metadata.datafile.r1.diff [ 12457925 ]
        Harsh J added a comment -

        Oops, bad doc-comment. Fixed in this re-up.

        Harsh J made changes -
        Attachment avro.metadata.datafile.r1.diff [ 12457927 ]
        Harsh J added a comment -

        Alternative patch that gives a mere list of keys to use. I guess it'd come with some reserved Avro DF keys also, which prevents me from writing a size or element-compare test case for the same method (as I do not know how many/what all reserved keys get added – I think codec is one).

        But the list should be usable, I guess. People could ignore "avro." stuff. Or should we filter those out?
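If a caller did want to ignore the reserved entries, the filtering could be done on the caller's side. The following is a sketch under that assumption; `MetaKeyFilterSketch` and `userKeys` are hypothetical names, and the only fact taken from the discussion is that reserved keys start with "avro.".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of Avro.
class MetaKeyFilterSketch {
    // Prefix of reserved keys, as mentioned in the comment above.
    static final String RESERVED_PREFIX = "avro.";

    // Returns the keys that do not start with the reserved prefix,
    // preserving their original order. The input list is not modified.
    static List<String> userKeys(List<String> allKeys) {
        List<String> result = new ArrayList<>();
        for (String key : allKeys) {
            if (!key.startsWith(RESERVED_PREFIX)) {
                result.add(key);
            }
        }
        return result;
    }
}
```

Keeping the filter outside the reader means the reader can return every key, and each caller decides whether the reserved ones matter.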

        Harsh J made changes -
        Attachment avro.metadata.datafile.r2.diff [ 12457935 ]
        Doug Cutting added a comment -

        This looks great. One addition: can we make the list unmodifiable? After the values are all added, we can call

         
          metaKeyList = Collections.unmodifiableList(metaKeyList);
        

        Also, we should add some tests that call this new method.
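To illustrate the effect of that suggestion, here is a small self-contained sketch. `UnmodifiableKeysSketch` and its `getMetaKeys` method are hypothetical names used for illustration; only `Collections.unmodifiableList` and the `metaKeyList` variable come from the comment above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical holder for the key list; not Avro's actual class.
class UnmodifiableKeysSketch {
    private List<String> metaKeyList = new ArrayList<>();

    UnmodifiableKeysSketch(List<String> keysFromHeader) {
        // First collect all keys into the mutable list...
        metaKeyList.addAll(keysFromHeader);
        // ...then replace it with a read-only view, as suggested above.
        metaKeyList = Collections.unmodifiableList(metaKeyList);
    }

    // Callers can read the keys, but add/remove/set will throw
    // UnsupportedOperationException.
    List<String> getMetaKeys() {
        return metaKeyList;
    }
}
```

Because the wrapper is applied only after all keys have been added, the holder can still build the list normally, while callers cannot alter its internal state through the returned reference.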

        Doug Cutting added a comment -

        Also, I think it's fine to expose the "avro." keys.

        Harsh J added a comment -

        Updated patch to reflect the unmodifiable change. And one simple test case in TestDataFileMeta checking if the keys list contains the key whose value is sought.

        Any more tests required?

        Harsh J made changes -
        Attachment avro.metadata.datafile.r3.diff [ 12458042 ]
        Doug Cutting added a comment -

        I just committed this. Thanks, Harsh.

        Doug Cutting made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Hadoop Flags [Reviewed]
        Resolution Fixed [ 1 ]
        Patrick Linehan added a comment -

        Thanks, guys! Looks great.

        Doug Cutting made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Transition          Time In Source Status  Execution Times  Last Executer  Last Execution Date
        Open -> Resolved    2d 5h 42m              1                Doug Cutting   26/Oct/10 21:33
        Resolved -> Closed  136d 3h 59m            1                Doug Cutting   12/Mar/11 00:32

          People

          • Assignee:
            Harsh J
            Reporter:
            Harsh J
          • Votes:
            0
            Watchers:
            1

            Dates

            • Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated: 1m
              Remaining: 1m
              Logged: Not Specified
