Avro / AVRO-682

Expose the DataFile's metadata entirely

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.4.1
    • Fix Version/s: 1.5.0
    • Component/s: java
    • Labels:
      None
    • Environment:

      Linux, Java 1.6

    • Hadoop Flags:
      Reviewed
    • Release Note:
      Provide a way to list all metadata found in the Data Files.
    • Tags:
      java, data file, metadata

      Description

      Right now, the DataFileReader (DataFileStream, actually) only allows one to query the metadata of a file by key. A user who does not know what metadata may be stored in the file has no way to find out, for example by getting a list or map of everything present. Perhaps we should provide a way for the user to retrieve the full set of metadata keys, which can then be used to query their values.

      Attached is an initial patch that simply exposes the HashMap holding the metadata once the data file reader has been initialized.
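To illustrate the idea, here is a minimal Java sketch of a reader-side holder that keeps header metadata in a HashMap and, besides the existing lookup by key, hands out the whole map. The class `MetadataHolder` and its methods are hypothetical, not Avro's API; returning a read-only view via `Collections.unmodifiableMap` is an assumption of this sketch rather than what the initial patch did.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the reader's header handling; not Avro's actual class.
class MetadataHolder {
  // Metadata as read from the file header: key -> raw bytes.
  private final Map<String, byte[]> meta = new HashMap<String, byte[]>();

  // Hypothetical hook called while the header is parsed.
  void putMeta(String key, byte[] value) {
    meta.put(key, value);
  }

  // Existing style of access: the caller must already know the key.
  byte[] getMeta(String key) {
    return meta.get(key);
  }

  // Proposed style of access: expose every entry, read-only.
  Map<String, byte[]> getMetaMap() {
    return Collections.unmodifiableMap(meta);
  }

  public static void main(String[] args) {
    MetadataHolder h = new MetadataHolder();
    h.putMeta("avro.codec", "null".getBytes(StandardCharsets.UTF_8));
    h.putMeta("created.by", "example".getBytes(StandardCharsets.UTF_8));
    // Without knowing any key names in advance, list every entry.
    for (Map.Entry<String, byte[]> e : h.getMetaMap().entrySet()) {
      System.out.println(e.getKey() + " = " + new String(e.getValue(), StandardCharsets.UTF_8));
    }
  }
}
```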

      1. avro.metadata.datafile.r3.diff
        2 kB
        Harsh J
      2. avro.metadata.datafile.r2.diff
        1 kB
        Harsh J
      3. avro.metadata.datafile.r1.diff
        0.7 kB
        Harsh J
      4. avro.metadata.datafile.r1.diff
        0.7 kB
        Harsh J

        Activity

        plinehan Patrick Linehan added a comment -

        Thanks, guys! Looks great.

        cutting Doug Cutting added a comment -

        I just committed this. Thanks, Harsh.

        qwertymaniac Harsh J added a comment -

        Updated the patch to make the list unmodifiable, and added one simple test case in TestDataFileMeta that checks whether the key list contains the key whose value is sought.

        Any more tests required?
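A test of this kind can avoid asserting the list's size, since reserved keys may also be present. The sketch below is plain Java without JUnit or Avro: `MetaKeysCheck.listsKey` is a hypothetical helper, the key list is simulated, and the `lookup` function stands in for the reader's existing key-based getter.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class MetaKeysCheck {
  // Passes regardless of how many reserved keys the list also holds:
  // it only requires that the key is listed and that looking it up
  // yields the value that was written.
  static boolean listsKey(List<String> metaKeys, Function<String, String> lookup,
                          String key, String expectedValue) {
    return metaKeys.contains(key) && expectedValue.equals(lookup.apply(key));
  }

  public static void main(String[] args) {
    // Simulated key list: one user key plus an unknown number of reserved keys.
    List<String> keys = Arrays.asList("avro.codec", "avro.schema", "hello");
    // Hypothetical lookup standing in for the reader's per-key getter.
    Function<String, String> lookup = k -> k.equals("hello") ? "bar" : null;
    System.out.println(listsKey(keys, lookup, "hello", "bar")); // prints true
  }
}
```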

        cutting Doug Cutting added a comment -

        Also, I think it's fine to expose the "avro." keys.

        cutting Doug Cutting added a comment -

        This looks great. One addition: can we make the list unmodifiable? After the values are all added, we can call

         
          metaKeyList = Collections.unmodifiableList(metaKeyList);
        

        Also, we should add some tests that call this new method.
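A minimal sketch of the suggested pattern: collect the keys while the header is read, then replace the list with an unmodifiable view so callers cannot alter it. `HeaderKeys` is hypothetical, and the parsed header is simulated with a map rather than read from a real data file.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical header reader; the parsed metadata is passed in as a map.
class HeaderKeys {
  private List<String> metaKeyList = new ArrayList<String>();

  HeaderKeys(Map<String, byte[]> parsedMeta) {
    // Add every key as it is read from the header...
    for (String key : parsedMeta.keySet()) {
      metaKeyList.add(key);
    }
    // ...then, once all keys are in, keep only a read-only view.
    metaKeyList = Collections.unmodifiableList(metaKeyList);
  }

  List<String> getMetaKeys() {
    return metaKeyList;
  }

  public static void main(String[] args) {
    Map<String, byte[]> parsed = new LinkedHashMap<String, byte[]>();
    parsed.put("avro.codec", new byte[0]);
    parsed.put("hello", new byte[0]);
    List<String> keys = new HeaderKeys(parsed).getMetaKeys();
    System.out.println(keys); // prints [avro.codec, hello]
    try {
      keys.add("oops");
    } catch (UnsupportedOperationException e) {
      System.out.println("key list is read-only");
    }
  }
}
```

Wrapping once at the end gives callers a view that throws UnsupportedOperationException on any modification, without copying the list.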

        qwertymaniac Harsh J added a comment -

        Alternative patch that returns just a list of keys. I expect the list will also include some reserved Avro data file keys, which prevents me from writing a size or element-comparison test case for this method (I do not know how many or which reserved keys get added; I think codec is one).

        But the list should be usable, I guess. People could ignore "avro." stuff. Or should we filter those out?
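If callers do want to skip reserved entries, filtering on the client side is straightforward, since the Avro specification reserves metadata keys that start with "avro.". A minimal sketch, where `UserMetaKeys` is a hypothetical name and the input list is simulated:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class UserMetaKeys {
  // Prefix of metadata keys reserved by Avro (e.g. for the schema and codec).
  static final String RESERVED_PREFIX = "avro.";

  // Returns only the application-defined keys, preserving their order.
  static List<String> userKeys(List<String> allKeys) {
    List<String> result = new ArrayList<String>();
    for (String key : allKeys) {
      if (!key.startsWith(RESERVED_PREFIX)) {
        result.add(key);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> all = Arrays.asList("avro.schema", "avro.codec", "hello");
    System.out.println(userKeys(all)); // prints [hello]
  }
}
```

Doing this in the caller keeps the reader's key list complete for anyone who does need the reserved entries.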

        qwertymaniac Harsh J added a comment -

        Oops, bad doc-comment. Fixed in this re-up.

        qwertymaniac Harsh J added a comment -

        Patch that adds a method such that the entire metadata map may be recovered via DataFileReader.


          People

          • Assignee:
            qwertymaniac Harsh J
            Reporter:
            qwertymaniac Harsh J
          • Votes:
            0
            Watchers:
            1

              Time Tracking

              Estimated: 1m (original estimate)
              Remaining: 1m (remaining estimate)
              Logged: Not Specified
