Avro
  1. Avro
  2. AVRO-804

Java: add ThriftDatumWriter and ThriftDatumReader

    Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.6.0
    • Component/s: java
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Add a ThriftDatumWriter that accepts Thrift-generated classes and writes them in Avro format, and a ThriftDatumReader that reads Avro data into a Thrift-generated class. This would permit storage of Thrift-generated data structures in Avro data files.

      1. AVRO-804.patch
        144 kB
        Doug Cutting
      2. AVRO-804.patch
        141 kB
        Doug Cutting
      3. AVRO-804.patch
        139 kB
        Doug Cutting
      4. AVRO-804.patch
        139 kB
        Doug Cutting
      5. AVRO-804.patch
        139 kB
        Doug Cutting
      6. AVRO-804.patch
        133 kB
        Doug Cutting

        Issue Links

          Activity

          Hide
          Doug Cutting added a comment -

          Thrift-generated classes extend TBase, which permits efficient getting and setting of field values. The list of Thrift fields can be obtained from static variables in the generated class.

          Show
          Doug Cutting added a comment - Thrift-generated classes extend TBase, which permits efficient getting and setting of field values. The list of Thrift fields can be obtained from static variables in the generated class.
          Hide
          Raghu Angadi added a comment -

          elephant-bird does Thrift <-> PIG tuple/schema (ThriftToPig.java etc in https://github.com/kevinweil/elephant-bird). We can reuse quite a bit of code here. I can implement it if it seems like a good idea. at the same time, I will get more familiar with Avro.

          Show
          Raghu Angadi added a comment - elephant-bird does Thrift <-> PIG tuple/schema (ThriftToPig.java etc in https://github.com/kevinweil/elephant-bird ). We can reuse quite a bit of code here. I can implement it if it seems like a good idea. at the same time, I will get more familiar with Avro.
          Hide
          Doug Cutting added a comment -

          Raghu, that'd be great if you wanted to take a stab at this. I'd be happy to collaborate.

          Some initial thoughts:

          • We can cache the Class<extends TBase> -> Schema mapping in a static ConcurrentHashMap.
          • We might use schema properties for types that don't map exactly, e.g., Thrift's i16 type could be mapped to the Avro schema {"type":"int", "thrift":"i16"}

            . Dunno if that'll be required, though.

          • Thrift field numbers can be saved as Avro schema field properties, so that, when reading into a Thrift-generated class we'd set values by number, not name, since that's Thrift semantics.
          • The implementation should subclass GenericData, GenericDatumReader & Writer, overriding methods like getField & setField. Look at the specific and reflect subclasses of these for examples.
          • ThriftDatumReader should probably have an Schema -> int[] cache for record schemas, where these arrays map from Avro field positions (as passed to setField) to Thrift field numbers, passed to TBase#setFieldValue().
          Show
          Doug Cutting added a comment - Raghu, that'd be great if you wanted to take a stab at this. I'd be happy to collaborate. Some initial thoughts: We can cache the Class<extends TBase> -> Schema mapping in a static ConcurrentHashMap. We might use schema properties for types that don't map exactly, e.g., Thrift's i16 type could be mapped to the Avro schema {"type":"int", "thrift":"i16"} . Dunno if that'll be required, though. Thrift field numbers can be saved as Avro schema field properties, so that, when reading into a Thrift-generated class we'd set values by number, not name, since that's Thrift semantics. The implementation should subclass GenericData, GenericDatumReader & Writer, overriding methods like getField & setField. Look at the specific and reflect subclasses of these for examples. ThriftDatumReader should probably have an Schema -> int[] cache for record schemas, where these arrays map from Avro field positions (as passed to setField) to Thrift field numbers, passed to TBase#setFieldValue().
          Hide
          Raghu Angadi added a comment -

          Thanks Doug.
          I will work on an implementation. Please feel free to assign this to me.

          Show
          Raghu Angadi added a comment - Thanks Doug. I will work on an implementation. Please feel free to assign this to me.
          Hide
          Doug Cutting added a comment -

          Thanks, Raghu, for taking this on.

          Show
          Doug Cutting added a comment - Thanks, Raghu, for taking this on.
          Hide
          Raghu Angadi added a comment -

          sorry for the delay. will work on the jira this week.

          Show
          Raghu Angadi added a comment - sorry for the delay. will work on the jira this week.
          Hide
          Doug Cutting added a comment -

          Raghu, are you going to have time to work on this? If not, I may have some soon. Or I could work on AVRO-805...

          Show
          Doug Cutting added a comment - Raghu, are you going to have time to work on this? If not, I may have some soon. Or I could work on AVRO-805 ...
          Hide
          Raghu Angadi added a comment -

          surely going to make time over next couple of days. I will update here soon. Thanks Doug.

          Show
          Raghu Angadi added a comment - surely going to make time over next couple of days. I will update here soon. Thanks Doug.
          Hide
          Doug Cutting added a comment -

          Raghu, I had some time, so I started working on this. Here's an early version.

          I include the generated Thrift test files in svn so that tests can still be run by folks who don't have the Thrift compiler installed.

          This compiles but does not yet have tests. It's a work in progress. I'll add some tests and start debugging it over the next few days, hopefully committing it early next week.

          Show
          Doug Cutting added a comment - Raghu, I had some time, so I started working on this. Here's an early version. I include the generated Thrift test files in svn so that tests can still be run by folks who don't have the Thrift compiler installed. This compiles but does not yet have tests. It's a work in progress. I'll add some tests and start debugging it over the next few days, hopefully committing it early next week.
          Hide
          Raghu Angadi added a comment -

          Thanks Doug. ThriftData implementation looks simpler than I thought. I am going to play with this code and write tests similar to Protobuf.

          • for Thrift 0.5 support we might need to handle couple of quirks (absence of isBuffer() etc). I can add those if required.

          > return (Map<TFieldIdEnum,FieldMetaData>)c.getField("metaDataMap").get(null);
          FieldMetaData.getStructMetaDataMap(c) returns this map.

          Show
          Raghu Angadi added a comment - Thanks Doug. ThriftData implementation looks simpler than I thought. I am going to play with this code and write tests similar to Protobuf. for Thrift 0.5 support we might need to handle couple of quirks (absence of isBuffer() etc). I can add those if required. > return (Map<TFieldIdEnum,FieldMetaData>)c.getField("metaDataMap").get(null); FieldMetaData.getStructMetaDataMap(c) returns this map.
          Hide
          Doug Cutting added a comment -

          Here's a version with tests that pass.

          I'll commit this soon if there are no objections.

          Show
          Doug Cutting added a comment - Here's a version with tests that pass. I'll commit this soon if there are no objections.
          Hide
          Doug Cutting added a comment -

          Sorry, Raghu, I missed your comment above.

          Yes, please test this against earlier versions of Thrift if you can, as I have only tested it against 0.7.0 and may rely on features of that version.

          Show
          Doug Cutting added a comment - Sorry, Raghu, I missed your comment above. Yes, please test this against earlier versions of Thrift if you can, as I have only tested it against 0.7.0 and may rely on features of that version.
          Hide
          Doug Cutting added a comment -

          Updated to use FieldMetaData.getStructMetaDataMap as suggested by Raghu. Thanks!

          The reflection stuff in getFieldIds() is ugly too and could probably be replaced by something that calls this too.

          Show
          Doug Cutting added a comment - Updated to use FieldMetaData.getStructMetaDataMap as suggested by Raghu. Thanks! The reflection stuff in getFieldIds() is ugly too and could probably be replaced by something that calls this too.
          Hide
          Doug Cutting added a comment -

          Cleaned up the code in getFieldIds to no longer use reflection.

          Show
          Doug Cutting added a comment - Cleaned up the code in getFieldIds to no longer use reflection.
          Hide
          Doug Cutting added a comment -

          Here's a version that includes a package.html for the javadoc.

          Show
          Doug Cutting added a comment - Here's a version that includes a package.html for the javadoc.
          Hide
          Raghu Angadi added a comment -

          +1. The test directory has only thrift and thrift generated files. Not sure if you missed a test file.

          I haven't managed to my OS X build going yet. I am going to try on ubuntu. will file a follow up jira for Thrift 5, i think it will be a minor patch.

          Thanks Doug.

          Show
          Raghu Angadi added a comment - +1. The test directory has only thrift and thrift generated files. Not sure if you missed a test file. I haven't managed to my OS X build going yet. I am going to try on ubuntu. will file a follow up jira for Thrift 5, i think it will be a minor patch. Thanks Doug.
          Hide
          Doug Cutting added a comment -

          Oops. I forgot to include TestThrift.java. Thanks for catching that! Here's a new version of the patch that includes the test.

          Show
          Doug Cutting added a comment - Oops. I forgot to include TestThrift.java. Thanks for catching that! Here's a new version of the patch that includes the test.
          Hide
          Raghu Angadi added a comment -

          +1

          Show
          Raghu Angadi added a comment - +1
          Hide
          Doug Cutting added a comment -

          I committed this.

          Show
          Doug Cutting added a comment - I committed this.

            People

            • Assignee:
              Doug Cutting
              Reporter:
              Doug Cutting
            • Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development