Avro
  1. Avro
  2. AVRO-1500

Unknown datum type exception during union type resolution (no short to int conversion).

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 1.7.4, 1.7.5, 1.7.6
    • Fix Version/s: 1.7.7
    • Component/s: None
    • Labels:
    • Environment:

      java API

      Description

      There is a conversion for values of type short (and other numeric types) if in the schema they are declared as int:

      /lang/java/avro/src/main/java/org/apache/avro/generic/GenericDatumWriter.java
       protected void write(Schema schema, Object datum, Encoder out)
          throws IOException {
          try {
            switch (schema.getType()) {
              // ...
              case INT:     out.writeInt(((Number)datum).intValue()); break;
              // ...
      

      So, if a value of short type is passed to INT field, it will be converted and saved as INT in avro.

      But, when there is next field in schema:

       ["null",{"type":"int","thrift":"short"}] 

      which is a union with int in it, and short is passed in (lets say 5), then we are having the next exception:

      org.apache.avro.AvroRuntimeException: Unknown datum type: 5
      	at org.apache.avro.generic.GenericData.getSchemaName(GenericData.java:593)
      	at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:558)
      	at org.apache.avro.generic.GenericDatumWriter.resolveUnion(GenericDatumWriter.java:144) 
      

      This happens because in org.apache.avro.generic.GenericData there is no check if the passed object has a type of java.lang.Short, and it is not get converted then in write method of GenericDatumWriter:

      lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java
      /** Return the schema full name for a datum.  Called by {@link
         * #resolveUnion(Schema,Object)}. */
      protected String getSchemaName(Object datum) {
      /* ... */
      if (isInteger(datum))
        return Type.INT.getName();
      if (isLong(datum))
        return Type.LONG.getName();
       if (isFloat(datum))
       return Type.FLOAT.getName();
      if (isDouble(datum))
        return Type.DOUBLE.getName();
       if (isBoolean(datum))
        return Type.BOOLEAN.getName();
      throw new AvroRuntimeException("Unknown datum type: "+datum);
      

      This error initially occured during thrift to avro conversion, when thrift obj had optional field of type i16.
      In thrift to avro schema converter, if the type is short in thrift (i16) it will be implicitly converted to int in avro-schema, so the values should be converted as well. This is already done if they are not in the union (not optional). Otherwise the exception is thrown.
      The snippet from schema conversion code is below:

      lang/java/thrift/src/main/java/org/apache/avro/thrift/ThriftData.java
      private Schema getSchema(FieldValueMetaData f) {
        switch (f.type) {
        /* ... */ 
          case TType.I16:
            Schema s = Schema.create(Schema.Type.INT);
            s.addProp(THRIFT_PROP, "short");
          return s;
         /* ... */
      

      Proposal is to add isShort check to generic data, as well as isShort method implementation:

      lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java
      
      protected String getSchemaName(Object datum) {
      // ..
      if (isInteger(datum) || isShort(datum))
         return Type.INT.getName();
      // ..
      

      or maybe even some kind of isNumeric method, so the behaviour will be same for INT fields and INT fields that are in Union.

      1. AVRO-1500.patch
        1 kB
        Michael Pershyn
      2. AVRO-1500-unit-test.diff
        11 kB
        Michael Pershyn
      3. AVRO-1500-ThriftData.patch
        0.8 kB
        Michael Pershyn
      4. AVRO-1500-README.patch
        0.4 kB
        Michael Pershyn

        Activity

        Hide
        Michael Pershyn added a comment -

        Added a patch that have fixed this problem in our case. With this patch the conversion of thrift-objects with optional int16 fields is correct.
        Also, isNumber check was not added because Long is a different type, so only two cases should be checked.
        Please review

        Show
        Michael Pershyn added a comment - Added a patch that have fixed this problem in our case. With this patch the conversion of thrift-objects with optional int16 fields is correct. Also, isNumber check was not added because Long is a different type, so only two cases should be checked. Please review
        Hide
        Michael Pershyn added a comment -

        I have used git to generate the patch with command:
        `git --no-pager diff --no-prefix`

        So I am not sure with svn compatibility, but It should be applicable with `patch -p0 < AVRO-1500.diff`

        Show
        Michael Pershyn added a comment - I have used git to generate the patch with command: `git --no-pager diff --no-prefix` So I am not sure with svn compatibility, but It should be applicable with `patch -p0 < AVRO-1500 .diff`
        Hide
        Michael Pershyn added a comment -

        Also I have used latest revision from today that was on github, so patch should be applicable to trunk.
        c754127 - (6 weeks ago) AVRO-1402. Add optional subtypes to specification... (trunk)

        Show
        Michael Pershyn added a comment - Also I have used latest revision from today that was on github, so patch should be applicable to trunk. c754127 - (6 weeks ago) AVRO-1402 . Add optional subtypes to specification... (trunk)
        Hide
        Doug Cutting added a comment -

        Might we just override getSchemaName() in ThriftData to test for short?

        This might look something like:

        @Override protected String getSchemaName(Object datum) {
          if (datum instanceof Short)
            return Type.INT.getName();
          return super.getSchemaName(datum);
        }
        

        Also, we should add a test that fails without this patch to TestThrift.java.

        Show
        Doug Cutting added a comment - Might we just override getSchemaName() in ThriftData to test for short? This might look something like: @Override protected String getSchemaName( Object datum) { if (datum instanceof Short ) return Type.INT.getName(); return super .getSchemaName(datum); } Also, we should add a test that fails without this patch to TestThrift.java.
        Hide
        Michael Pershyn added a comment -

        Intially I was looking forward to make this test in avro proj, but haven't found where to put it.
        ThriftData looks a good place, so I'm on it, but have some questions:

        • There is environment dependency - current test is compiled with thrift v0.7 dependency, so I had to fire up a vagrant environment that has all the necessary tools for thrift (provisioning to install thrift 0.7, java, maven). In my dev environment I have thrift 0.9.1, so I found virtual env really useful way to avoid versions clash, but have doubts even to suggest to contribute that, because it is specific version of linux only... So I need some advice here
        • On my dev environment test fail on clear checkout from trunk, when compiling Apache Avro IPC:
          Failed tests: testBasicMaxSpans(org.apache.avro.ipc.trace.TestFileSpanStorage): expected:<9> but was:<10>
          in virtual environment this test passes... no bloker, just strange.
        • generated thrift files are checked-in, so it requires manual updating of those, while in pom.xml there is a profile definition "thrift-generate", but it is not updated when generated.
          So I have a question - why maven-thrift-plugin is not used here? My suggestion is that this was done years ago and back then this plugin may not have been around. Also, should I commit the generated thrift-objects back?
        Show
        Michael Pershyn added a comment - Intially I was looking forward to make this test in avro proj, but haven't found where to put it. ThriftData looks a good place, so I'm on it, but have some questions: There is environment dependency - current test is compiled with thrift v0.7 dependency, so I had to fire up a vagrant environment that has all the necessary tools for thrift (provisioning to install thrift 0.7, java, maven). In my dev environment I have thrift 0.9.1, so I found virtual env really useful way to avoid versions clash, but have doubts even to suggest to contribute that, because it is specific version of linux only... So I need some advice here On my dev environment test fail on clear checkout from trunk, when compiling Apache Avro IPC: Failed tests: testBasicMaxSpans(org.apache.avro.ipc.trace.TestFileSpanStorage): expected:<9> but was:<10> in virtual environment this test passes... no bloker, just strange. generated thrift files are checked-in, so it requires manual updating of those, while in pom.xml there is a profile definition "thrift-generate", but it is not updated when generated. So I have a question - why maven-thrift-plugin is not used here? My suggestion is that this was done years ago and back then this plugin may not have been around. Also, should I commit the generated thrift-objects back?
        Hide
        Michael Pershyn added a comment -

        Unit test for AVRO-1500, also with generated thrift classes. Used version 0.7 to generate them (according to comments, same version was used to generate previous version).

        Show
        Michael Pershyn added a comment - Unit test for AVRO-1500 , also with generated thrift classes. Used version 0.7 to generate them (according to comments, same version was used to generate previous version).
        Hide
        Michael Pershyn added a comment -

        Might we just override getSchemaName() in ThriftData to test for short?

        This is also a solution... but, imho, the problem is in avro union resolution.
        Namely if I provide Short to INT field, it would be converted automatically, but if Short is provided to INT field in union, then it fails. This behaviour seems incosistent to me.
        Also, potentially, it may lead to errors in other conversion utils.

        Show
        Michael Pershyn added a comment - Might we just override getSchemaName() in ThriftData to test for short? This is also a solution... but, imho, the problem is in avro union resolution. Namely if I provide Short to INT field, it would be converted automatically, but if Short is provided to INT field in union, then it fails. This behaviour seems incosistent to me. Also, potentially, it may lead to errors in other conversion utils.
        Hide
        Doug Cutting added a comment -

        > if Short is provided to INT field in union, then it fails. This behaviour seems inconsistent to me.

        This change doesn't make it consistent though. Shorts would be written as ints, even in unions, but wouldn't be read back in as shorts. Moreover, there may be other subclasses of GenericData that rely on the current behavior. So the safest change is to fix ThriftData, which does already read and write shorts correctly, it just doesn't handle them in unions.

        The Thrift-generated files are checked-in so that every developer who runs tests need not have the Thrift compiler installed. (The same is done with Protobuf.) Feel free to propose an improved mechanism.

        Show
        Doug Cutting added a comment - > if Short is provided to INT field in union, then it fails. This behaviour seems inconsistent to me. This change doesn't make it consistent though. Shorts would be written as ints, even in unions, but wouldn't be read back in as shorts. Moreover, there may be other subclasses of GenericData that rely on the current behavior. So the safest change is to fix ThriftData, which does already read and write shorts correctly, it just doesn't handle them in unions. The Thrift-generated files are checked-in so that every developer who runs tests need not have the Thrift compiler installed. (The same is done with Protobuf.) Feel free to propose an improved mechanism.
        Hide
        Michael Pershyn added a comment -

        Agreed on arguments.
        Implemented the fix in ThriftData through @Override getSchemaName.

        Show
        Michael Pershyn added a comment - Agreed on arguments. Implemented the fix in ThriftData through @Override getSchemaName.
        Hide
        Michael Pershyn added a comment -

        The Thrift-generated files are checked-in so that every developer who runs tests need not have the Thrift compiler installed.

        This sounds reasonable.

        My initial though was to submit a Vagrant file with couple provisioning scripts to get thrift up and running on virtual machine. This would make regeneration of thrift with specific version of compiler (e.g. 0.7) easier. But, this will at the same time limit it to one platform (scripts are for linux only). Also, this only would be helpful when someone wants to recompile thrift, which is I believe rare occasion.

        So I have serious doubts on vagrant file, and now my proposal is just to add README file to the thrift projects, where to write, that

        • The Thrift-generated files are checked-in so that every developer who runs tests need not have the Thrift compiler installed.
        • For regeneration of thrift files make sure you have required version of thrift-compiler installed (0.7) and run mvn -Pthrift-generate generate-test-sources
        Show
        Michael Pershyn added a comment - The Thrift-generated files are checked-in so that every developer who runs tests need not have the Thrift compiler installed. This sounds reasonable. My initial though was to submit a Vagrant file with couple provisioning scripts to get thrift up and running on virtual machine. This would make regeneration of thrift with specific version of compiler (e.g. 0.7) easier. But, this will at the same time limit it to one platform (scripts are for linux only). Also, this only would be helpful when someone wants to recompile thrift, which is I believe rare occasion. So I have serious doubts on vagrant file, and now my proposal is just to add README file to the thrift projects, where to write, that The Thrift-generated files are checked-in so that every developer who runs tests need not have the Thrift compiler installed. For regeneration of thrift files make sure you have required version of thrift-compiler installed (0.7) and run mvn -Pthrift-generate generate-test-sources
        Hide
        ASF subversion and git services added a comment -

        Commit 1601020 from Doug Cutting in branch 'avro/trunk'
        [ https://svn.apache.org/r1601020 ]

        AVRO-1500. Java: Fix bug in handling of Thrift shorts in unions. Contributed by Michael Pershyn.

        Show
        ASF subversion and git services added a comment - Commit 1601020 from Doug Cutting in branch 'avro/trunk' [ https://svn.apache.org/r1601020 ] AVRO-1500 . Java: Fix bug in handling of Thrift shorts in unions. Contributed by Michael Pershyn.
        Hide
        Doug Cutting added a comment -

        I committed this. Thanks, Michael!

        Show
        Doug Cutting added a comment - I committed this. Thanks, Michael!
        Hide
        Hudson added a comment -

        SUCCESS: Integrated in AvroJava #455 (See https://builds.apache.org/job/AvroJava/455/)
        AVRO-1500. Java: Fix bug in handling of Thrift shorts in unions. Contributed by Michael Pershyn. (cutting: rev 1601020)

        • /avro/trunk/CHANGES.txt
        • /avro/trunk/lang/java/thrift/README
        • /avro/trunk/lang/java/thrift/src/main/java/org/apache/avro/thrift/ThriftData.java
        • /avro/trunk/lang/java/thrift/src/test/java/org/apache/avro/thrift/TestThrift.java
        • /avro/trunk/lang/java/thrift/src/test/java/org/apache/avro/thrift/test/Test.java
        • /avro/trunk/lang/java/thrift/src/test/thrift/test.thrift
        Show
        Hudson added a comment - SUCCESS: Integrated in AvroJava #455 (See https://builds.apache.org/job/AvroJava/455/ ) AVRO-1500 . Java: Fix bug in handling of Thrift shorts in unions. Contributed by Michael Pershyn. (cutting: rev 1601020) /avro/trunk/CHANGES.txt /avro/trunk/lang/java/thrift/README /avro/trunk/lang/java/thrift/src/main/java/org/apache/avro/thrift/ThriftData.java /avro/trunk/lang/java/thrift/src/test/java/org/apache/avro/thrift/TestThrift.java /avro/trunk/lang/java/thrift/src/test/java/org/apache/avro/thrift/test/Test.java /avro/trunk/lang/java/thrift/src/test/thrift/test.thrift

          People

          • Assignee:
            Michael Pershyn
            Reporter:
            Michael Pershyn
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development