AVRO-1500

Unknown datum type exception during union type resolution (no short to int conversion).

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.7.4, 1.7.5, 1.7.6
    • Fix Version/s: 1.7.7
    • Component/s: None
    • Labels:
    • Environment:

      java API

      Description

      Values of type short (and other numeric types) are converted when the schema declares the field as int:

      lang/java/avro/src/main/java/org/apache/avro/generic/GenericDatumWriter.java
       protected void write(Schema schema, Object datum, Encoder out)
          throws IOException {
          try {
            switch (schema.getType()) {
              // ...
              case INT:     out.writeInt(((Number)datum).intValue()); break;
              // ...
      

      So, if a short value is passed to an INT field, it is converted and written as an INT in Avro.
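As a minimal illustration of why the non-union case works: the cast to Number accepts any boxed numeric type, so intValue() widens a Short without complaint. The class and method names below (NumberWidening, toAvroInt) are hypothetical and not part of Avro; only the cast-and-intValue expression is taken from the snippet above.

```java
// Hypothetical stand-alone sketch; not Avro code.
final class NumberWidening {
    // Mirrors the expression used in the INT case of GenericDatumWriter.write.
    static int toAvroInt(Object datum) {
        return ((Number) datum).intValue();
    }

    public static void main(String[] args) {
        Object asShort = Short.valueOf((short) 5);
        Object asByte = Byte.valueOf((byte) 7);
        System.out.println(toAvroInt(asShort)); // a Short is accepted and widened
        System.out.println(toAvroInt(asByte));  // so is a Byte
    }
}
```

The cast never inspects the concrete boxed type, which is why the problem only surfaces once a union forces a type lookup first.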

      However, when the schema contains a field like the following:

       ["null",{"type":"int","thrift":"short"}] 

      which is a union containing int, and a short value (say, 5) is passed in, the following exception is thrown:

      org.apache.avro.AvroRuntimeException: Unknown datum type: 5
      	at org.apache.avro.generic.GenericData.getSchemaName(GenericData.java:593)
      	at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:558)
      	at org.apache.avro.generic.GenericDatumWriter.resolveUnion(GenericDatumWriter.java:144) 
      

      This happens because org.apache.avro.generic.GenericData does not check whether the passed object is a java.lang.Short, so union resolution fails before the write method of GenericDatumWriter gets a chance to convert the value:

      lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java
      /** Return the schema full name for a datum.  Called by {@link
       * #resolveUnion(Schema,Object)}. */
      protected String getSchemaName(Object datum) {
        /* ... */
        if (isInteger(datum))
          return Type.INT.getName();
        if (isLong(datum))
          return Type.LONG.getName();
        if (isFloat(datum))
          return Type.FLOAT.getName();
        if (isDouble(datum))
          return Type.DOUBLE.getName();
        if (isBoolean(datum))
          return Type.BOOLEAN.getName();
        throw new AvroRuntimeException("Unknown datum type: "+datum);
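
The fall-through can be reproduced without Avro on the classpath. The sketch below is a hypothetical model (SchemaNameModel is not an Avro class) and assumes that the isInteger, isLong and similar helpers are plain instanceof checks against the boxed types. It throws IllegalStateException in place of AvroRuntimeException to stay self-contained.

```java
// Hypothetical model of the primitive checks; not the real GenericData class.
final class SchemaNameModel {
    // Assumption: each isXxx helper is an instanceof test on the boxed type.
    static String schemaName(Object datum) {
        if (datum instanceof Integer) return "int";
        if (datum instanceof Long) return "long";
        if (datum instanceof Float) return "float";
        if (datum instanceof Double) return "double";
        if (datum instanceof Boolean) return "boolean";
        throw new IllegalStateException("Unknown datum type: " + datum);
    }

    public static void main(String[] args) {
        System.out.println(schemaName(Integer.valueOf(5))); // prints "int"
        try {
            // A Short matches none of the checks and reaches the throw.
            schemaName(Short.valueOf((short) 5));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints "Unknown datum type: 5"
        }
    }
}
```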
      

      This error initially occurred during Thrift to Avro conversion, when a Thrift object had an optional field of type i16.
      In the Thrift to Avro schema converter, a Thrift short (i16) is implicitly converted to int in the Avro schema, so the values should be converted as well. This is already done when the field is not in a union (not optional); otherwise the exception is thrown.
      The relevant snippet from the schema conversion code is below:

      lang/java/thrift/src/main/java/org/apache/avro/thrift/ThriftData.java
      private Schema getSchema(FieldValueMetaData f) {
        switch (f.type) {
          /* ... */
          case TType.I16:
            Schema s = Schema.create(Schema.Type.INT);
            s.addProp(THRIFT_PROP, "short");
            return s;
          /* ... */
      

      The proposal is to add an isShort check to GenericData, along with an isShort method implementation:

      lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java
      
      protected String getSchemaName(Object datum) {
      // ..
      if (isInteger(datum) || isShort(datum))
         return Type.INT.getName();
      // ..
      

      Alternatively, some kind of isNumeric method could be added, so that the behaviour is the same for INT fields and for INT fields inside a union.
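
A rough sketch of how the isShort variant might behave for the ["null", "int"] union is given below. Everything in it is hypothetical: ShortAwareResolver is not an Avro class, the union is modelled as a list of branch names rather than a Schema, and isShort is assumed to be a simple instanceof test. It is meant only to show the intended outcome, not the actual patch.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone sketch of the proposed check; not the attached patch.
final class ShortAwareResolver {
    static boolean isInteger(Object datum) { return datum instanceof Integer; }

    // Assumed implementation of the proposed helper.
    static boolean isShort(Object datum) { return datum instanceof Short; }

    static String schemaName(Object datum) {
        if (datum == null) return "null";
        if (isInteger(datum) || isShort(datum)) return "int";
        throw new IllegalStateException("Unknown datum type: " + datum);
    }

    // Returns the index of the union branch whose name matches the datum.
    static int resolveUnion(List<String> branches, Object datum) {
        int index = branches.indexOf(schemaName(datum));
        if (index < 0) throw new IllegalStateException("Not in union: " + datum);
        return index;
    }

    public static void main(String[] args) {
        List<String> union = Arrays.asList("null", "int");
        System.out.println(resolveUnion(union, Short.valueOf((short) 5))); // prints 1
        System.out.println(resolveUnion(union, null));                     // prints 0
    }
}
```

With the Short mapped to the int branch, the existing ((Number)datum).intValue() conversion in the writer can then handle the value as it does for non-union INT fields.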

      1. AVRO-1500.patch
        1 kB
        Michael Pershyn
      2. AVRO-1500-README.patch
        0.4 kB
        Michael Pershyn
      3. AVRO-1500-ThriftData.patch
        0.8 kB
        Michael Pershyn
      4. AVRO-1500-unit-test.diff
        11 kB
        Michael Pershyn

        Activity

        Michael Pershyn created issue -
        Michael Pershyn made changes -
        Field Original Value New Value
        Status Open [ 1 ] Patch Available [ 10002 ]
        Labels easyfix newbie patch
        Michael Pershyn made changes -
        Attachment AVRO-1500.patch [ 12646921 ]
        Michael Pershyn made changes -
        Attachment AVRO-1500-unit-test.diff [ 12647141 ]
        Michael Pershyn made changes -
        Attachment AVRO-1500-ThriftData.patch [ 12647570 ]
        Michael Pershyn made changes -
        Attachment AVRO-1500-README.patch [ 12647577 ]
        Doug Cutting made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Assignee Michael Pershyn [ pershyn ]
        Fix Version/s 1.7.7 [ 12326041 ]
        Resolution Fixed [ 1 ]
        Doug Cutting made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee:
            Michael Pershyn
            Reporter:
            Michael Pershyn
          • Votes:
            0
            Watchers:
            5
