Avro
  1. Avro
  2. AVRO-966

Bug in GenericData#resolveUnion when resolving union of null and array

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.1
    • Fix Version/s: 1.6.2
    • Component/s: java
    • Labels:
      None

      Description

      I have a simple avro schema from which I generate an avro specific object:
      {{
      {"type": "record",
      "name": "org.company.Test",
      "fields": [
      { "name": "arr","type": ["null",

      {"type": "array","items": "float" }

      ], "default": null }
      ]
      }
      }}
      Then a simple piece of code to reproduce a bug:
      {{
      Test test = new Test();
      List<Float> list = new ArrayList<Float>();
      list.add(1.1f);
      list.add(2.2f);
      test.setArr(list);

      DataFileWriter<Test> myWriter = new DataFileWriter<Test>(new ReflectDatumWriter(test.getSchema()));
      File f = new File("/tmp/test.avro");
      myWriter.create(test.getSchema(), f);
      myWriter.append(test);
      myWriter.close();
      }}

      I get an exception:
      {{
      Exception in thread "main" org.apache.avro.file.DataFileWriter$AppendWriteException: org.apache.avro.UnresolvedUnionException: Not in union ["null",

      {"type":"array","items":"float"}

      ]: [1.1, 2.2]
      at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:261)
      <my code>
      Caused by: org.apache.avro.UnresolvedUnionException: Not in union ["null",

      {"type":"array","items":"float"}

      ]: [1.1, 2.2]
      at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:549)
      at org.apache.avro.generic.GenericDatumWriter.resolveUnion(GenericDatumWriter.java:137)
      at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:70)
      at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:102)
      at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:105)
      at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:65)
      at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:102)
      at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:57)
      at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:255)
      }}

      My investigation showed that in GenericData#resolveUnion method getSchemaName() is called. And the latter method when checks whether datum is a record, succeeds. Why it happens boils down to the fact that in ReflectData#createSchema an "if"-body under case (type instanceof ParameterizedType) is not executed.

      I can supply more details if needed. Or explain in a clear way if I didn't manage to.

      1. AVRO-966.patch
        1 kB
        Doug Cutting
      2. AVRO-966-2.patch
        2 kB
        Vyacheslav Zholudev
      3. AVRO-966.patch
        2 kB
        Doug Cutting

        Activity

        Hide
        Himanshu Mishra added a comment -

        It fails in case of a "map" too

        Show
        Himanshu Mishra added a comment - It fails in case of a "map" too
        Hide
        Doug Cutting added a comment -

        I committed this.

        Show
        Doug Cutting added a comment - I committed this.
        Hide
        Vyacheslav Zholudev added a comment -

        Sounds convincing. Thanks for the patch!

        Show
        Vyacheslav Zholudev added a comment - Sounds convincing. Thanks for the patch!
        Hide
        Doug Cutting added a comment -

        I would rather make isRecord() correct, as it may be used elsewhere. Also, when folks have very large unions, where the performance of getSchemaName() is most critical, they usually have lots of records, since only a single instance of any unnamed type is allowed in a union, while many records are permitted, so I'd rather not move the record test down in GenericData#resolveUnion().

        Here's a version that fixes ReflectData#isRecord() for ByteBuffer by fixing getSchema(), so that performance is not affected. I also improved the fix for arrays to only check for Collection, since that's the only case that creates problems for getSchema(), and throw an exception in getSchema() if a Collection is passed to better detect such problems in the future.

        I'll commit this version, okay?

        Show
        Doug Cutting added a comment - I would rather make isRecord() correct, as it may be used elsewhere. Also, when folks have very large unions, where the performance of getSchemaName() is most critical, they usually have lots of records, since only a single instance of any unnamed type is allowed in a union, while many records are permitted, so I'd rather not move the record test down in GenericData#resolveUnion(). Here's a version that fixes ReflectData#isRecord() for ByteBuffer by fixing getSchema(), so that performance is not affected. I also improved the fix for arrays to only check for Collection, since that's the only case that creates problems for getSchema(), and throw an exception in getSchema() if a Collection is passed to better detect such problems in the future. I'll commit this version, okay?
        Hide
        Vyacheslav Zholudev added a comment -

        It still fails for bytes. I attached a patch with an added unit test.
        I propose to change an order of checking in GenericData.getSchemaName(...). Thus isArray() and isBytes() will be executed only once unlike the case when a check would be done in isRecord(). On the downside, isRecord() method would return incorrect values for some schemata. In this case it would be good to create a unit test that breaks.

        Show
        Vyacheslav Zholudev added a comment - It still fails for bytes. I attached a patch with an added unit test. I propose to change an order of checking in GenericData.getSchemaName(...). Thus isArray() and isBytes() will be executed only once unlike the case when a check would be done in isRecord(). On the downside, isRecord() method would return incorrect values for some schemata. In this case it would be good to create a unit test that breaks.
        Hide
        Doug Cutting added a comment -

        Unless there are objections I will commit this soon.

        Show
        Doug Cutting added a comment - Unless there are objections I will commit this soon.
        Hide
        Doug Cutting added a comment -

        Here's a patch that should fix this.

        Show
        Doug Cutting added a comment - Here's a patch that should fix this.

          People

          • Assignee:
            Doug Cutting
            Reporter:
            Vyacheslav Zholudev
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development