  1. Apache Avro
  2. AVRO-1562

Add support for types extending Maps/Collections


Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.7.6
    • Fix Version/s: None
    • Component/s: java
    • Labels: None

    Description

      Consider the following code:

      import java.io.ByteArrayOutputStream;
      import java.util.*;
      
      import org.apache.avro.Schema;
      import org.apache.avro.file.DataFileWriter;
      import org.apache.avro.reflect.ReflectData;
      import org.apache.avro.reflect.ReflectDatumWriter;
      
      public class AvroDerivingMaps
      {
          public static void main (String [] args) throws Exception
          {
              MapDerivedContainer orig = new MapDerivedContainer();
              ReflectData rdata = ReflectData.AllowNull.get();
              Schema schema = rdata.getSchema(MapDerivedContainer.class);
              System.out.println(schema);
              
              ReflectDatumWriter<MapDerivedContainer> datumWriter = new ReflectDatumWriter<MapDerivedContainer> (MapDerivedContainer.class, rdata);
              DataFileWriter<MapDerivedContainer> fileWriter = new DataFileWriter<MapDerivedContainer> (datumWriter);
              ByteArrayOutputStream baos = new ByteArrayOutputStream();
              fileWriter.create(schema, baos);
              fileWriter.append(orig);
              fileWriter.close();
          }
      }
      
      class MapDerived extends HashMap<String, Integer>
      {
          Integer a = 1;
          String b = "b";
      }
      
      class MapDerivedContainer
      {
          MapDerived2 map = new MapDerived2();
      }
      
      class MapDerived2 extends MapDerived
      {
          String c = "c";
      }
      



      It prints the generated schema and then throws the following exception:

      {"type":"record","name":"MapDerivedContainer","namespace":"avro","fields":[{"name":"map","type":["null",{"type":"record","name":"MapDerived2","fields":[{"name":"c","type":["null","string"],"default":null},{"name":"a","type":["null","int"],"default":null},{"name":"b","type":["null","string"],"default":null}]}],"default":null}]}
      


      Exception in thread "main" org.apache.avro.file.DataFileWriter$AppendWriteException:
      org.apache.avro.UnresolvedUnionException:
      Caused by: org.apache.avro.UnresolvedUnionException: Not in union ["null",{"type":"record","name":"MapDerived2","namespace":"avro","fields":[{"name":"c","type":["null","string"],"default":null},{"name":"a","type":["null","int"],"default":null},{"name":"b","type":["null","string"],"default":null}]}]: {}
      at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:600)
      at org.apache.avro.generic.GenericDatumWriter.resolveUnion(GenericDatumWriter.java:151)
      at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:71)
      at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:145)
      at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:114)
      at org.apache.avro.reflect.ReflectDatumWriter.writeField(ReflectDatumWriter.java:203)
      at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
      at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
      at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:145)
      at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
      at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:290)
      ... 1 more



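      It appears that ReflectData#createSchema() only applies its map handling when "type instanceof ParameterizedType" holds. The declared field type MapDerived2 is a plain class, so the map handling is skipped and a record schema is generated instead.
      GenericData#isMap() has no such restriction: at write time the value is classified as a map, the union contains no map branch, and GenericData#resolveUnion() fails.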
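
The mismatch described above can be reproduced with plain Java reflection, without Avro on the classpath. The following is a minimal sketch, not Avro's actual code: the class names DemoMapDerived, DemoContainer and TypeCheckDemo are made up for illustration, and the two checks only mirror the kind of test the report attributes to ReflectData#createSchema() (declared type) and GenericData#isMap() (runtime value).

```java
import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-in for a class that extends a parameterized map.
class DemoMapDerived extends HashMap<String, Integer> {
    Integer a = 1;
}

// Illustrative container with one field of each kind.
class DemoContainer {
    DemoMapDerived derived = new DemoMapDerived();   // declared as a plain class
    Map<String, Integer> plain = new HashMap<>();    // declared with type arguments
}

class TypeCheckDemo {
    /** Returns true when the field's declared generic type is a ParameterizedType. */
    static boolean declaredAsParameterized(Class<?> owner, String fieldName) {
        try {
            Field f = owner.getDeclaredField(fieldName);
            Type t = f.getGenericType();
            return t instanceof ParameterizedType;
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("No such field: " + fieldName, e);
        }
    }

    /** Returns true when the runtime value is a Map, whatever its declared type. */
    static boolean isMapInstance(Object value) {
        return value instanceof Map;
    }

    public static void main(String[] args) {
        DemoContainer c = new DemoContainer();
        // Declared-type check: the subclass field is a Class, not a ParameterizedType.
        System.out.println(declaredAsParameterized(DemoContainer.class, "derived")); // false
        System.out.println(declaredAsParameterized(DemoContainer.class, "plain"));   // true
        // Runtime check: the same value is nevertheless a Map.
        System.out.println(isMapInstance(c.derived));                                // true
    }
}
```

The field declared as DemoMapDerived fails the declared-type check but passes the runtime check, which is the same disagreement that leaves the writer looking for a map branch in a union that only offers a record.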

      The same may be true for classes that extend ArrayList or implement Collection, Set, etc.
      Also, note the schema for the class extending Map:

      {  
         "type":"record",
         "name":"MapDerived2",
         "fields":[  
            {  
               "name":"c",
               "type":[  
                  "null",
                  "string"
               ],
               "default":null
            },
            {  
               "name":"a",
               "type":[  
                  "null",
                  "int"
               ],
               "default":null
            },
            {  
               "name":"b",
               "type":[  
                  "null",
                  "string"
               ],
               "default":null
            }
         ]
      }
      

      This schema ignores the Map entries completely: only the fields declared on the class and its superclass appear.
      For such a class, the schema should probably look like:

      {
         "type":"record",
         "name":"MapDerived2",
         "fields":[
            {
               "name":"c",
               "type":[
                  "null",
                  "string"
               ],
               "default":null
            },
            .... // Other fields in the class extending the Map
            {
               "name":"BASE_MAP",
               "type":[
                  "null",
                  "map" ... // Normal map which the class extends (implements?)
               ],
               "default":null
            }
         ]
      }
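
To make the proposed record shape concrete, here is a rough sketch in plain Java (no Avro calls) that flattens a Map subclass into "declared fields plus a BASE_MAP entry". It is not the attached patch and not part of Avro: toRecordView, BaseMapView and the Sketch* classes are hypothetical names, and BASE_MAP is simply the field name suggested above.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative copies of the classes from the report.
class SketchMapDerived extends HashMap<String, Integer> {
    Integer a = 1;
    String b = "b";
}

class SketchMapDerived2 extends SketchMapDerived {
    String c = "c";
}

class BaseMapView {
    // Hypothetical field name, taken from the proposed schema.
    static final String BASE_MAP = "BASE_MAP";

    /** Collects the subclass's own fields and puts the map entries under BASE_MAP. */
    static Map<String, Object> toRecordView(Map<?, ?> value) {
        Map<String, Object> view = new LinkedHashMap<>();
        // Walk up the hierarchy, stopping before JDK classes such as HashMap.
        for (Class<?> k = value.getClass();
             k != null && !k.getName().startsWith("java.");
             k = k.getSuperclass()) {
            for (Field f : k.getDeclaredFields()) {
                int mods = f.getModifiers();
                if (Modifier.isStatic(mods) || Modifier.isTransient(mods) || f.isSynthetic()) {
                    continue; // skip anything that is not ordinary instance state
                }
                f.setAccessible(true);
                try {
                    view.put(f.getName(), f.get(value));
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        // The entries of the underlying map become one extra field.
        view.put(BASE_MAP, new LinkedHashMap<Object, Object>(value));
        return view;
    }

    public static void main(String[] args) {
        SketchMapDerived2 m = new SketchMapDerived2();
        m.put("x", 42);
        // Prints the fields a, b, c and BASE_MAP; field order is not guaranteed.
        System.out.println(toRecordView(m));
    }
}
```

Keeping the map entries in a single dedicated field avoids name clashes between map keys and declared fields, at the cost of reserving one field name.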
      

          People

            Assignee: Unassigned
            Reporter: Sachin Goyal (sachingoyal)