Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version: 1.9.2
Description
I have a producer that sometimes adds new fields to its schema. The producer usually gets updated first and starts producing records serialized with the new schema (data is sent via Kafka).
I have a consumer that should be able to read this data from Kafka even when it was produced with a newer schema; new fields can be ignored until the consumer is updated.
I noticed that adding two fields, one at the top level and one in the nested record, yields unexpected results.
Old schema:
{
  "namespace": "some.namespace",
  "name": "MyRecord",
  "type": "record",
  "fields": [
    {"name": "field1", "type": "long"},
    {"name": "nested", "type": {
      "type": "record",
      "name": "nestedRecord",
      "fields": [
        {"name": "nestedField1", "type": "long"}
      ]
    }}
  ]
}
New schema:
{
  "namespace": "some.namespace",
  "name": "MyRecord",
  "type": "record",
  "fields": [
    {"name": "field1", "type": "long"},
    {"name": "field2", "type": "long"},
    {"name": "nested", "type": {
      "type": "record",
      "name": "nestedRecord",
      "fields": [
        {"name": "nestedField1", "type": "long"},
        {"name": "nestedField2", "type": "long"}
      ]
    }}
  ]
}
And example code:
import java.io.FileInputStream;
import java.io.InputStream;
import java.nio.ByteBuffer;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.GenericRecordBuilder;
import org.apache.avro.message.RawMessageDecoder;
import org.apache.avro.message.RawMessageEncoder;

Schema.Parser parser = new Schema.Parser();
InputStream fin = new FileInputStream("src/main/resources/schemas/old.json");
Schema oldSchema = parser.parse(fin);

Schema.Parser parser2 = new Schema.Parser();
fin = new FileInputStream("src/main/resources/schemas/new.json");
Schema newSchema = parser2.parse(fin);

GenericData.Record nested = new GenericRecordBuilder(newSchema.getField("nested").schema())
    .set("nestedField1", 3L)
    .set("nestedField2", 4L)
    .build();
GenericData.Record newRecord = new GenericRecordBuilder(newSchema)
    .set("field1", 1L)
    .set("field2", 2L)
    .set("nested", nested)
    .build();

GenericData gd1 = new GenericData();
RawMessageEncoder<GenericRecord> encoder = new RawMessageEncoder<>(gd1, newSchema);
ByteBuffer encoded = encoder.encode(newRecord);

GenericData gd2 = new GenericData();
RawMessageDecoder<GenericRecord> decoder = new RawMessageDecoder<>(gd2, oldSchema);
GenericRecord record = decoder.decode(encoded);

System.out.println(record.get("field1"));              // prints 1
System.out.println(record.get("field2"));              // prints null
System.out.println(record.get("totally-fake-field"));  // prints null

System.out.println(((GenericRecord) record.get("nested")).get("nestedField1")); // prints 2!
System.out.println(((GenericRecord) record.get("nested")).get("nestedField2")); // prints null
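The "prints 2!" line is consistent with purely positional decoding: Avro binary carries no field names or tags, so a decoder that only knows the old schema consumes the writer's values in order and reads field2's bytes where it expects nestedField1. This is a minimal self-contained sketch of that mechanism (the writeLong/readLong helpers here are hand-rolled imitations of Avro's zig-zag varint long encoding, not actual Avro API):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

public class AvroMisalignmentDemo {
    // Avro writes a long as a zig-zag-encoded varint, with no field name or tag.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);               // zig-zag encode
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));    // low 7 bits, continuation bit set
            z >>>= 7;
        }
        out.write((int) z);
    }

    static long readLong(ByteArrayInputStream in) {
        long z = 0;
        int shift = 0, b;
        do {
            b = in.read();
            z |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (z >>> 1) ^ -(z & 1);                 // zig-zag decode
    }

    public static void main(String[] args) {
        // Writer uses the NEW schema order: field1, field2, nested.nestedField1, nested.nestedField2
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(out, 1L);  // field1
        writeLong(out, 2L);  // field2
        writeLong(out, 3L);  // nested.nestedField1
        writeLong(out, 4L);  // nested.nestedField2

        // Reader uses the OLD schema order: field1, nested.nestedField1. It has no
        // way to know field2 exists, so it consumes field2's bytes as nestedField1.
        ByteArrayInputStream in = new ByteArrayInputStream(out.toByteArray());
        System.out.println("field1       = " + readLong(in)); // 1
        System.out.println("nestedField1 = " + readLong(in)); // 2, not 3
    }
}
```

The same misread happens for any added field, which is why the decoder needs the writer's schema to skip data correctly.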
Is this expected behavior? Should this kind of schema evolution be supported?
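For reference, a workaround sketch that did occur to me, assuming RawMessageDecoder's three-argument constructor (model, writer schema, reader schema) performs standard Avro schema resolution; the schema strings are inlined here so the snippet stands alone:

```java
import java.nio.ByteBuffer;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.GenericRecordBuilder;
import org.apache.avro.message.RawMessageDecoder;
import org.apache.avro.message.RawMessageEncoder;

public class ResolvingDecodeSketch {
    static final String OLD = "{\"namespace\":\"some.namespace\",\"name\":\"MyRecord\",\"type\":\"record\",\"fields\":["
        + "{\"name\":\"field1\",\"type\":\"long\"},"
        + "{\"name\":\"nested\",\"type\":{\"type\":\"record\",\"name\":\"nestedRecord\",\"fields\":["
        + "{\"name\":\"nestedField1\",\"type\":\"long\"}]}}]}";
    static final String NEW = "{\"namespace\":\"some.namespace\",\"name\":\"MyRecord\",\"type\":\"record\",\"fields\":["
        + "{\"name\":\"field1\",\"type\":\"long\"},{\"name\":\"field2\",\"type\":\"long\"},"
        + "{\"name\":\"nested\",\"type\":{\"type\":\"record\",\"name\":\"nestedRecord\",\"fields\":["
        + "{\"name\":\"nestedField1\",\"type\":\"long\"},{\"name\":\"nestedField2\",\"type\":\"long\"}]}}]}";

    public static void main(String[] args) throws Exception {
        Schema oldSchema = new Schema.Parser().parse(OLD);
        Schema newSchema = new Schema.Parser().parse(NEW);

        GenericRecord nested = new GenericRecordBuilder(newSchema.getField("nested").schema())
            .set("nestedField1", 3L).set("nestedField2", 4L).build();
        GenericRecord record = new GenericRecordBuilder(newSchema)
            .set("field1", 1L).set("field2", 2L).set("nested", nested).build();

        ByteBuffer encoded =
            new RawMessageEncoder<GenericRecord>(new GenericData(), newSchema).encode(record);

        // Pass BOTH the writer (new) and reader (old) schema so Avro can resolve
        // the schemas and skip the added fields instead of misreading the stream.
        RawMessageDecoder<GenericRecord> decoder =
            new RawMessageDecoder<>(new GenericData(), newSchema, oldSchema);
        GenericRecord decoded = decoder.decode(encoded);

        System.out.println(decoded.get("field1"));
        System.out.println(((GenericRecord) decoded.get("nested")).get("nestedField1"));
    }
}
```

This requires the consumer to know the writer's schema (e.g. via a schema registry), which is exactly what my Kafka setup cannot guarantee before the consumer is updated.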