Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
We've run across this in some code that interoperates between Java and Python.
The spec currently forbids using a primitive type name as the name of a named schema: "Primitive type names have no namespace and their names may not be defined in any namespace." For example:
{"type":"record","name":"long","fields":[{"name":"a1","type":"long"}]}
That fails in Java with "org.apache.avro.AvroTypeException: Schemas may not be named after primitives: long"
What do we expect to happen when a named schema uses a complex type name as its name?
{"type":"record","name":"record","fields":[{"name":"a1","type":"long"}]}
This currently succeeds in Java and the schema can be used to serialize and deserialize data.
This currently fails in Python with: avro.schema.SchemaParseException: record is a reserved type name
Which one is the correct behaviour?
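To make the difference concrete, here is a minimal sketch of the two naming policies described above. It is not the code of either SDK: `JAVA_RESERVED`, `PYTHON_RESERVED` and `is_name_allowed` are hypothetical names. Whether the Python SDK also rejects primitive names is an assumption here; the report only shows it rejecting `record`.

```python
import json

# Type names as listed in the Avro specification.
PRIMITIVE_TYPES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}
COMPLEX_TYPES = {"record", "enum", "array", "map", "union", "fixed"}

# Hypothetical policy sets modelling the behaviour reported above.
JAVA_RESERVED = PRIMITIVE_TYPES                    # permissive: only primitives rejected
PYTHON_RESERVED = PRIMITIVE_TYPES | COMPLEX_TYPES  # strict (assumed to include primitives)


def is_name_allowed(schema_json, reserved):
    """Return True if the top-level schema name is not in the reserved set."""
    name = json.loads(schema_json)["name"]
    return name not in reserved


named_long = '{"type":"record","name":"long","fields":[{"name":"a1","type":"long"}]}'
named_record = '{"type":"record","name":"record","fields":[{"name":"a1","type":"long"}]}'

print(is_name_allowed(named_long, JAVA_RESERVED))      # rejected under both policies
print(is_name_allowed(named_record, JAVA_RESERVED))    # accepted by the permissive policy
print(is_name_allowed(named_record, PYTHON_RESERVED))  # rejected by the strict policy
```

Under this model, aligning the SDKs amounts to agreeing on which reserved set applies.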
This gets a bit more complicated when we consider using the name as a reference.
The following two schemas both work in Java:
{"type":"record","name":"LinkedList","fields":[{"name":"value","type":"int"},{"name":"next","type":["null","LinkedList"]}]}
{"type":"record","name":"LinkedList","fields":[{"name":"value","type":"int"},{"name":"next","type":["null",{"type":"LinkedList"}]}]}
If we rename LinkedList to record, the former still succeeds in Java, but the latter fails with: org.apache.avro.SchemaParseException: No name in schema: {"type":"record"}
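One plausible way to see why the two forms diverge is a toy resolver, sketched below. It is a simplified model, not the Java parser: `resolve` is a hypothetical function that handles only records, unions and name references. It assumes a bare string is looked up as a reference, while an object whose "type" is record is always read as a new record definition.

```python
import json

PRIMITIVE_TYPES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def resolve(node, known_names):
    """Toy model of schema resolution (assumption, not Avro's real parser)."""
    if isinstance(node, str):
        # A bare string is a reference to a primitive or an already defined name.
        if node in PRIMITIVE_TYPES or node in known_names:
            return "reference to " + node
        raise ValueError("Undefined name: " + node)
    if isinstance(node, list):
        # A JSON array is a union: resolve every branch.
        return [resolve(branch, known_names) for branch in node]
    type_name = node["type"]
    if type_name == "record":
        # The keyword wins: this object is parsed as a record definition.
        if "name" not in node:
            raise ValueError("No name in schema: " + json.dumps(node, separators=(",", ":")))
        known_names.add(node["name"])
        for field in node["fields"]:
            resolve(field["type"], known_names)
        return "definition of " + node["name"]
    # {"type": X} with any other X is treated as a reference to X.
    return resolve(type_name, known_names)


def linked_list(name, next_branch):
    """Build the linked-list schema from the text with a given name and 'next' branch."""
    return {"type": "record", "name": name, "fields": [
        {"name": "value", "type": "int"},
        {"name": "next", "type": ["null", next_branch]}]}


print(resolve(linked_list("LinkedList", {"type": "LinkedList"}), set()))
print(resolve(linked_list("record", "record"), set()))
try:
    resolve(linked_list("record", {"type": "record"}), set())
except ValueError as err:
    print(err)
```

In this model the string form resolves because the name was registered before the fields were read, while the object form never reaches the name lookup.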
Edit: The consensus on the mailing list is that the "permissive" behaviour of Java should be adopted, in order to align the SDKs. The specification doesn't currently forbid these names, and this should be stated explicitly. We should probably say that it's a best practice to avoid them, especially in the null namespace, since they can confuse a reader and potentially cause ambiguities when JSON-encoding data.
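As a rough illustration of the JSON-encoding ambiguity, the sketch below computes the key that wraps a union branch. It assumes the specification's rule that a non-null union value is encoded as a JSON object keyed by the type's name. `union_branch_key` is a hypothetical helper, not an SDK function, and the namespace `com.example` is made up for the example.

```python
NAMED_TYPES = {"record", "enum", "fixed"}


def union_branch_key(schema):
    """Return the key assumed to wrap a value of this branch in JSON encoding."""
    if isinstance(schema, str):
        # Primitive or reference by name: the key is the name itself.
        return schema
    if schema["type"] in NAMED_TYPES:
        # Named types use their full name (namespace plus name, if any).
        namespace = schema.get("namespace")
        return namespace + "." + schema["name"] if namespace else schema["name"]
    # Unnamed complex types (array, map) use the type keyword.
    return schema["type"]


array_of_long = {"type": "array", "items": "long"}
record_named_array = {"type": "record", "name": "array", "fields": []}
namespaced_record = {"type": "record", "name": "array",
                     "namespace": "com.example", "fields": []}

# In the null namespace both branches would share one key.
print(union_branch_key(array_of_long), union_branch_key(record_named_array))
# A namespace keeps the keys distinct.
print(union_branch_key(namespaced_record))
```

This is why the null namespace is the riskiest place for such names: nothing distinguishes the record's key from the built-in one.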