Apache Avro / AVRO-3370

[Spec] Inconsistent behaviour on types as invalid names.

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • None
    • 1.11.1
    • None

    Description

      We've run across this in some code that interoperates between Java and Python.

      The spec currently forbids using a primitive type name as the name of a named schema: "Primitive type names have no namespace and their names may not be defined in any namespace."

      {"type":"record","name":"long","fields":[{"name":"a1","type":"long"}]} 

      That fails in Java with "org.apache.avro.AvroTypeException: Schemas may not be named after primitives: long"

      What do we expect to happen when a named schema uses a complex type name as its name?

      {"type":"record","name":"record","fields":[{"name":"a1","type":"long"}]} 

      This currently succeeds in Java and the schema can be used to serialize and deserialize data.

      This currently fails in Python with: avro.schema.SchemaParseException: record is a reserved type name

      Which one is the correct behaviour?
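
The two behaviours can be contrasted with a minimal, library-independent sketch. The function `check_schema_name` and its `permissive` flag are hypothetical names introduced for illustration; this is not the code of either SDK, it only mirrors the outcomes reported above (primitive names always rejected, complex type names rejected only under the strict policy).

```python
# Hypothetical sketch, NOT taken from the Java or Python Avro SDKs.
PRIMITIVE_TYPES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}
# The six complex types listed in the Avro specification.
COMPLEX_TYPES = {"record", "enum", "array", "map", "union", "fixed"}


def check_schema_name(name, permissive):
    """Return None if the name is accepted, otherwise a reason string."""
    if name in PRIMITIVE_TYPES:
        # Forbidden by the spec regardless of policy.
        return f"Schemas may not be named after primitives: {name}"
    if name in COMPLEX_TYPES and not permissive:
        # Strict policy, as observed in Python.
        return f"{name} is a reserved type name"
    # Permissive policy, as observed in Java, accepts complex type names.
    return None


print(check_schema_name("long", permissive=True))
print(check_schema_name("record", permissive=True))
print(check_schema_name("record", permissive=False))
```

Under these assumptions, aligning the SDKs on the permissive behaviour amounts to dropping the second check.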

      This gets a bit more complicated when we consider using the name as a reference.

      The following two schemas both work in Java:

      {"type":"record","name":"LinkedList",
      "fields":[
        {"name":"value","type":"int"},
        {"name":"next","type":["null","LinkedList"]}]}

      {"type":"record","name":"LinkedList",
      "fields":[
        {"name":"value","type":"int"},
        {"name":"next","type":["null",{"type":"LinkedList"}]}]}
      

      If we rename LinkedList to record, the former still succeeds in Java, but the latter fails with org.apache.avro.SchemaParseException: No name in schema: {"type":"record"}
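
One way to see why the two forms diverge is that a bare string is looked up as a name, while a JSON object whose "type" is a complex type keyword is read as a new definition. The sketch below illustrates that reading; `classify` and `known_names` are hypothetical, and the logic is an assumption that reproduces the reported outcomes rather than the Java parser's actual implementation.

```python
import json

# Hypothetical sketch, NOT the Java parser's code.
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}
NAMED_KINDS = {"record", "enum", "fixed"}


def classify(node, known_names):
    """Classify a schema node as 'primitive', 'reference', 'union' or 'definition'."""
    if isinstance(node, str):
        # A bare string is resolved as a primitive or a previously defined name.
        if node in PRIMITIVES:
            return "primitive"
        if node in known_names:
            return "reference"
        raise ValueError(f"Undefined name: {node}")
    if isinstance(node, list):
        return "union"
    kind = node["type"]
    if kind in NAMED_KINDS:
        # The keyword wins: this is parsed as a definition and needs a name.
        if "name" not in node:
            raise ValueError("No name in schema: " + json.dumps(node, separators=(",", ":")))
        return "definition"
    if kind in ("array", "map"):
        return "definition"
    # Any other {"type": X} is treated like the bare string X.
    return classify(kind, known_names)


print(classify({"type": "LinkedList"}, {"LinkedList"}))  # prints "reference"
print(classify("record", {"record"}))                    # prints "reference"
```

With `known_names = {"record"}`, calling `classify({"type": "record"}, known_names)` raises an error because the object form is taken as an unnamed record definition instead of a reference.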

      Edit: The consensus on the mailing list is that the "permissive" behaviour of Java should be adopted in order to align the SDKs. The specification doesn't currently forbid these names, and this should be stated explicitly. We should probably say that it's a best practice to avoid them, especially in the null namespace, since they can confuse a reader and potentially cause ambiguities when JSON encoding data.
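
The JSON encoding ambiguity can be illustrated with union values, which the spec encodes as a JSON object keyed by the branch's type name. The sketch below assumes the key for a named type is its full name (namespace plus name); `union_tag` and `colliding_tags` are hypothetical helpers, not part of any Avro SDK.

```python
# Hypothetical sketch, NOT part of any Avro SDK.
NAMED_KINDS = {"record", "enum", "fixed"}


def union_tag(branch):
    """Key assumed to label a union branch in JSON encoding (None for null)."""
    if isinstance(branch, str):
        return None if branch == "null" else branch
    kind = branch["type"]
    if kind in NAMED_KINDS:
        name = branch["name"]
        namespace = branch.get("namespace")
        # Assumption: the full name is used as the key.
        return f"{namespace}.{name}" if namespace and "." not in name else name
    # Unnamed complex types (array, map) use the type keyword itself.
    return kind


def colliding_tags(union):
    """Return the set of keys shared by more than one branch of the union."""
    seen, duplicates = set(), set()
    for branch in union:
        tag = union_tag(branch)
        if tag is None:
            continue
        if tag in seen:
            duplicates.add(tag)
        seen.add(tag)
    return duplicates


union = [
    "null",
    {"type": "array", "items": "int"},
    {"type": "record", "name": "array", "fields": []},  # record named "array"
]
print(colliding_tags(union))  # prints {'array'}
```

Under the same assumption, giving the record a namespace (for example `"namespace": "com.example"`) removes the collision, which fits the advice to be especially careful in the null namespace.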

          People

            Assignee: rskraba Ryan Skraba
            Reporter: rskraba Ryan Skraba
            Votes: 0
            Watchers: 2

              Time Tracking

                Original Estimate: Not Specified
                Remaining Estimate: 0h
                Time Spent: 2.5h