  Apache Avro / AVRO-2221

Type promotions within union schemas cause round trip failures

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.8.2
    • Fix Version/s: None
    • Component/s: csharp, spec
    • Labels: None

      Description

      When the C# SpecificWriter and SpecificReader both use the following schema as the writer's and reader's schema, any long written to the union fails to round-trip: it is read back promoted to a double.

      "type":["null","double","long"]

      To avoid the promotion and round-trip values correctly, we had to reorder the types within our union and use the following as both the writer's and reader's schema:

      "type":["null","long","double"]

      We believe this behavior comes from the type-promotion logic in the PrimitiveSchema.CanRead(Schema) implementation (https://github.com/apache/avro/blob/5e8168a25494b04ef0aeaf6421a033d7192f5625/lang/csharp/src/apache/main/Schema/PrimitiveSchema.cs), but based on our reading of the spec, it is not clear why this occurs.

      Potentially relevant sections from the schema resolution rules:

      "To match, one of the following must hold:

      • both schemas are arrays whose item types match
      • both schemas are maps whose value types match
      • both schemas are enums whose names match
      • both schemas are fixed whose sizes and names match
      • both schemas are records with the same name
      • either schema is a union
      • both schemas have same primitive type
      • the writer's schema may be promoted to the reader's as follows:
        • int is promotable to long, float, or double
        • long is promotable to float or double
        • float is promotable to double
        • string is promotable to bytes
        • bytes is promotable to string
      • if both are unions:
        The first schema in the reader's union that matches the selected writer's union schema is recursively resolved against it. if none match, an error is signalled."
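      The primitive-type part of the rules quoted above can be expressed as a small lookup table. The following is a minimal Python model written for illustration; it is not the C# PrimitiveSchema.CanRead code, and the names PROMOTIONS and primitive_matches are hypothetical. It covers primitive types only, not arrays, maps, enums, fixed, or records.

```python
# Hypothetical model of the primitive matching rules quoted from the Avro spec.
# Illustrative only; this is not code from any Avro implementation.

# Writer type -> reader types it may be promoted to (from the quoted list).
PROMOTIONS = {
    "int": {"long", "float", "double"},
    "long": {"float", "double"},
    "float": {"double"},
    "string": {"bytes"},
    "bytes": {"string"},
}


def primitive_matches(writer: str, reader: str) -> bool:
    """True if data written as `writer` can be read as `reader`."""
    if writer == reader:
        # "both schemas have same primitive type"
        return True
    # "the writer's schema may be promoted to the reader's"
    return reader in PROMOTIONS.get(writer, set())


print(primitive_matches("long", "double"))  # True: long -> double promotion
print(primitive_matches("double", "long"))  # False: there is no demotion
```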

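      The current implementation of PrimitiveSchema.CanRead(Schema) appears to apply promotions greedily, and only for numeric types. Because resolution picks the first reader branch that matches, and long is promotable to double, a "double" branch listed before "long" is selected for a written long even though an exact "long" branch exists later in the same union.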
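      A minimal Python model of the first-match rule, assuming branch selection works as the quoted spec text describes (scan the reader's union in order and take the first branch that matches). The helper names are hypothetical and this is not the C# SpecificReader implementation; it simply reproduces the reported outcome for both orderings.

```python
# Hypothetical model of union branch selection using the spec's
# "first schema in the reader's union that matches" rule.
# Illustrative only; not the Avro C# SpecificReader implementation.
from typing import List, Optional

PROMOTIONS = {
    "int": {"long", "float", "double"},
    "long": {"float", "double"},
    "float": {"double"},
    "string": {"bytes"},
    "bytes": {"string"},
}


def primitive_matches(writer: str, reader: str) -> bool:
    """True if the writer primitive is identical or promotable to the reader's."""
    return writer == reader or reader in PROMOTIONS.get(writer, set())


def first_matching_branch(writer_type: str, reader_union: List[str]) -> Optional[str]:
    """Return the first reader branch that can read writer_type, else None."""
    for branch in reader_union:
        if primitive_matches(writer_type, branch):
            return branch
    return None  # the spec says an error is signalled in this case


original = ["null", "double", "long"]
reordered = ["null", "long", "double"]

print(first_matching_branch("long", original))   # double (promoted, so the round trip fails)
print(first_matching_branch("long", reordered))  # long (exact match comes first)
```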

      For reference, neither of the following union schemas showed any round-trip promotion problem. This may be inconsistent with the spec, which lists string and bytes as promotable to each other:

      "type":["null","bytes","string"]

      "type":["null","string","bytes"]
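      Applying the same literal first-match reading to these two schemas suggests why the observation looks inconsistent: because string and bytes are promotable in both directions, each ordering would convert one of the two types on read. This is a hypothetical Python sketch of the spec text as quoted, not of any Avro library.

```python
# Hypothetical, literal reading of the spec's first-match rule applied to
# string/bytes unions. Illustrative only; not taken from any Avro library.
from typing import List, Optional

PROMOTIONS = {
    "int": {"long", "float", "double"},
    "long": {"float", "double"},
    "float": {"double"},
    "string": {"bytes"},
    "bytes": {"string"},
}


def primitive_matches(writer: str, reader: str) -> bool:
    """True if the writer primitive is identical or promotable to the reader's."""
    return writer == reader or reader in PROMOTIONS.get(writer, set())


def first_matching_branch(writer_type: str, reader_union: List[str]) -> Optional[str]:
    """Return the first reader branch that can read writer_type, else None."""
    for branch in reader_union:
        if primitive_matches(writer_type, branch):
            return branch
    return None


# Write each type into each union ordering and see what it is read back as.
for union in (["null", "bytes", "string"], ["null", "string", "bytes"]):
    for written in ("string", "bytes"):
        read_as = first_matching_branch(written, union)
        status = "ok" if read_as == written else "converted"
        print(union, written, "->", read_as, status)
```

      Under this literal reading, ["null","bytes","string"] would read a string back as bytes, and ["null","string","bytes"] would read bytes back as a string, so no ordering avoids the conversion.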

      Should the logic short-circuit to avoid type promotion within union schemas when the reader's and writer's schemas are identical? Or are these numeric-only promotions intentional, in which case the spec should state clearly that union types must be ordered by promotion precedence (for example, "long" before "double") to avoid unintended conversions on a round trip?
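      One possible shape for the short circuit asked about above, sketched in Python under the assumption that only primitive branches are involved: look for an exact type match in the reader's union first, and fall back to the first promotable branch only if none exists. The best_branch helper is hypothetical and this is not an actual Avro change. When the reader's and writer's unions are identical, every written primitive branch has an exact counterpart, so no separate "schemas are identical" check is needed.

```python
# Hypothetical "exact match first" branch selection for primitive unions.
# Illustrative only; not an actual Avro change or API.
from typing import List, Optional

PROMOTIONS = {
    "int": {"long", "float", "double"},
    "long": {"float", "double"},
    "float": {"double"},
    "string": {"bytes"},
    "bytes": {"string"},
}


def primitive_matches(writer: str, reader: str) -> bool:
    """True if the writer primitive is identical or promotable to the reader's."""
    return writer == reader or reader in PROMOTIONS.get(writer, set())


def best_branch(writer_type: str, reader_union: List[str]) -> Optional[str]:
    """Prefer an identical branch; otherwise use the first promotable one."""
    # Pass 1: an exact type match wins regardless of its position.
    for branch in reader_union:
        if branch == writer_type:
            return branch
    # Pass 2: fall back to the spec's first-match rule (promotions allowed).
    for branch in reader_union:
        if primitive_matches(writer_type, branch):
            return branch
    return None  # no branch can read the written type


union = ["null", "double", "long"]
for written in ("null", "double", "long"):
    assert best_branch(written, union) == written  # identical schemas round-trip
print(best_branch("int", union))  # double: promotion still follows union order
```

      Note that promotion still follows union order when no exact branch exists: in this sketch an int written against ["null","double","long"] resolves to double, not long.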


    People

    • Assignee: Unassigned
    • Reporter: Nathaniel Burger (nburger)
    • Votes: 0
    • Watchers: 1
