Apache Avro / AVRO-1721

Should LogicalTypes introduce schema (in)compatibility and canonical parsing form changes?


    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.8.0
    • Fix Version/s: None
    • Component/s: spec
    • Labels:
      None

      Description

      During a recent spike integrating LogicalTypes into our Avro
      wrapper, we encountered the following questions.

      1. Is adding or removing a logical type on a schema element a
      backward-breaking change?
      2. Should the canonical parsing form include logical type information?

      I understand that the underlying base Avro types do not change with
      the introduction of LogicalTypes, so the raw serialized data will be
      the same. However, client code that depends on the deserialized values
      may still break.
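      To illustrate why the bytes stay the same, here is a minimal,
      hand-rolled sketch of the Avro binary encoding of a string (a zig-zag
      varint length followed by the UTF-8 bytes), assuming the uuid logical
      type is written as its canonical RFC 4122 string. The class and method
      names are hypothetical; this is not the Avro library's encoder.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.UUID;

/** Hypothetical hand-rolled sketch of Avro's binary string encoding (not the Avro API). */
class AvroStringEncodingSketch {

  /** Writes a long as a zig-zag encoded, variable-length integer. */
  static void writeLong(ByteArrayOutputStream out, long n) {
    long z = (n << 1) ^ (n >> 63); // zig-zag: small magnitudes map to small unsigned values
    while ((z & ~0x7FL) != 0) {
      out.write((int) ((z & 0x7F) | 0x80)); // low 7 bits plus continuation bit
      z >>>= 7;
    }
    out.write((int) z);
  }

  /** Encodes a string as its UTF-8 byte length (a long) followed by the bytes. */
  static byte[] encodeString(String s) {
    byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    writeLong(out, utf8.length);
    out.write(utf8, 0, utf8.length);
    return out.toByteArray();
  }

  /** Assumption: a uuid logical value is written as its string form, so only the Java type differs. */
  static byte[] encodeUuid(UUID u) {
    return encodeString(u.toString());
  }

  public static void main(String[] args) {
    UUID id = UUID.fromString("123e4567-e89b-12d3-a456-426614174000");
    byte[] asString = encodeString(id.toString());
    byte[] asUuid = encodeUuid(id);
    System.out.println(Arrays.equals(asString, asUuid)); // true
  }
}
```

      The logical type only changes what the reader hands back in memory;
      the wire format is unchanged, which is why the breakage shows up in
      client code rather than in the data.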

      Let me elaborate on these.

      1. Is adding or removing a logical type on a schema element a
      backward-breaking change?

      Take the UUID logical type as an example. At least in the case of
      GenericData, if I change a schema element from a plain string to a
      string annotated with the uuid logical type and have conversions
      turned on, existing client code that expects a String will now fail
      at runtime when an instance of UUID is suddenly returned.

      From the client's perspective, I've just changed the underlying type
      of the element.
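      A minimal, self-contained simulation of this failure mode. The
      readField method is a hypothetical stand-in for a generic reader with
      logical type conversions enabled, not the GenericData API. The client
      casts to CharSequence because Avro's generic reader typically returns
      a CharSequence (such as Utf8) for plain strings.

```java
import java.util.UUID;

/** Hypothetical simulation of the breakage; not the Avro GenericData API. */
class LogicalTypeBreakageSketch {

  /**
   * Returns the in-memory value for a field whose wire form is a string.
   * With conversions enabled and logicalType "uuid", the value becomes a
   * java.util.UUID; otherwise the raw string is returned.
   */
  static Object readField(String wireValue, String logicalType, boolean conversionsEnabled) {
    if (conversionsEnabled && "uuid".equals(logicalType)) {
      return UUID.fromString(wireValue);
    }
    return wireValue;
  }

  /** Existing client code written against the old schema. */
  static String clientReadId(Object fieldValue) {
    return ((CharSequence) fieldValue).toString(); // ClassCastException if a UUID arrives
  }

  public static void main(String[] args) {
    String wire = "123e4567-e89b-12d3-a456-426614174000";

    // Old schema {"type": "string"}: the client works.
    System.out.println(clientReadId(readField(wire, null, true)));

    // New schema {"type": "string", "logicalType": "uuid"}: same bytes, new Java type.
    try {
      clientReadId(readField(wire, "uuid", true));
    } catch (ClassCastException e) {
      System.out.println("client broke: " + e.getClass().getSimpleName());
    }
  }
}
```

      The serialized data never changes here; only the reader's choice of
      Java type does, which is exactly the kind of change the client cannot
      see coming from the schema's base type alone.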

      2. Should the canonical parsing form (CPF) include logical type information?

      If the answer to #1 is yes, then the CPF should also include the
      logical type information.

      We were wondering whether there might be a slightly less strict form
      of schema "normalization" and fingerprinting. Currently,
      fingerprinting is based on the CPF. It would be interesting to
      introduce a "Normal Parsing Form" (NPF) that retains all the optional
      information contained in a schema, but in a normalized, regular way.
      That way a fingerprint could be computed without having to strip
      possibly important information, such as the LogicalType info.
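      A rough sketch of the difference, for primitive schemas only. It
      fingerprints with the CRC-64-AVRO algorithm given in the Avro
      specification, models a schema as a hypothetical flat Map of
      attributes, and implements the proposed NPF under an assumption (all
      attributes kept, keys sorted, no whitespace). None of the class or
      method names are an existing Avro API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical sketch: CPF vs. a proposed NPF, primitive schemas only. */
class FingerprintSketch {

  // CRC-64-AVRO (Rabin) fingerprint, following the table-driven code in the Avro spec.
  static final long EMPTY = 0xc15d213aa4d7a795L;
  static final long[] FP_TABLE = new long[256];
  static {
    for (int i = 0; i < 256; i++) {
      long fp = i;
      for (int j = 0; j < 8; j++) {
        fp = (fp >>> 1) ^ (EMPTY & -(fp & 1L));
      }
      FP_TABLE[i] = fp;
    }
  }

  static long fingerprint64(String s) {
    long fp = EMPTY;
    for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
      fp = (fp >>> 8) ^ FP_TABLE[(int) (fp ^ b) & 0xff];
    }
    return fp;
  }

  /** CPF of a primitive: non-parsing attributes (incl. logicalType) are stripped, simple form kept. */
  static String canonicalForm(Map<String, String> attrs) {
    return "\"" + attrs.get("type") + "\"";
  }

  /** Hypothetical NPF: keep every attribute, written in sorted key order without whitespace. */
  static String normalForm(Map<String, String> attrs) {
    return new TreeMap<>(attrs).entrySet().stream()
        .map(e -> "\"" + e.getKey() + "\":\"" + e.getValue() + "\"")
        .collect(Collectors.joining(",", "{", "}"));
  }

  public static void main(String[] args) {
    Map<String, String> plain = Map.of("type", "string");
    Map<String, String> uuid = Map.of("type", "string", "logicalType", "uuid");

    // CPF: logicalType is stripped, so both schemas share one fingerprint.
    System.out.println(canonicalForm(uuid)); // "string"
    System.out.println(fingerprint64(canonicalForm(plain)) == fingerprint64(canonicalForm(uuid))); // true

    // NPF (hypothetical): logicalType survives and changes the fingerprint.
    System.out.println(normalForm(uuid)); // {"logicalType":"uuid","type":"string"}
    System.out.printf("%016x vs %016x%n",
        fingerprint64(normalForm(plain)), fingerprint64(normalForm(uuid)));
  }
}
```

      Sorting the keys is just one way to make the NPF deterministic; the
      point is that any NPF keeping logicalType would give the two schemas
      distinct fingerprints, while the CPF cannot tell them apart.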

      Interested in your thoughts on these questions.


              People

              • Assignee: Unassigned
              • Reporter: Bob Cotton (bob.cotton@gmail.com)
              • Votes: 1
              • Watchers: 5
