Apache Avro / AVRO-1721

Should LogicalTypes introduce schema (in)compatibility and canonical parsing form changes?

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.8.0
    • Fix Version/s: None
    • Component/s: spec
    • Labels: None

    Description

      During a recent spike integrating LogicalTypes into our Avro
      wrapper, we encountered the following questions.

      1. Is the addition/removal of a logical type on a schema element a
      backward-breaking change?
      2. Should the canonical parsing form include logical type information?

      I understand that the underlying base Avro types are not changing with
      the introduction of LogicalTypes. The raw serialized data will be the
      same. However, client code that depends on the deserialized values
      may be subject to breakage.

      Let me elaborate on these.

      1. Is the addition/removal of a logical type on a schema element a
      backward-breaking change?

      Take, for example, the UUID logical type. At least in the case of
      GenericData, if I change a schema element from a string to a UUID and
      I have Converters turned on, existing client code that expects a
      String to be returned will now hit a runtime exception when an
      instance of UUID is suddenly returned instead.

      From the client's perspective, I've just changed the underlying type
      of the element.
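
      The breakage described above can be sketched without Avro itself. The
      snippet below is only an illustration in plain Python: `read_field`
      and `client_label` are hypothetical stand-ins for a generic reader
      with conversions enabled and for pre-existing client code, not Avro
      APIs. The serialized value is identical in both cases; only the
      schema annotation differs.

```python
import uuid


def read_field(raw, schema, conversions_enabled):
    """Hypothetical stand-in for a generic reader (not an Avro API).

    Returns the raw string unless conversions are enabled and the schema
    carries the "uuid" logical type, in which case a UUID object is returned.
    """
    if conversions_enabled and schema.get("logicalType") == "uuid":
        return uuid.UUID(raw)
    return raw


def client_label(value):
    # Existing client code, written when the field was a plain string.
    return "id-" + value


RAW = "123e4567-e89b-12d3-a456-426614174000"
OLD_SCHEMA = {"type": "string"}
NEW_SCHEMA = {"type": "string", "logicalType": "uuid"}

# Same serialized bytes, different in-memory type after deserialization.
old_value = read_field(RAW, OLD_SCHEMA, conversions_enabled=True)
new_value = read_field(RAW, NEW_SCHEMA, conversions_enabled=True)

print(client_label(old_value))
try:
    client_label(new_value)
except TypeError as exc:
    # String concatenation fails once a UUID object is returned.
    print("client broke:", exc)
```

      With conversions disabled, both schemas would yield the same string
      and the client would keep working, which is why the question is
      about client-facing compatibility rather than wire compatibility.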

      2. Should the canonical parsing form (CPF) include logical type information?

      If the answer to #1 is yes, then the CPF should also include the
      logical type information.

      We were wondering if there might be a slightly less strict form of
      schema "normalization" and fingerprinting. Currently the
      fingerprinting process is based on the CPF. It would be interesting to
      introduce a "Normal Parsing Form" (NPF) that retains all the
      optional information contained within a schema, but in a normal or
      regular way. That way a fingerprint could be determined without having
      to strip possibly important information, like the LogicalType info.
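
      To make the difference concrete, here is a rough sketch in plain
      Python. `canonical` is a simplified approximation of the CPF (it
      strips attributes and collapses primitives, but skips steps such as
      fullname resolution), and `normal` is one hypothetical reading of
      the proposed NPF: keep every attribute and serialize keys in sorted
      order. SHA-256 is used here only for convenience; all function names
      are assumptions made for illustration.

```python
import hashlib
import json

# Attributes retained by the canonical form, in canonical order.
CPF_KEYS = ["name", "type", "fields", "symbols", "items", "values", "size"]
# Attributes whose values are themselves schemas (or lists of fields).
NESTED = {"type", "fields", "items", "values"}


def canonical(schema):
    """Simplified CPF: drop every attribute not in CPF_KEYS."""
    if isinstance(schema, list):
        return [canonical(s) for s in schema]
    if isinstance(schema, dict):
        kept = {
            k: canonical(schema[k]) if k in NESTED else schema[k]
            for k in CPF_KEYS
            if k in schema
        }
        # {"type": "string"} collapses to the simple form "string".
        if list(kept) == ["type"]:
            return kept["type"]
        return kept
    return schema


def canonical_text(schema):
    return json.dumps(canonical(schema), separators=(",", ":"))


def normal(schema):
    """Hypothetical NPF: keep all attributes, sort keys, drop whitespace."""
    return json.dumps(schema, sort_keys=True, separators=(",", ":"))


def fingerprint(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


PLAIN = {
    "type": "record",
    "name": "User",
    "fields": [{"name": "id", "type": "string"}],
}
WITH_UUID = {
    "type": "record",
    "name": "User",
    "fields": [{"name": "id", "type": {"type": "string", "logicalType": "uuid"}}],
}

cpf_same = fingerprint(canonical_text(PLAIN)) == fingerprint(canonical_text(WITH_UUID))
npf_same = fingerprint(normal(PLAIN)) == fingerprint(normal(WITH_UUID))
print("CPF fingerprints equal:", cpf_same)
print("NPF fingerprints equal:", npf_same)
```

      Under these assumptions, the CPF-based fingerprint cannot tell the
      two schemas apart, while the NPF-style fingerprint can, and it stays
      insensitive to attribute order and whitespace.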

      Interested in your thoughts on these questions.

People

    • Assignee: Unassigned
    • Reporter: Bob Cotton (bob.cotton@gmail.com)
