Avro / AVRO-1021

Fix a few name-related imperfections in Avro spec

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.6.2
    • Component/s: spec
    • Labels:
      None

      Description

      Require that names be defined before they are used; disallow multiple definitions of a name; clarify that name equality is case-sensitive (for type names, field names, and enum symbols).

      1. AVRO-1021.patch
        1 kB
        Raymie Stata
      2. AVRO-1021.patch
        1 kB
        Raymie Stata
      3. AVRO-1021.patch
        1 kB
        Raymie Stata

          Activity

          Doug Cutting added a comment -

          I committed this. Thanks, Raymie!

          Doug Cutting added a comment -

          > where the types attribute of a protocol is always deemed to come "before" the messages attribute

          Works for me.

          Raymie Stata added a comment -

          > The types array in a protocol definition could come textually after the messages, but the types must be processed before the messages and in-order. Should we clarify that too?

          Good catch. I looked at Schemas and verified that types appear in arrays, but not protocols. What if I changed my older text to the following: "A schema or protocol may not contain multiple definitions of a fullname. Further, a name must be defined before it is used ("before" in the depth-first, left-to-right traversal of the JSON parse tree, where the types attribute of a protocol is always deemed to come "before" the messages attribute.)" A bit windy, but precise.
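To make the proposed wording concrete, here is a minimal Python sketch of a checker for the two rules (no multiple definitions of a fullname, and define-before-use in a depth-first, left-to-right walk). It is not taken from the patch or from any Avro implementation: the function names `fullname`, `check`, and `check_protocol` are hypothetical, and it assumes a name counts as defined as soon as its enclosing definition is entered, so that a record may refer to itself.

```python
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}
NAMED = {"record", "error", "enum", "fixed"}


def fullname(name, namespace):
    """Qualify a name with a namespace unless it is already dotted."""
    if "." in name or not namespace:
        return name
    return f"{namespace}.{name}"


def check(schema, defined, namespace=None):
    """Walk a parsed schema depth-first, left to right, enforcing the name rules.

    `defined` is a set of fullnames seen so far; comparison is plain string
    equality, so "Foo" and "foo" are different names (case-sensitive).
    """
    if isinstance(schema, str):
        # A bare string is a primitive or a reference to an earlier definition.
        if schema not in PRIMITIVES and fullname(schema, namespace) not in defined:
            raise ValueError(f"name used before definition: {schema}")
    elif isinstance(schema, list):
        # A union: branches are visited in array order.
        for branch in schema:
            check(branch, defined, namespace)
    elif isinstance(schema, dict):
        kind = schema["type"]
        if not isinstance(kind, str):
            check(kind, defined, namespace)
        elif kind in NAMED:
            full = fullname(schema["name"], schema.get("namespace", namespace))
            if full in defined:
                raise ValueError(f"multiple definitions of name: {full}")
            # Assumption: register the name before its fields so it can be recursive.
            defined.add(full)
            inner_ns = full.rpartition(".")[0] or None
            for field in schema.get("fields", []):
                check(field["type"], defined, inner_ns)
        elif kind == "array":
            check(schema["items"], defined, namespace)
        elif kind == "map":
            check(schema["values"], defined, namespace)
        else:
            check(kind, defined, namespace)


def check_protocol(protocol):
    """Check a parsed protocol, treating "types" as coming before "messages"."""
    defined = set()
    ns = protocol.get("namespace")
    for type_schema in protocol.get("types", []):
        check(type_schema, defined, ns)
    for message in protocol.get("messages", {}).values():
        for param in message.get("request", []):
            check(param["type"], defined, ns)
        check(message.get("response", "null"), defined, ns)
        check(message.get("errors", []), defined, ns)
    return defined
```

Because `check_protocol` reads the `types` attribute first regardless of key order in the JSON object, a protocol whose `messages` appear textually before its `types` is still accepted, which is the behavior the proposed sentence describes.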

          > It would sure be nice if there were a few shared folders that every language tested schemas and protocols against.

          The test suite I'm writing for AVRO-1006 will include a file of test cases in src/test/resources that could be the basis of what you're talking about here. I'll get that posted soon; you can look at it and see what more would need to be done.

          Scott Carey added a comment -

          > I don't think any implementation currently supports use-before-define, does it?

          It would sure be nice if there were a few shared folders that every language tested schemas and protocols against. For example, if there were 'valid schemas' and 'invalid schemas' folders that all languages were expected to pass / fail against, then someone could test every language by adding to these folders without needing much knowledge of any particular language. Then we could just add a use-before-define schema to the invalid schemas folder, and find out if any languages support it.
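A shared-folder check of this kind could be driven by something like the following Python sketch. The folder names `valid` and `invalid`, the `.avsc` file pattern, and the `run_conformance` function are all assumptions made for illustration; `parse` stands for whatever schema parser a given language binding exposes, and is only assumed to raise an exception on a schema it rejects.

```python
from pathlib import Path


def run_conformance(root, parse):
    """Return (file name, problem) pairs for schemas that behave unexpectedly.

    `root` is a directory assumed to contain "valid" and "invalid" subfolders;
    `parse` is any callable that takes schema text and raises on rejection.
    """
    failures = []
    for folder, should_parse in (("valid", True), ("invalid", False)):
        for path in sorted(Path(root, folder).glob("*.avsc")):
            try:
                parse(path.read_text(encoding="utf-8"))
                parsed = True
            except Exception:
                parsed = False
            if parsed != should_parse:
                problem = ("rejected a valid schema" if should_parse
                           else "accepted an invalid schema")
                failures.append((path.name, problem))
    return failures
```

With a layout like this, dropping a use-before-define schema into the `invalid` folder would show up as "accepted an invalid schema" for any implementation that still parses it.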

          Doug Cutting added a comment -

          > if there is an Avro Data File or old schema that used a name before defining it that currently works

          I don't think any implementation currently supports use-before-define, does it?

          A "left-to-right traversal" of JSON only makes sense for array elements. The only schemas that include multiple types, and so the only ones where traversal order matters, are unions and records, but these use JSON arrays, so left-to-right works. The types array in a protocol definition could come textually after the messages, but the types must be processed before the messages and in-order. Should we clarify that too?

          Raymie Stata added a comment -

          Fixed typo

          Scott Carey added a comment -
          • There is a minor typo: "wdepth-first"
          • Implementations that previously allowed the laxer behavior may need to continue to support it for some use cases. For example, if there is an Avro Data File or old schema that used a name before defining it that currently works, there must be a way for future versions to continue to work.

          If we decide to accept this change, we should first provide an integration test for all languages that checks conformance and, for those languages that previously supported use-before-define, verifies that there is still a mechanism that can parse such schemas.

          Raymie Stata added a comment -

          Changes to spec.


            People

            • Assignee: Raymie Stata
            • Reporter: Raymie Stata