AVRO-358

Specify "levels" of Avro implementation in the spec

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: spec
    • Labels: None

      Description

      We've discussed on IRC having well-defined "levels" of implementation for the Avro spec, so that we can track the maturity of an implementation in each language. We should get to work on specifying these levels more precisely and writing them into the specification.

          Activity

          Jeff Hammerbacher added a comment - edited

          Some basics:

          0) Ability to parse .avsc files
          1) Serialization of primitive types
          2) Serialization of primitive and complex types
          3) Serialization of the container file format
          4) Ability to parse .avpr files
          5) RPC client
          6) RPC server

          Doug Cutting added a comment -

          Some thoughts:

          • Seems to me that 0-2 together form a base level. Anything less does not seem useful.
          • 3 (data files) and 4-6 (rpc) are independent. An implementation might reasonably implement 4-6 but not 3. Do we mean to prohibit such implementations?
          • 4 isn't really useful on its own.
          • you don't mention json-format

          So I might instead opt to list the following independent features that an implementation might support:

          • read/write binary-format
          • read/write json-format
          • read/write data files
          • rpc client
          • rpc server

          We could then suggest that implementations implement data files and rpc clients first. They'll need to implement binary-format to do this. The json-format should generally be the last thing to implement. But all that we should require is that, if an implementation claims to support a feature, it conforms to the spec when doing so.
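The independent features listed above can be sketched as a small data model. This is a minimal illustration, not part of the Avro spec: the `Feature` enum, the `REQUIRES` map and `missing_prerequisites` are hypothetical names. The RPC server's dependency on binary-format is an assumption; the comment only states it for data files and rpc clients.

```python
from enum import Enum


class Feature(Enum):
    """Independent features an implementation might claim (names from the comment)."""
    BINARY_FORMAT = "read/write binary-format"
    JSON_FORMAT = "read/write json-format"
    DATA_FILES = "read/write data files"
    RPC_CLIENT = "rpc client"
    RPC_SERVER = "rpc server"


# Prerequisites: data files and rpc clients need binary-format (per the comment).
# The rpc server entry is an assumption added for symmetry.
REQUIRES = {
    Feature.DATA_FILES: {Feature.BINARY_FORMAT},
    Feature.RPC_CLIENT: {Feature.BINARY_FORMAT},
    Feature.RPC_SERVER: {Feature.BINARY_FORMAT},
}


def missing_prerequisites(claimed):
    """Return features the claimed set depends on but does not include."""
    claimed = set(claimed)
    missing = set()
    for feature in claimed:
        missing |= REQUIRES.get(feature, set()) - claimed
    return missing
```

An implementation that claims only `Feature.RPC_CLIENT` would be reported as missing `Feature.BINARY_FORMAT`, while one that claims only `Feature.JSON_FORMAT` has no unmet prerequisites under this model.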

          Thiruvalluvan M. G. added a comment -

          Another level is ability to do schema resolution (where reader's and writer's schemas are not identical). I'm not sure what should be the exact level as it is orthogonal to json, data file and rpc.

          Doug Cutting added a comment -

          Do we want schema resolution to be optional at all? It's currently implemented by all implementations, I think. Without it we give up schema evolution, a major feature of Avro.

          Thiruvalluvan M. G. added a comment -

          I agree, schema resolution is a big feature. But implementations take time to get it working fully. It appears, for example, the C implementation of schema resolution is partial. I don't see code to handle default values for fields that the writer does not provide. I don't think resolutions involving unions are handled well. I hope I'm wrong.
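To make the default-value case concrete, here is a minimal sketch of record resolution in plain Python. It is not taken from any Avro implementation: `resolve_record` and its argument shapes are hypothetical, and it deliberately ignores aliases, type promotion, nested types and unions. It covers only three cases: fields present in both schemas are copied, fields only the writer has are dropped, and fields only the reader has take the reader's default or raise an error if there is none.

```python
def resolve_record(datum, writer_fields, reader_fields):
    """Project a record decoded with the writer's schema onto the reader's schema.

    datum:         dict of values decoded using the writer's schema
    writer_fields: list of field names the writer's schema declares
    reader_fields: list of dicts with "name" and an optional "default"
    """
    written = set(writer_fields)
    result = {}
    for field in reader_fields:
        name = field["name"]
        if name in written:
            # Field exists in both schemas: take the written value.
            result[name] = datum[name]
        elif "default" in field:
            # Writer did not provide the field: fall back to the reader's default.
            result[name] = field["default"]
        else:
            # No written value and no default: resolution fails.
            raise ValueError(f"no value or default for reader field {name!r}")
    # Writer-only fields are never copied, so they are ignored.
    return result
```

For example, with a writer that declares `id`, `name` and `legacy`, and a reader that declares `id`, `name` and `email` (default `""`), the result keeps `id` and `name`, drops `legacy`, and sets `email` to the empty string.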

          Bruce Mitchener added a comment -

          Instead of trying to come up with "levels" of the implementation based on various features, I think we should unify the desires driving this with the discussion of having "Avro Enhancement Proposals" or AEPs, based on the PEP process from Python.

          At that point, each AEP can be updated with information about the support for that AEP in each of the implementations.

          Things can also be moved out of the spec itself and into separate AEPs (or have discussion-style AEPs which provide additional information useful for the implementor).

          I started drafting an email about this for the dev list. I will try to finish that up this week and get it out for some further discussion.

          Doug Cutting added a comment -

          > It appears, for example, the C implementation of schema resolution is partial.

          I'd prefer to refer to that as a bug rather than as an established implementation level.

          Doug Cutting added a comment -

          How about we add an "implementations" page to the documentation with a table listing features by implementation? We can link some features (e.g., code generation) to language-specific documentation. Would this address this issue?
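One way such a table could be generated is sketched below. The `FEATURES` list reuses the feature names proposed earlier in this thread, `render_table` is a hypothetical helper, and the implementation names in the usage note are placeholders, not real support data.

```python
# Feature names taken from the list proposed earlier in the thread.
FEATURES = ["binary-format", "json-format", "data files", "rpc client", "rpc server"]


def render_table(support):
    """Render a plain-text features-by-implementation table.

    support maps an implementation name to the set of feature names it supports.
    """
    names = sorted(support)
    width = max(len(feature) for feature in FEATURES)
    rows = [" | ".join(["feature".ljust(width)] + names)]
    for feature in FEATURES:
        # Pad each cell to its column header's width so the columns line up.
        cells = [
            ("yes" if feature in support[name] else "no").ljust(len(name))
            for name in names
        ]
        rows.append(" | ".join([feature.ljust(width)] + cells))
    return "\n".join(rows)
```

Calling `render_table({"impl_a": {"binary-format"}, "impl_b": set()})` yields one header row plus one row per feature, with a yes/no cell for each placeholder implementation.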


            People

            • Assignee: Unassigned
            • Reporter: Jeff Hammerbacher
            • Votes: 0
            • Watchers: 3
