  1. Avro
  2. AVRO-358

Specify "levels" of Avro implementation in the spec

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: spec
    • Labels:
      None

      Description

      We've discussed on IRC having well-defined "levels" of implementation for the Avro spec, so that we can track the maturity of an implementation in each language. We should get to work on specifying these levels more precisely and writing them into the specification.

          Activity

          hammer Jeff Hammerbacher added a comment - edited

          Some basics:

          0) Ability to parse .avsc files
          1) Serialization of primitive types
          2) Serialization of primitive and complex types
          3) Serialization of the container file format
          4) Ability to parse .avpr files
          5) RPC client
          6) RPC server

          cutting Doug Cutting added a comment -

          Some thoughts:

          • Seems to me that 0-2 together form a base level. Anything less does not seem useful.
          • 3 (data files) and 4-6 (rpc) are independent. An implementation might reasonably implement 4-6 but not 3. Do we mean to prohibit such implementations?
          • 4 isn't really useful on its own.
          • you don't mention json-format

          So I might instead opt to list the following independent features that an implementation might support:

          • read/write binary-format
          • read/write json-format
          • read/write data files
          • rpc client
          • rpc server

          We could then suggest that implementations implement data files and rpc clients first. They'll need to implement binary-format to do this. The json-format should generally be the last thing to implement. But all that we should require is that, if an implementation claims to support a feature, it conforms to the spec when doing so.
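
          The independent-features idea above can be sketched as a small data model. This is only an illustration: the names `Feature`, `REQUIRES`, and `missing_prerequisites` are hypothetical, and the only dependencies encoded are the two stated above (data files and rpc clients need binary-format).

```python
from enum import Enum


class Feature(Enum):
    """Independent features an implementation might claim to support."""
    BINARY_FORMAT = "read/write binary-format"
    JSON_FORMAT = "read/write json-format"
    DATA_FILES = "read/write data files"
    RPC_CLIENT = "rpc client"
    RPC_SERVER = "rpc server"


# Assumption: only the prerequisites named in the comment are listed here.
# Whether rpc server has prerequisites is left open, as in the discussion.
REQUIRES = {
    Feature.DATA_FILES: {Feature.BINARY_FORMAT},
    Feature.RPC_CLIENT: {Feature.BINARY_FORMAT},
}


def missing_prerequisites(claimed):
    """Return features that the claimed set depends on but does not include."""
    claimed = set(claimed)
    needed = set()
    for feature in claimed:
        needed |= REQUIRES.get(feature, set())
    return needed - claimed
```

          For example, `missing_prerequisites({Feature.DATA_FILES})` reports that binary-format is still needed, while a set containing only `Feature.JSON_FORMAT` has nothing missing.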

          thiru_mg Thiruvalluvan M. G. added a comment -

          Another level is the ability to do schema resolution (where the reader's and writer's schemas are not identical). I'm not sure what the exact level should be, as it is orthogonal to json, data file and rpc.

          cutting Doug Cutting added a comment -

          Do we want schema resolution to be optional at all? It's currently implemented by all implementations, I think. Without it we give up schema evolution, a major feature of Avro.

          thiru_mg Thiruvalluvan M. G. added a comment -

          I agree, schema resolution is a big feature. But implementations take time to get it working fully. It appears, for example, that the C implementation of schema resolution is partial. I don't see code to handle default values for fields that the writer does not provide. I don't think resolutions involving unions are handled well. I hope I'm wrong.
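
          To make the default-value case concrete, here is a minimal sketch of record resolution in plain Python. It does not use any Avro library; `resolve_record` and its argument shapes are hypothetical, and unions, nested types, and type promotion are deliberately left out.

```python
def resolve_record(writer_fields, reader_fields, datum):
    """Resolve a decoded record against the reader's field list.

    writer_fields: names of the fields the writer's schema defines.
    reader_fields: list of dicts with a "name" key and an optional "default" key.
    datum: dict of values as written, keyed by writer field name.
    """
    resolved = {}
    for field in reader_fields:
        name = field["name"]
        if name in writer_fields:
            # Field is present in both schemas: take the written value.
            resolved[name] = datum[name]
        elif "default" in field:
            # Writer did not provide the field: fall back to the reader's default.
            resolved[name] = field["default"]
        else:
            # No written value and no default: resolution fails.
            raise ValueError("no value or default for reader field %r" % name)
    # Fields that only the writer knows about are ignored.
    return resolved
```

          With writer fields `["id", "legacy"]` and reader fields `id` plus `email` (default `""`), the written `legacy` value is dropped and `email` is filled from its default.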

          brucem Bruce Mitchener added a comment -

          Instead of trying to come up with "levels" of the implementation based on various features, I think we should unify the desires driving this with the discussion of having "Avro Enhancement Proposals" or AEPs, based on the PEP process from Python.

          At that point, each AEP can be updated with information about the support for that AEP in each of the implementations.

          Things can also be moved out of the spec itself and into separate AEPs (or have discussion-style AEPs which provide additional information useful for the implementor).

          I started drafting an email about this for the dev list; I will try to finish that up this week and get it out for some further discussion.

          cutting Doug Cutting added a comment -

          > It appears, for example, the C implementation of schema resolution is partial.

          I'd prefer to refer to that as a bug rather than as an established implementation level.

          cutting Doug Cutting added a comment -

          How about we add an "implementations" page to the documentation with a table listing features by implementation? We can link some features (e.g., code generation) to language-specific documentation. Would this address this issue?
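
          As a rough sketch of what such a table could be generated from, the snippet below renders a features-by-implementation matrix as plain text. The implementation names `impl-a` and `impl-b` and their support sets are invented placeholders, not a statement about any real Avro implementation; `render_matrix` is likewise a hypothetical helper.

```python
FEATURES = ["binary-format", "json-format", "data files", "rpc client", "rpc server"]

# Placeholder data: names and values are made up purely for illustration.
SUPPORT = {
    "impl-a": {"binary-format", "data files"},
    "impl-b": {"binary-format", "json-format", "rpc client"},
}


def render_matrix(features, support):
    """Return a plain-text table with one row per feature, one column per implementation."""
    names = sorted(support)
    width = max(len(f) for f in features + ["feature"])
    lines = [" | ".join(["feature".ljust(width)] + names)]
    for feature in features:
        cells = [
            ("yes" if feature in support[name] else "no").ljust(len(name))
            for name in names
        ]
        lines.append(" | ".join([feature.ljust(width)] + cells).rstrip())
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_matrix(FEATURES, SUPPORT))
```

          Keeping the data separate from the rendering means each implementation's row can be updated as features land, without touching the page layout.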


            People

            • Assignee:
              Unassigned
              Reporter:
              hammer Jeff Hammerbacher