  1. Apache Avro
  2. AVRO-2046

avro-python3: Very restricted set of data types which are allowed in AvroSchemaFromJSONData

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 1.8.2
    • Fix Version/s: None
    • Component/s: python
    • Labels:
      None
    • Environment:

      avro-python3 (1.8.2)

    • Flags:
      Important

      Description

      Hey, I come from the CWL project: https://github.com/common-workflow-language/cwltool. As part of my GSoC project, I'm working on adding Python 3 compatibility to the cwltool codebase. We've been using avro-python2 for a long time and it has worked great for us in our projects, schema_salad and cwltool.

      While porting cwltool, I've run into issues with the avro-python3 library. I've found the following bug:

      Minimal reproducible example:

      from collections import OrderedDict
      import avro.schema
      AvroSchemaFromJSONData = avro.schema.SchemaFromJSONData
      
      a = {
        "fields": [
          {
            "name": "name",
            "type": "string"
          },
          {
            "name": "favorite_number",
            "type": [
              "int",
              "null"
            ]
          },
          {
            "name": "favorite_color",
            "type": [
              "string",
              "null"
            ]
          }
        ],
        "name": "User",
        "namespace": "example.avro",
        "type": "record"
      }
      
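      b = OrderedDict(a)  # same schema, wrapped in a dict subclass
      
      AvroSchemaFromJSONData(a)  # plain dict: parses
      AvroSchemaFromJSONData(b)  # OrderedDict: raises SchemaParseException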
      
      

      Output:

      ~/Desktop/test/venv3/lib/python3.5/site-packages/avro/schema.py in SchemaFromJSONData(json_data, names)
         1252   if parser is None:
         1253     raise SchemaParseException(
      -> 1254         'Invalid JSON descriptor for an Avro schema: %r.' % json_data)
         1255   return parser(json_data, names=names)
         1256 
      
      SchemaParseException: Invalid JSON descriptor for an Avro schema: OrderedDict([('namespace', 'example.avro'), ('type', 'record'), ('name', 'User'), ('fields', [{'type': 'string', 'name': 'name'}, {'type': ['int', 'null'], 'name': 'favorite_number'}, {'type': ['string', 'null'], 'name': 'favorite_color'}])]).
      
      The current implementation of this function accepts a plain dict but rejects dict-like types such as OrderedDict. The same input works in avro-python2.
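
      One possible workaround on the caller's side, sketched here under assumptions, is to copy any dict-like input into plain dicts before handing it to SchemaFromJSONData. The helper name to_plain_json is hypothetical and is not part of avro; the conversion is recursive on the assumption that nested mappings (for example, field descriptors) would be rejected in the same way as the top-level one.

```python
from collections import OrderedDict
from collections.abc import Mapping


def to_plain_json(data):
    """Recursively copy mappings into plain dicts and sequences into plain lists.

    Hypothetical helper for illustration; not part of the avro library.
    """
    if isinstance(data, Mapping):
        return {key: to_plain_json(value) for key, value in data.items()}
    if isinstance(data, (list, tuple)):
        return [to_plain_json(item) for item in data]
    # Strings, numbers, booleans and None pass through unchanged.
    return data


schema = OrderedDict([
    ("type", "record"),
    ("name", "User"),
    ("fields", [OrderedDict([("name", "name"), ("type", "string")])]),
])

plain = to_plain_json(schema)
# `plain` could then be passed to avro.schema.SchemaFromJSONData; that call is
# left out so the sketch runs without avro installed.
```

      Key order inside each mapping is not guaranteed to survive the copy on Python 3.5, which should not matter for a JSON schema descriptor.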

      Relevant line of code: https://github.com/apache/avro/blob/master/lang/py3/avro/schema.py#L1250
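
      The traceback shows the exception being raised when no parser is found for the input. A minimal sketch of why that can happen for OrderedDict, assuming the parser is selected by the exact type of the input (the report does not state this), is given here together with an isinstance-based alternative. All names in the sketch are hypothetical stand-ins and are not taken from the avro source.

```python
from collections import OrderedDict
from collections.abc import Mapping


# Hypothetical stand-ins for per-type parsers; these are not avro functions.
def parse_string(data):
    return ("string", data)


def parse_array(data):
    return ("array", data)


def parse_object(data):
    return ("object", data)


# Assumed dispatch style: a table keyed on the exact type of the input.
EXACT_TYPE_PARSERS = {str: parse_string, list: parse_array, dict: parse_object}


def pick_parser_exact(data):
    """Return a parser only when type(data) is exactly a key of the table."""
    return EXACT_TYPE_PARSERS.get(type(data))


def pick_parser_isinstance(data):
    """Return a parser for any str, mapping, or list/tuple, subclasses included."""
    if isinstance(data, str):
        return parse_string
    if isinstance(data, Mapping):
        return parse_object
    if isinstance(data, (list, tuple)):
        return parse_array
    return None


ordered = OrderedDict([("type", "record"), ("name", "User")])

exact_result = pick_parser_exact(ordered)        # no entry for OrderedDict
lenient_result = pick_parser_isinstance(ordered)  # OrderedDict is a Mapping
```

      With the exact-type table, a dict subclass finds no parser even though it behaves like a dict; the isinstance version treats it as a JSON object.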

      Apart from this, I've tried running the ``2to3`` tool on avro-python2 and testing our project against the result, and it works perfectly. Through this issue, I therefore also want to motivate the following PR: https://github.com/apache/avro/pull/234
      I don't expect a unified codebase for avro-python2 and avro-python3 now or in the near future. This has been discussed before: https://github.com/apache/avro/pull/133

      But having avro-python2 work on both py2 and py3 would be really helpful for our project and would let us complete our porting process. Thanks.

              People

              • Assignee:
                Unassigned
                Reporter:
                manu-chroma Manvendra Singh
              • Votes:
                1
                Watchers:
                5
