Apache Avro / AVRO-2046

avro-python3: Very restricted set of data types which are allowed in AvroSchemaFromJSONData


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 1.8.2
    • Fix Version/s: None
    • Component/s: python
    • Labels: None
    • Environment: avro-python3 (1.8.2)
    • Flags: Important

    Description

      Hey, I come from the CWL project: https://github.com/common-workflow-language/cwltool and, as part of my GSoC project, I'm working on adding Python 3 compatibility to the cwltool codebase. We've been using avro-python2 for a long time now and it has worked great for us in our projects: schema_salad and cwltool.

      In the process of porting cwltool, I'm facing issues with the avro-python3 library. I've found the following bug:

      Minimal reproducible example:

      from collections import OrderedDict
      import avro.schema
      AvroSchemaFromJSONData = avro.schema.SchemaFromJSONData
      
      a = {
        "fields": [
          {
            "name": "name",
            "type": "string"
          },
          {
            "name": "favorite_number",
            "type": [
              "int",
              "null"
            ]
          },
          {
            "name": "favorite_color",
            "type": [
              "string",
              "null"
            ]
          }
        ],
        "name": "User",
        "namespace": "example.avro",
        "type": "record"
      }
      
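      # Same schema wrapped in an OrderedDict (a dict subclass)
      b = OrderedDict(a)
      
      AvroSchemaFromJSONData(a)  # plain dict: parsed without error
      AvroSchemaFromJSONData(b)  # OrderedDict: raises SchemaParseException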
      
      

      Output:

      ~/Desktop/test/venv3/lib/python3.5/site-packages/avro/schema.py in SchemaFromJSONData(json_data, names)
         1252   if parser is None:
         1253     raise SchemaParseException(
      -> 1254         'Invalid JSON descriptor for an Avro schema: %r.' % json_data)
         1255   return parser(json_data, names=names)
         1256 
      
      SchemaParseException: Invalid JSON descriptor for an Avro schema: OrderedDict([('namespace', 'example.avro'), ('type', 'record'), ('name', 'User'), ('fields', [{'type': 'string', 'name': 'name'}, {'type': ['int', 'null'], 'name': 'favorite_number'}, {'type': ['string', 'null'], 'name': 'favorite_color'}])]).
      
      The current implementation of this function rejects dict-like types such as OrderedDict, even though OrderedDict is a subclass of dict; only the plain dict a is accepted. The same call works in avro-python2.
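      The behaviour is consistent with a parser lookup keyed on the exact type of the input. The sketch below illustrates that difference under this assumption; it does not reproduce avro's code, and every name in it (parse_name, parse_union, parse_object, select_parser_exact, select_parser_lenient) is hypothetical.

```python
from collections import OrderedDict
from collections.abc import Mapping


# Hypothetical stand-ins for the real parsers; they only return a label.
def parse_name(data):
    return "name"


def parse_union(data):
    return "union"


def parse_object(data):
    return "object"


# Exact-type lookup: only str, list and dict themselves are keys.
PARSERS_BY_EXACT_TYPE = {str: parse_name, list: parse_union, dict: parse_object}


def select_parser_exact(data):
    """Return a parser only if type(data) is exactly one of the keys."""
    return PARSERS_BY_EXACT_TYPE.get(type(data))


def select_parser_lenient(data):
    """Return a parser for any str, Mapping or list/tuple, subclasses included."""
    if isinstance(data, str):
        return parse_name
    if isinstance(data, Mapping):
        return parse_object
    if isinstance(data, (list, tuple)):
        return parse_union
    return None


schema = {"type": "record", "name": "User", "fields": []}
ordered = OrderedDict(schema)

exact_plain = select_parser_exact(schema)        # dict is a key, so a parser is found
exact_ordered = select_parser_exact(ordered)     # OrderedDict is not a key
lenient_ordered = select_parser_lenient(ordered) # OrderedDict is a Mapping
```

      With an isinstance-based check, any Mapping would be treated like a JSON object, which is the behaviour this issue asks for.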

      Relevant line of code: https://github.com/apache/avro/blob/master/lang/py3/avro/schema.py#L1250
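      A possible caller-side workaround is to normalise the data to plain dict and list objects before handing it to the parser. This is only a minimal sketch: to_plain_json is a hypothetical helper, not part of avro, and it assumes the exact-type check is the only obstacle.

```python
from collections import OrderedDict
from collections.abc import Mapping


def to_plain_json(data):
    """Recursively copy dict-like and list-like values into plain dicts and lists."""
    if isinstance(data, Mapping):
        return {key: to_plain_json(value) for key, value in data.items()}
    if isinstance(data, (list, tuple)):
        return [to_plain_json(item) for item in data]
    # Strings, numbers, booleans and None are returned unchanged.
    return data


schema = OrderedDict([
    ("type", "record"),
    ("name", "User"),
    ("fields", [OrderedDict([("name", "name"), ("type", "string")])]),
])

plain = to_plain_json(schema)
```

      Passing to_plain_json(b) instead of b would then hand the parser plain dict objects only. Note that on Python 3.5 a plain dict does not preserve key order, so any ordering carried by the OrderedDict is lost.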

      Apart from this, I ran the ``2to3`` tool on avro-python2 and tested our project against the converted code, and it works perfectly. Through this issue, I therefore also want to motivate the following PR: https://github.com/apache/avro/pull/234
      I don't expect a unified codebase for avro python2 and python3 now or in the near future. This has been discussed before: https://github.com/apache/avro/pull/133

      But having avro-python2 compatible with both py2 and py3 would be really helpful for our project and would let us complete our porting process. Thanks.

            People

              Assignee: Unassigned
              Reporter: Manvendra Singh (manu-chroma)
              Votes: 1
              Watchers: 3
