  1. Avro
  2. AVRO-2046

avro-python3: Very restricted set of data types which are allowed in AvroSchemaFromJSONData

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.8.2
    • Fix Version/s: None
    • Component/s: python
    • Labels:
      None
    • Environment:

      avro-python3 (1.8.2)

    • Flags:
      Important

      Description

      Hey, I come from the CWL project (https://github.com/common-workflow-language/cwltool). As part of my GSoC project, I'm working on adding Python 3 compatibility to the cwltool codebase. We've been using avro-python2 for a long time, and it has worked great for us in our projects schema_salad and cwltool.

      While porting cwltool, I've run into issues with the avro-python3 library. I've found the following bug:

      Minimal reproducible example:

      from collections import OrderedDict
      import avro.schema
      AvroSchemaFromJSONData = avro.schema.SchemaFromJSONData
      
      a = {
        "fields": [
          {
            "name": "name",
            "type": "string"
          },
          {
            "name": "favorite_number",
            "type": [
              "int",
              "null"
            ]
          },
          {
            "name": "favorite_color",
            "type": [
              "string",
              "null"
            ]
          }
        ],
        "name": "User",
        "namespace": "example.avro",
        "type": "record"
      }
      
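      # OrderedDict is a dict subclass holding the same schema.
      b = OrderedDict(a)

      AvroSchemaFromJSONData(a)  # plain dict: parses
      AvroSchemaFromJSONData(b)  # OrderedDict: raises SchemaParseException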

      Output:

      ~/Desktop/test/venv3/lib/python3.5/site-packages/avro/schema.py in SchemaFromJSONData(json_data, names)
         1252   if parser is None:
         1253     raise SchemaParseException(
      -> 1254         'Invalid JSON descriptor for an Avro schema: %r.' % json_data)
         1255   return parser(json_data, names=names)
         1256 
      
      SchemaParseException: Invalid JSON descriptor for an Avro schema: OrderedDict([('namespace', 'example.avro'), ('type', 'record'), ('name', 'User'), ('fields', [{'type': 'string', 'name': 'name'}, {'type': ['int', 'null'], 'name': 'favorite_number'}, {'type': ['string', 'null'], 'name': 'favorite_color'}])]).
      
      The current implementation of this function rejects dict-like types such as OrderedDict, even though OrderedDict is a dict subclass holding the same data. The same input works in avro-python2.

      Relevant line of code: https://github.com/apache/avro/blob/master/lang/py3/avro/schema.py#L1250

      Apart from this, I ran the ``2to3`` tool on avro-python2 and tested our project against the result, and it works perfectly. Through this issue, I therefore also want to motivate the following PR: https://github.com/apache/avro/pull/234
      I don't expect a unified codebase for avro python2 and python3 now or in the near future. There has been a discussion on it before: https://github.com/apache/avro/pull/133

      But having avro-python2 cross-compatible with both py2 and py3 would be really helpful for our project and would let us complete our porting process. Thanks.

          Activity

          manu-chroma Manvendra Singh added a comment -

          Hey, as mentioned before, we're really trying to wrap up our Python 3 port, and resolving this issue would put things in place for us.

          As I've mentioned already, there is a pending PR against the python2 implementation that makes no change to the API and only sets out to make that implementation compatible with Python 3.
          I think it is useful not only for our project but also for others who are interested in adopting a single package/API. Any communication from your side regarding this would be highly appreciated!

          githubbot ASF GitHub Bot added a comment -

          GitHub user manu-chroma opened a pull request:

          https://github.com/apache/avro/pull/235

          schema.py: No sys traceback in parse exception

          In the ``SchemaParseException``, do not provide sys traceback.

          For our project CWL Tool, we're using `avro/py` in our python 3 builds. More on this has been discussed here: https://issues.apache.org/jira/browse/AVRO-2046

          To do this, we use the `autotranslate` tool, which converts `avro/py` code to Python 2 and 3 compatible code at runtime.
          The problem arises when it tries to convert this `raise Exception` statement. There is no way to achieve this in a cross-compatible way without an external library.

          Thus, I've created this PR. This is a very minimal change and really solves our problem for the time being. We really hope you'll consider this or at least give your feedback on the same.
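
For readers unfamiliar with the incompatibility, here is a minimal sketch, assuming the statement in question is Python 2's three-expression form `raise SomeError(...), None, traceback`, which is a syntax error on Python 3. `SchemaParseException` below is a local stand-in class, and both function names are hypothetical; this is not the patch itself.

```python
import json
import sys


class SchemaParseException(Exception):
    """Local stand-in for the library's exception class."""


def parse_keep_traceback(json_string):
    # Python 3 spelling of "raise a new exception type with the old traceback".
    # It parses on Python 2 but fails there at runtime, because
    # with_traceback() does not exist on Python 2 exceptions.
    try:
        return json.loads(json_string)
    except ValueError as error:
        traceback = sys.exc_info()[2]
        raise SchemaParseException(
            'Error parsing JSON: %s' % error).with_traceback(traceback)


def parse_plain(json_string):
    # Form that is valid on both Python 2 and Python 3:
    # no explicit traceback is attached to the new exception.
    try:
        return json.loads(json_string)
    except ValueError as error:
        raise SchemaParseException('Error parsing JSON: %s' % error)
```

Both functions raise `SchemaParseException` on malformed input; only the second uses syntax and methods available on both interpreters, which is the trade-off the PR description points to.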

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/manu-chroma/avro patch-1

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/avro/pull/235.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #235


          commit 92525fda5cbae1ea7b9e5e255a52ad7e8f0ff71f
          Author: Manvendra Singh <manvendra0310@gmail.com>
          Date: 2017-07-17T08:53:28Z

          schema.py: No sys traceback in parse exception



            People

            • Assignee:
              Unassigned
            • Reporter:
              manu-chroma Manvendra Singh
            • Votes:
              1
            • Watchers:
              3
