Avro
  1. Avro
  2. AVRO-620

Python implementation doesn't stringify sub-schemas correctly

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.0
    • Component/s: python
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      In [9]: import avro.schema
      
      In [10]: s = avro.schema.parse('{"type": "record", "name": "X", "fields": [{"name": "y", "type": {"type": "record", "name": "Y", "fields": [{"name": "Z", "type": "X"}]}}]}')
      
      In [11]: str(s.fields[0].type)
      Out[11]: '{"fields": [{"type": "X", "name": "Z"}], "type": "record", "name": "Y"}'
      

      str(schema) is used in avro data files to record the schema. In the case above, when we serialize the schema for Y, we should actually also serialize the schema for X, since Y needs the schema for X.

      I ran smack into this when using a schema from a protocol to write a data file, and finding that a lot of the types weren't defined when looking at the avro data file generated.

      1. AVRO-620.patch.txt
        18 kB
        Philip Zeyliger

        Activity

        Hide
        Philip Zeyliger added a comment -

        I'm working on this.

        Show
        Philip Zeyliger added a comment - I'm working on this.
        Hide
        Philip Zeyliger added a comment -

        I believe I've fixed this. I implemented a Schema.to_json(names) method, which recursively serializes schema objects to JSON-compatible structures, avoiding re-serializing schemas which we've already seen. (This also means avoiding serializing JSON just to deserialize it again.) I was able to get rid of some variables which tracked how the schema was originally defined, because this recursion is taking care of noticing that.

        As I needed to, I removed some verbosity from the tests and removed some exception handling. It's very unhelpful when python tests catch exceptions, because they make it that much harder to track down the exact point of the failure. (An exception that propagates through a test is a test failure, so there's no need to separately mark the test as failed.) Printing extra information about what tests are running distracts from where the failures are occurring. I recommend the nose test runner (with flags --pdb --pdb-failure) for running the tests.

        I've added a test that triggered this in the first place.

        Show
        Philip Zeyliger added a comment - I believe I've fixed this. I implemented a Schema.to_json(names) method, which recursively serializes schema objects to JSON-compatible structures, avoiding re-serializing schemas which we've already seen. (This also means avoiding serializing JSON just to deserialize it again.) I was able to get rid of some variables which tracked how the schema was originally defined, because this recursion is taking care of noticing that. As I needed to, I removed some verbosity from the tests and removed some exception handling. It's very unhelpful when python tests catch exceptions, because they make it that much harder to track down the exact point of the failure. (An exception that propagates through a test is a test failure, so there's no need to separately mark the test as failed.) Printing extra information about what tests are running distracts from where the failures are occurring. I recommend the nose test runner (with flags --pdb --pdb-failure) for running the tests. I've added a test that triggered this in the first place.
        Hide
        Philip Zeyliger added a comment -
        Show
        Philip Zeyliger added a comment - On reviewboard at https://review.cloudera.org/r/706/
        Hide
        Philip Zeyliger added a comment -

        I committed this. Jeff reviewed it:

        Ship it!
        
        
        Okay.
        
        - Jeff
        
        Show
        Philip Zeyliger added a comment - I committed this. Jeff reviewed it: Ship it! Okay. - Jeff

          People

          • Assignee:
            Philip Zeyliger
            Reporter:
            Philip Zeyliger
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development