Uploaded image for project: 'Apache Avro'
  1. Apache Avro
  2. AVRO-2468

Fix broken data interoperability on the Perl bindings

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Critical
    • Resolution: Fixed
    • None
    • 1.9.2
    • interop, perl
    • None

    Description

      I found some data interop problems on the Perl bindings.

      1. They fail to parse a schema if there's an array/map/union which contains named types with a simple (not fully-qualified) name in it. For example, they can't parse share/test/schemas/interop.avsc or share/schemas/org/apache/avro/data/Json.avsc, because they have a named type called "Node" or "Json" respectively in arrays/maps. This seems because the parser doesn't take namespace into consideration in parsing array/map/union.

      $ cd lang/perl
      $ perl -Ilib -de 1
      
      (snip)
      
        DB<1> open FH, '../../share/test/schemas/interop.avsc'; local $/ = undef; $s = <FH>; close FH; print $s
      {"type": "record", "name":"Interop", "namespace": "org.apache.avro",
        "fields": [
            {"name": "intField", "type": "int"},
            {"name": "longField", "type": "long"},
            {"name": "stringField", "type": "string"},
            {"name": "boolField", "type": "boolean"},
            {"name": "floatField", "type": "float"},
            {"name": "doubleField", "type": "double"},
            {"name": "bytesField", "type": "bytes"},
            {"name": "nullField", "type": "null"},
            {"name": "arrayField", "type": {"type": "array", "items": "double"}},
            {"name": "mapField", "type":
             {"type": "map", "values":
              {"type": "record", "name": "Foo",
               "fields": [{"name": "label", "type": "string"}]}}},
            {"name": "unionField", "type":
             ["boolean", "double", {"type": "array", "items": "bytes"}]},
            {"name": "enumField", "type":
             {"type": "enum", "name": "Kind", "symbols": ["A","B","C"]}},
            {"name": "fixedField", "type":
             {"type": "fixed", "name": "MD5", "size": 16}},
            {"name": "recordField", "type":
             {"type": "record", "name": "Node",
              "fields": [
                  {"name": "label", "type": "string"},
                  {"name": "children", "type": {"type": "array", "items": "Node"}}]}}
        ]
      }
      
        DB<2> use Avro::Schema; Avro::Schema->parse($s)
      Not a primitive type Node at lib/Avro/Schema.pm line 257.
      
      

      2. They encode the size for a fixed type as a string rather than a number, so other language bindings fail to parse it.

      $ cd lang/perl 
      $ perl -Ilib -de 1
      
      (snip)
      
        DB<1> use Avro::Schema; $s = Avro::Schema->parse('{"type": "fixed", "size": 16, "name": "md5"}')
      
        DB<2> open($fh, '>/tmp/output')
      
        DB<3> use Avro::DataFileWriter; $w = Avro::DataFileWriter->new(fh => $fh, writer_schema => $s)
      
        DB<4> $w->print('0123456789abcdef')
      
        DB<5> $w->close
      
      
      $ ipython
      
      (snip)
      
      In [1]: from avro.datafile import DataFileReader
      
      In [2]: from avro.io import DatumReader
      
      In [3]: DataFileReader(datum_reader=DatumReader(), reader=open("/tmp/output"))
      ---------------------------------------------------------------------------
      AvroException                             Traceback (most recent call last)
      <ipython-input-3-13da25c7d572> in <module>()
      ----> 1 DataFileReader(datum_reader=DatumReader(), reader=open("/tmp/output"))
      
      /home/sekikn/repo/avro/lang/py/src/avro/datafile.pyc in __init__(self, reader, datum_reader)
          255     # get ready to read
          256     self._block_count = 0
      --> 257     self.datum_reader.writers_schema = schema.parse(self.get_meta(SCHEMA_KEY))
          258 
          259   def __enter__(self):
      
      /home/sekikn/repo/avro/lang/py/src/avro/schema.pyc in parse(json_string)
          984 
          985   # construct the Avro Schema object
      --> 986   return make_avsc_object(json_data, names)
      
      /home/sekikn/repo/avro/lang/py/src/avro/schema.pyc in make_avsc_object(json_data, names)
          931           scale = 0 if json_data.get('scale') is None else json_data.get('scale')
          932           return FixedDecimalSchema(size, name, precision, scale, namespace, names, other_props)
      --> 933         return FixedSchema(name, namespace, size, names, other_props)
          934       elif type == 'enum':
          935         symbols = json_data.get('symbols')
      
      /home/sekikn/repo/avro/lang/py/src/avro/schema.pyc in __init__(self, name, namespace, size, names, other_props)
          482     if not isinstance(size, int) or size < 0:
          483       fail_msg = 'Fixed Schema requires a valid positive integer for size property.'
      --> 484       raise AvroException(fail_msg)
          485 
          486     # Call parent ctor
      
      AvroException: Fixed Schema requires a valid positive integer for size property.
      
      $ strings /tmp/output 
      avro.schemaR{"size":"16","type":"fixed","name":"md5"}
      
      (snip)
      

      Attachments

        Issue Links

          Activity

            People

              sekikn Kengo Seki
              sekikn Kengo Seki
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: