Avro / AVRO-793

A strange problem when I try to read an Avro record with a subset of the schema.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.5.0
    • Fix Version/s: 1.5.1
    • Component/s: java
    • Environment:

      Avro 1.5, Windows XP / Ubuntu 10.04

    • Hadoop Flags:
      Reviewed

      Description

      Hi all. When I try to read an Avro file with a subset of the schema it was written with (because I do not need all the details), I run into a strange problem.
      1. I write data using this schema:
      {
        "name": "relation",
        "type": "record",
        "fields": [
          { "name": "timestamp", "type": "long" },
          {
            "name": "type",
            "type": {
              "type": "map",
              "values": {
                "type": "array",
                "items": {
                  "type": "record",
                  "name": "sdf",
                  "fields": [
                    { "name": "device", "type": "string" },
                    {
                      "name": "children",
                      "type": { "type": "array", "items": "string" }
                    }
                  ]
                }
              }
            }
          }
        ]
      }

      2. Here is a JSON object conforming to that schema:
      {
        "timestamp": 1234567890,
        "type": {
          "WMA": [
            { "device": "WMA1", "children": ["WMB1", "WMB2"] },
            { "device": "WMA2", "children": ["WMB1", "WMB2"] }
          ]
        }
      }

      3. I write that record successfully, and reading it works if I use this schema, which leaves out "device":
      {
        "name": "relation",
        "type": "record",
        "fields": [
          { "name": "timestamp", "type": "long" },
          {
            "name": "type",
            "type": {
              "type": "map",
              "values": {
                "type": "array",
                "items": {
                  "type": "record",
                  "name": "sdf",
                  "fields": [
                    {
                      "name": "children",
                      "type": { "type": "array", "items": "string" }
                    }
                  ]
                }
              }
            }
          }
        ]
      }

      The result is:
      {
        "timestamp": 1234567890,
        "type": {
          "WMA": [
            { "children": ["WMB1", "WMB2"] },
            { "children": ["WMB1", "WMB2"] }
          ]
        }
      }

      4. But if I want to ignore the "children" part instead of "device", I use this schema for reading:
      {
        "name": "relation",
        "type": "record",
        "fields": [
          { "name": "timestamp", "type": "long" },
          {
            "name": "type",
            "type": {
              "type": "map",
              "values": {
                "type": "array",
                "items": {
                  "type": "record",
                  "name": "sdf",
                  "fields": [
                    { "name": "device", "type": "string" }
                  ]
                }
              }
            }
          }
        ]
      }

      Unfortunately, I get this exception:

      java.lang.ArrayIndexOutOfBoundsException: -8
      cause:java.lang.ArrayIndexOutOfBoundsException
      at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:122)
      at org.apache.avro.io.BinaryDecoder.skipString(BinaryDecoder.java:262)
      at org.apache.avro.io.ValidatingDecoder.skipString(ValidatingDecoder.java:113)
      at org.apache.avro.io.ParsingDecoder.skipTopSymbol(ParsingDecoder.java:60)
      at org.apache.avro.io.parsing.SkipParser.skipTo(SkipParser.java:71)
      at org.apache.avro.io.parsing.SkipParser.skipRepeater(SkipParser.java:83)
      at org.apache.avro.io.ValidatingDecoder.skipArray(ValidatingDecoder.java:195)
      at org.apache.avro.io.ParsingDecoder.skipTopSymbol(ParsingDecoder.java:70)
      at org.apache.avro.io.parsing.SkipParser.skipTo(SkipParser.java:71)
      at org.apache.avro.io.parsing.SkipParser.skipSymbol(SkipParser.java:93)
      at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:226)
      at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
      at org.apache.avro.io.ResolvingDecoder.readFieldOrder(ResolvingDecoder.java:127)
      at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:162)
      at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
      at org.apache.avro.generic.GenericDatumReader.readArray(GenericDatumReader.java:196)
      at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:140)
      at org.apache.avro.generic.GenericDatumReader.readMap(GenericDatumReader.java:233)
      at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:141)
      at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:167)
      at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
      at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
      at org.apache.avro.file.DataFileStream.next(DataFileStream.java:236)
      at org.apache.avro.file.DataFileStream.next(DataFileStream.java:223)
      at AvroUtilTest.read(AvroUtilTest.java:77)
      at AvroUtilTest.main(AvroUtilTest.java:61)
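The failure depends on how Avro lays out bytes. The binary encoding in the Avro specification works as follows:

- Record fields are written back to back in writer order, with no names or tags.
- Longs are zigzag varints.
- Strings are a length followed by UTF-8 bytes.
- Arrays and maps are sequences of counted blocks ending in a zero count.

A reader that drops a field therefore has to skip exactly that field's bytes. The sketch below assumes those encoding rules. It hand-encodes the step 2 record with the step 1 writer schema, then decodes it with the step 3 reader schema by skipping "device". It is an illustration only: `AvroBytesSketch` and its methods are hypothetical and are not Avro's `BinaryDecoder` or `ResolvingDecoder`.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not Avro's API): hand-encode the step 2 record with the
// writer schema, then decode it with the step 3 reader schema, skipping "device".
final class AvroBytesSketch {
    // long: zigzag-map the sign, then 7 bits per byte, low group first; high bit = more.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        for (; (z & ~0x7FL) != 0; z >>>= 7) out.write((int) ((z & 0x7F) | 0x80));
        out.write((int) z);
    }
    // string: byte length as a long, then the UTF-8 bytes.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, b.length);
        out.write(b, 0, b.length);
    }
    // Writer field order, no field tags: timestamp, then map {"WMA": [sdf(device, children) x2]}.
    static byte[] encodeExample() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(out, 1234567890L);           // "timestamp"
        writeLong(out, 1);                     // map block with one entry
        writeString(out, "WMA");               // map key
        writeLong(out, 2);                     // array block with two records
        for (String device : List.of("WMA1", "WMA2")) {
            writeString(out, device);          // "device"
            writeLong(out, 2);                 // "children": one block of two items
            writeString(out, "WMB1");
            writeString(out, "WMB2");
            writeLong(out, 0);                 // end of "children"
        }
        writeLong(out, 0);                     // end of array
        writeLong(out, 0);                     // end of map
        return out.toByteArray();
    }

    private final byte[] buf;
    private int pos;
    AvroBytesSketch(byte[] buf) { this.buf = buf; }

    long readLong() {
        long z = 0;
        int b, shift = 0;
        do {
            b = buf[pos++] & 0xFF;
            z |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (z >>> 1) ^ -(z & 1);           // undo zigzag
    }
    // Block count; a negative count is followed by the block's size in bytes.
    long readBlockCount() {
        long n = readLong();
        if (n < 0) { readLong(); n = -n; }
        return n;
    }
    String readString() {
        int len = (int) readLong();
        String s = new String(buf, pos, len, StandardCharsets.UTF_8);
        pos += len;
        return s;
    }
    // Read the length before moving pos, then jump over the string bytes.
    void skipString() { int len = (int) readLong(); pos += len; }
    List<String> readStringArray() {
        List<String> items = new ArrayList<>();
        for (long n = readBlockCount(); n != 0; n = readBlockCount())
            for (long i = 0; i < n; i++) items.add(readString());
        return items;
    }
    // Step 3 reader: "sdf" keeps only "children"; writer-only "device" is skipped.
    Map<String, List<List<String>>> readStep3() {
        readLong();                            // "timestamp" (not returned here)
        Map<String, List<List<String>>> type = new LinkedHashMap<>();
        for (long n = readBlockCount(); n != 0; n = readBlockCount())
            for (long i = 0; i < n; i++) {
                String key = readString();
                List<List<String>> records = new ArrayList<>();
                for (long m = readBlockCount(); m != 0; m = readBlockCount())
                    for (long j = 0; j < m; j++) {
                        skipString();          // "device"
                        records.add(readStringArray()); // "children"
                    }
                type.put(key, records);
            }
        if (pos != buf.length) throw new IllegalStateException("misaligned at byte " + pos);
        return type;
    }
    public static void main(String[] args) {
        byte[] bytes = encodeExample();
        System.out.println(bytes.length + " bytes -> " + new AvroBytesSketch(bytes).readStep3());
    }
}
```

Running `main` prints `47 bytes -> {WMA=[[WMB1, WMB2], [WMB1, WMB2]]}`, the same children as the step 3 result. The final check on `pos` matters because nothing marks where a record ends. The second record's "device" begins immediately after the first record's "children" terminator, so any mis-skipped byte misaligns everything that follows.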

      As Scott Carey suggested, I reversed the order of the fields of "sdf" in the writer schema, and it worked. How can this bug be fixed?
      Scott Carey's suggestion:
      2: If you change the schema you write with by reversing the order of the fields of "sdf" (array, then string), are the results the same?
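Scott's experiment changes which field of the writer's "sdf" record comes last. The sketch below again assumes only the binary encoding rules from the Avro specification. It writes an array of "sdf" records as (children, device) and reads it with a reader that keeps only "device". The writer-only array is skipped block by block up to its zero count, and the string after it is read normally. It shows the byte layout of the workaround, not the 1.5.0 resolver code; `ReversedOrderSketch` and its methods are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not Avro's API) of the workaround layout:
// writer "sdf" = (children, device), reader "sdf" = (device).
final class ReversedOrderSketch {
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);                      // zigzag
        for (; (z & ~0x7FL) != 0; z >>>= 7) out.write((int) ((z & 0x7F) | 0x80));
        out.write((int) z);
    }
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, b.length);
        out.write(b, 0, b.length);
    }
    // An array of "sdf" records written children first, so "device" is the last field.
    static byte[] encodeSdfArray(List<String> devices) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!devices.isEmpty()) writeLong(out, devices.size());
        for (String device : devices) {
            writeLong(out, 2);                              // "children": one block of two
            writeString(out, "WMB1");
            writeString(out, "WMB2");
            writeLong(out, 0);                              // end of "children"
            writeString(out, device);                       // "device"
        }
        writeLong(out, 0);                                  // end of the outer array
        return out.toByteArray();
    }

    private final byte[] buf;
    private int pos;
    ReversedOrderSketch(byte[] buf) { this.buf = buf; }

    long readLong() {
        long z = 0;
        int b, shift = 0;
        do {
            b = buf[pos++] & 0xFF;
            z |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (z >>> 1) ^ -(z & 1);                        // undo zigzag
    }
    String readString() {
        int len = (int) readLong();
        String s = new String(buf, pos, len, StandardCharsets.UTF_8);
        pos += len;
        return s;
    }
    // Skip an array<string> block by block, up to and including its zero count.
    void skipStringArray() {
        for (long n = readLong(); n != 0; n = readLong()) {
            if (n < 0) {                                    // count < 0: block size in bytes follows
                int size = (int) readLong();
                pos += size;
            } else {
                for (long i = 0; i < n; i++) {
                    int len = (int) readLong();
                    pos += len;
                }
            }
        }
    }
    List<String> readDevices() {
        List<String> devices = new ArrayList<>();
        for (long n = readLong(); n != 0; n = readLong()) {
            if (n < 0) { readLong(); n = -n; }              // block byte size not needed here
            for (long i = 0; i < n; i++) {
                skipStringArray();                          // writer-only "children"
                devices.add(readString());                  // "device"
            }
        }
        if (pos != buf.length) throw new IllegalStateException("misaligned at byte " + pos);
        return devices;
    }

    public static void main(String[] args) {
        byte[] bytes = encodeSdfArray(List.of("WMA1", "WMA2"));
        System.out.println(new ReversedOrderSketch(bytes).readDevices());
    }
}
```

`main` prints `[WMA1, WMA2]`. A block with a negative count carries its byte size, so `skipStringArray` can jump over such a block without reading each string.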

      Attachments
      1. AVRO-793.patch (0.5 kB, Thiruvalluvan M. G.)
      2. AVRO-793-test.patch (1 kB, Thiruvalluvan M. G.)

        Activity

        Scott Carey added a comment:

        This is a bug in the resolver.

        Thiruvalluvan M. G. added a comment:

        Very subtle bug. If an array that needs to be skipped happens to be the last field of a record, and the record is contained in an outer array, the array does not get skipped properly.

        The test patch has the test that catches the bug and the main patch has the solution.
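Thiru's comment pins down the failing shape: a writer-only array that is the last field of a record nested in an outer array. Whatever the resolver does internally, a correct skip has to consume every block of that trailing array, including its terminating zero count, before the outer array's next element. The sketch below decodes the step 4 case (reader "sdf" keeps only "device") by hand under the same encoding assumptions. It can also write "children" with negative-count blocks, which the specification permits and which carry a byte size the reader can jump over. It is not the contents of AVRO-793.patch; `TrailingArraySkipSketch` and its methods are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not Avro's API): writer "sdf" = (device, children),
// reader "sdf" = (device), so "children" is a trailing array to skip.
final class TrailingArraySkipSketch {
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);                      // zigzag
        for (; (z & ~0x7FL) != 0; z >>>= 7) out.write((int) ((z & 0x7F) | 0x80));
        out.write((int) z);
    }
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, b.length);
        out.write(b, 0, b.length);
    }
    // array<string>; sized=true writes the count negated, followed by the block's byte size.
    static void writeStringArray(ByteArrayOutputStream out, List<String> items, boolean sized) {
        if (!items.isEmpty()) {
            ByteArrayOutputStream block = new ByteArrayOutputStream();
            for (String s : items) writeString(block, s);
            writeLong(out, sized ? -items.size() : items.size());
            if (sized) writeLong(out, block.size());
            out.write(block.toByteArray(), 0, block.size());
        }
        writeLong(out, 0);                                  // end of array
    }
    // An array of "sdf" records in writer order: device, then children (last field).
    static byte[] encodeSdfArray(List<String> devices, boolean sized) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!devices.isEmpty()) writeLong(out, devices.size());
        for (String device : devices) {
            writeString(out, device);
            writeStringArray(out, List.of("WMB1", "WMB2"), sized);
        }
        writeLong(out, 0);                                  // end of the outer array
        return out.toByteArray();
    }

    private final byte[] buf;
    private int pos;
    TrailingArraySkipSketch(byte[] buf) { this.buf = buf; }

    long readLong() {
        long z = 0;
        int b, shift = 0;
        do {
            b = buf[pos++] & 0xFF;
            z |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (z >>> 1) ^ -(z & 1);                        // undo zigzag
    }
    String readString() {
        int len = (int) readLong();
        String s = new String(buf, pos, len, StandardCharsets.UTF_8);
        pos += len;
        return s;
    }
    // Skip every block of an array<string>, including the final zero count.
    void skipStringArray() {
        for (long n = readLong(); n != 0; n = readLong()) {
            if (n < 0) {
                int size = (int) readLong();                // jump the whole block by its size
                pos += size;
            } else {
                for (long i = 0; i < n; i++) { int len = (int) readLong(); pos += len; }
            }
        }
    }
    List<String> readDevices() {
        List<String> devices = new ArrayList<>();
        for (long n = readLong(); n != 0; n = readLong()) {
            if (n < 0) { readLong(); n = -n; }              // outer block byte size unused here
            for (long i = 0; i < n; i++) {
                devices.add(readString());                  // "device"
                skipStringArray();                          // trailing writer-only "children"
            }
        }
        if (pos != buf.length) throw new IllegalStateException("misaligned at byte " + pos);
        return devices;
    }

    public static void main(String[] args) {
        for (boolean sized : new boolean[] {false, true}) {
            byte[] bytes = encodeSdfArray(List.of("WMA1", "WMA2"), sized);
            System.out.println(new TrailingArraySkipSketch(bytes).readDevices());
        }
    }
}
```

`main` prints `[WMA1, WMA2]` twice, once for each block style.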

        Doug Cutting added a comment:

        I committed this. Thanks, Thiru!


          People

          • Assignee: Thiruvalluvan M. G.
          • Reporter: Yingzhong Xu
          • Votes: 0
          • Watchers: 1

            Dates

            • Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated: 24h
              Remaining: 24h
              Logged: Not Specified
