Apache Avro / AVRO-793

A strange problem when reading an Avro record with a subset of the schema


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.5.0
    • Fix Version/s: 1.5.1
    • Component/s: java
    • Environment: Avro 1.5, Windows XP / Ubuntu 10.0.4

    • Reviewed

    Description

      Hi all. I ran into a strange problem while reading an Avro file with a subset of the schema it was written with (I do not need all of the fields).
      1. I write data using this schema:
      {
        "name": "relation",
        "type": "record",
        "fields": [
          { "name": "timestamp", "type": "long" },
          {
            "name": "type",
            "type": {
              "type": "map",
              "values": {
                "type": "array",
                "items": {
                  "type": "record",
                  "name": "sdf",
                  "fields": [
                    { "name": "device", "type": "string" },
                    { "name": "children", "type": { "type": "array", "items": "string" } }
                  ]
                }
              }
            }
          }
        ]
      }

      2. Here is a JSON object that conforms to that schema:
      {
        "timestamp": 1234567890,
        "type": {
          "WMA": [
            { "device": "WMA1", "children": ["WMB1", "WMB2"] },
            { "device": "WMA2", "children": ["WMB1", "WMB2"] }
          ]
        }
      }

      3. I write that record successfully, and reading works if I use this schema, which omits "device":
      {
        "name": "relation",
        "type": "record",
        "fields": [
          { "name": "timestamp", "type": "long" },
          {
            "name": "type",
            "type": {
              "type": "map",
              "values": {
                "type": "array",
                "items": {
                  "type": "record",
                  "name": "sdf",
                  "fields": [
                    { "name": "children", "type": { "type": "array", "items": "string" } }
                  ]
                }
              }
            }
          }
        ]
      }

      The result is:
      {
        "timestamp": 1234567890,
        "type": {
          "WMA": [
            { "children": ["WMB1", "WMB2"] },
            { "children": ["WMB1", "WMB2"] }
          ]
        }
      }
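
Steps 3 and 4 differ only in which field of "sdf" the reader schema keeps. Under Avro's schema resolution rules, record fields are matched by name and a writer field that the reader does not declare is skipped. The following is a minimal Java sketch of that rule applied to the two reader schemas in this report. It uses no Avro classes; `FieldResolutionSketch` and `plan` are hypothetical names chosen for illustration, and the real resolver handles much more (defaults, aliases, type promotion).

```java
// Illustrative sketch only: not Avro's resolver. All names are hypothetical.
final class FieldResolutionSketch {

    // For each writer field, in the order it was written, decide whether the
    // reader keeps it (its name appears in the reader schema) or skips it.
    static String[] plan(String[] writerFields, String[] readerFields) {
        String[] actions = new String[writerFields.length];
        for (int i = 0; i < writerFields.length; i++) {
            boolean wanted = false;
            for (String readerField : readerFields) {
                if (readerField.equals(writerFields[i])) {
                    wanted = true;
                }
            }
            actions[i] = (wanted ? "READ " : "SKIP ") + writerFields[i];
        }
        return actions;
    }

    public static void main(String[] args) {
        String[] writer = {"device", "children"};
        // Reader schema that keeps only "children": a string is skipped.
        System.out.println(String.join(", ", plan(writer, new String[] {"children"})));
        // Reader schema that keeps only "device": an array of strings is skipped.
        System.out.println(String.join(", ", plan(writer, new String[] {"device"})));
    }
}
```

Run as written, this prints `SKIP device, READ children` and then `READ device, SKIP children`, so the two cases differ in whether the decoder has to skip a string or an array of strings.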

      4. But if I want to ignore the "children" field instead of "device", I use this schema for reading:
      {
        "name": "relation",
        "type": "record",
        "fields": [
          { "name": "timestamp", "type": "long" },
          {
            "name": "type",
            "type": {
              "type": "map",
              "values": {
                "type": "array",
                "items": {
                  "type": "record",
                  "name": "sdf",
                  "fields": [
                    { "name": "device", "type": "string" }
                  ]
                }
              }
            }
          }
        ]
      }

      Unfortunately, I get an exception:

      java.lang.ArrayIndexOutOfBoundsException: -8
      cause:java.lang.ArrayIndexOutOfBoundsException
      at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:122)
      at org.apache.avro.io.BinaryDecoder.skipString(BinaryDecoder.java:262)
      at org.apache.avro.io.ValidatingDecoder.skipString(ValidatingDecoder.java:113)
      at org.apache.avro.io.ParsingDecoder.skipTopSymbol(ParsingDecoder.java:60)
      at org.apache.avro.io.parsing.SkipParser.skipTo(SkipParser.java:71)
      at org.apache.avro.io.parsing.SkipParser.skipRepeater(SkipParser.java:83)
      at org.apache.avro.io.ValidatingDecoder.skipArray(ValidatingDecoder.java:195)
      at org.apache.avro.io.ParsingDecoder.skipTopSymbol(ParsingDecoder.java:70)
      at org.apache.avro.io.parsing.SkipParser.skipTo(SkipParser.java:71)
      at org.apache.avro.io.parsing.SkipParser.skipSymbol(SkipParser.java:93)
      at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:226)
      at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
      at org.apache.avro.io.ResolvingDecoder.readFieldOrder(ResolvingDecoder.java:127)
      at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:162)
      at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
      at org.apache.avro.generic.GenericDatumReader.readArray(GenericDatumReader.java:196)
      at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:140)
      at org.apache.avro.generic.GenericDatumReader.readMap(GenericDatumReader.java:233)
      at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:141)
      at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:167)
      at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
      at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
      at org.apache.avro.file.DataFileStream.next(DataFileStream.java:236)
      at org.apache.avro.file.DataFileStream.next(DataFileStream.java:223)
      at AvroUtilTest.read(AvroUtilTest.java:77)
      at AvroUtilTest.main(AvroUtilTest.java:61)
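
The trace above fails while the decoder is skipping the unwanted "children" array (skipArray -> skipString -> readInt). For orientation, here is a minimal, self-contained Java sketch of how the Avro binary encoding lays out a string and an array of strings, and how a reader can skip over them, following the encoding described in the Avro specification. It is not the code of Avro's BinaryDecoder and does not reproduce or diagnose this bug; `AvroArraySkipSketch` and its methods are hypothetical names made up for illustration.

```java
// Illustrative sketch only: this is NOT Avro's BinaryDecoder. All names are hypothetical.
final class AvroArraySkipSketch {
    private final byte[] buf;
    private int pos;

    AvroArraySkipSketch(byte[] buf) {
        this.buf = buf;
    }

    int position() {
        return pos;
    }

    // Avro ints and longs are written as zigzag-encoded variable-length integers.
    long readLong() {
        long raw = 0;
        int shift = 0;
        int b;
        do {
            b = buf[pos++] & 0xff;
            raw |= (long) (b & 0x7f) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (raw >>> 1) ^ -(raw & 1); // undo the zigzag mapping
    }

    // A string is a long byte length followed by that many UTF-8 bytes.
    String readString() {
        int len = checkedLength(readLong());
        String s = new String(buf, pos, len, java.nio.charset.StandardCharsets.UTF_8);
        pos += len;
        return s;
    }

    void skipString() {
        pos += checkedLength(readLong());
    }

    // An array is a series of blocks; a block count of 0 ends the array.
    // A negative count means |count| items preceded by the block size in bytes,
    // which lets a reader skip the block without decoding its items.
    long skipStringArray() {
        long skipped = 0;
        for (long count = readLong(); count != 0; count = readLong()) {
            if (count < 0) {
                pos += checkedLength(readLong());
                skipped += -count;
            } else {
                for (long i = 0; i < count; i++) {
                    skipString();
                }
                skipped += count;
            }
        }
        return skipped;
    }

    // Fail with a readable message instead of running off the end of the buffer.
    private int checkedLength(long len) {
        if (len < 0 || len > buf.length - pos) {
            throw new IllegalStateException("bad length " + len + " at offset " + pos);
        }
        return (int) len;
    }

    public static void main(String[] args) {
        // One "sdf" record: device = "WMA1", children = ["WMB1", "WMB2"].
        byte[] record = {
            0x08, 'W', 'M', 'A', '1',   // string of length 4 (zigzag 8)
            0x04,                       // array block with 2 items (zigzag 4)
            0x08, 'W', 'M', 'B', '1',
            0x08, 'W', 'M', 'B', '2',
            0x00                        // end of array
        };
        AvroArraySkipSketch in = new AvroArraySkipSketch(record);
        System.out.println(in.readString());                  // keep "device"
        System.out.println(in.skipStringArray());             // skip "children"
        System.out.println(in.position() == record.length);   // whole record consumed
    }
}
```

With the sample bytes, the sketch prints `WMA1`, `2` and `true`. The point is that a skip has to consume exactly the bytes the writer produced for the field, or the next length read is taken from the wrong offset.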

      Following Scott Carey's suggestion (quoted below), I changed the writer schema as he described and it worked. How can this bug be fixed?
      Scott Carey:
      2: If you change the schema you write with by reversing the order of the fields of "sdf" (array, then string), are the results the same?

      Attachments

        1. AVRO-793.patch
          0.5 kB
          Thiruvalluvan M. G.
        2. AVRO-793-test.patch
          1 kB
          Thiruvalluvan M. G.

        Activity


          People

            Assignee: Thiruvalluvan M. G. (thiru_mg)
            Reporter: Yingzhong Xu (ygnhzeus)


              Time Tracking

                Original Estimate: 24h
                Remaining Estimate: 24h
                Time Spent: Not Specified
