Apache Avro / AVRO-792

MapReduce job with Avro 1.5 generates ArrayIndexOutOfBoundsException


Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Cannot Reproduce
    • Affects Version/s: 1.5.0, 1.5.1
    • Fix Version/s: None
    • Component/s: java
    • Labels: None
    • Environment: Mac with VMWare running Linux training-vm-Ubuntu

    Description

      We have an Avro map/reduce job that used to work with Avro 1.4 but is broken with Avro 1.5. The M/R job with Avro 1.5 worked fine in our debugging environment, but broke when we moved it to a real cluster. In one test run, the job had 23 reducers: four of them succeeded and the rest failed with an ArrayIndexOutOfBoundsException. Here are two instances of the stack traces:

      =================================================================================
      java.lang.ArrayIndexOutOfBoundsException: -1576799025
      at org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:364)
      at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
      at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
      at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
      at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
      at org.apache.avro.generic.GenericDatumReader.readMap(GenericDatumReader.java:232)
      at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:141)
      at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
      at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
      at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
      at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
      at org.apache.avro.mapred.AvroSerialization$AvroWrapperDeserializer.deserialize(AvroSerialization.java:86)
      at org.apache.avro.mapred.AvroSerialization$AvroWrapperDeserializer.deserialize(AvroSerialization.java:68)
      at org.apache.hadoop.mapred.Task$ValuesIterator.readNextValue(Task.java:1136)
      at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:1076)
      at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.moveToNext(ReduceTask.java:246)
      at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.next(ReduceTask.java:242)
      at org.apache.avro.mapred.HadoopReducerBase$ReduceIterable.next(HadoopReducerBase.java:47)
      at com.ngmoco.ngpipes.etl.NgEventETLReducer.reduce(NgEventETLReducer.java:46)
      at com.ngmoco.ngpipes.etl.NgEventETLReducer.reduce(NgEventETLReducer.java:1)
      at org.apache.avro.mapred.HadoopReducerBase.reduce(HadoopReducerBase.java:60)
      at org.apache.avro.mapred.HadoopReducerBase.reduce(HadoopReducerBase.java:30)
      at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:468)
      at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
      at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:396)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
      at org.apache.hadoop.mapred.Child.main(Child.java:234)

      =====================================================================================================

      java.lang.ArrayIndexOutOfBoundsException: 40
      at org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:364)
      at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
      at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
      at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
      at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
      at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
      at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
      at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
      at org.apache.avro.mapred.AvroSerialization$AvroWrapperDeserializer.deserialize(AvroSerialization.java:86)
      at org.apache.avro.mapred.AvroSerialization$AvroWrapperDeserializer.deserialize(AvroSerialization.java:68)
      at org.apache.hadoop.mapred.Task$ValuesIterator.readNextValue(Task.java:1136)
      at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:1076)
      at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.moveToNext(ReduceTask.java:246)
      at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.next(ReduceTask.java:242)
      at org.apache.avro.mapred.HadoopReducerBase$ReduceIterable.next(HadoopReducerBase.java:47)
      at com.ngmoco.ngpipes.sourcing.sessions.NgSessionReducer.reduce(NgSessionReducer.java:74)
      at com.ngmoco.ngpipes.sourcing.sessions.NgSessionReducer.reduce(NgSessionReducer.java:1)
      at org.apache.avro.mapred.HadoopReducerBase.reduce(HadoopReducerBase.java:60)
      at org.apache.avro.mapred.HadoopReducerBase.reduce(HadoopReducerBase.java:30)
      at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:468)
      at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
      at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:396)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
      at org.apache.hadoop.mapred.Child.main(Child.java:234)
      =====================================================================================================
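Both traces fail in `Symbol$Alternative.getSymbol`, reached from `ResolvingDecoder.readIndex` while `GenericDatumReader` reads a union. In Avro's binary format a union value is written as its zero-based branch index (an int, stored with the same variable-length zigzag coding as int and long values) followed by the value itself, so the decoder picks a branch using whatever index it reads. The standalone sketch below does not use Avro's classes; the helper names and the `["null", "long"]` union are hypothetical. It only shows how a read that starts at the wrong byte turns payload bytes into a branch index far outside the branch array. That is consistent with the indices 40 and -1576799025 above, but it does not establish the cause in this job.

```java
import java.io.ByteArrayOutputStream;

/** Sketch (not Avro's classes): union branch index encoding and a misaligned read. */
public class UnionIndexSketch {

    /** Writes a long as a zigzag varint (Avro's binary coding for int and long). */
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);              // zigzag: signed -> unsigned
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));   // 7 payload bits plus continuation bit
            z >>>= 7;
        }
        out.write((int) z);
    }

    /** Reads a zigzag varint starting at pos[0] and advances pos[0] past it. */
    static long readLong(byte[] buf, int[] pos) {
        long z = 0;
        int shift = 0;
        int b;
        do {
            b = buf[pos[0]++] & 0xFF;
            z |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (z >>> 1) ^ -(z & 1);                // undo zigzag
    }

    /** Stand-in for a branch lookup: indexes straight into the branch array. */
    static String branchFor(String[] branches, long index) {
        return branches[(int) index];               // out of range -> ArrayIndexOutOfBoundsException
    }

    /** Hypothetical union ["null", "long"]: branch index 1, then the long 1000000. */
    static byte[] encodeSample() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(out, 1);          // union branch index
        writeLong(out, 1_000_000);  // payload of the "long" branch
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String[] branches = {"null", "long"};
        byte[] data = encodeSample();

        int[] pos = {0};                            // aligned read
        long index = readLong(data, pos);
        System.out.println(branchFor(branches, index) + " = " + readLong(data, pos)); // long = 1000000

        int[] skewed = {1};                         // read starts one byte late
        long bogus = readLong(data, skewed);
        System.out.println("misaligned index: " + bogus); // misaligned index: 1000000
        try {
            branchFor(branches, bogus);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("caught: " + e);
        }
    }
}
```

Starting one byte late, the reader treats the payload's varint as the branch index, so the lookup fails the same way `getSymbol` does in the traces.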

      The signature of our map() is:

      public void map(Utf8 input, AvroCollector<Pair<Utf8, GenericRecord>> collector, Reporter reporter) throws IOException;

      and reduce() is:

      public void reduce(Utf8 key, Iterable<GenericRecord> values, AvroCollector<GenericRecord> collector, Reporter reporter) throws IOException;

      All the GenericRecords are of the same schema.

      There are many changes in the serialization/deserialization code between Avro 1.4 and 1.5, but we could not figure out why these exceptions were generated.
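As a generic illustration, not a diagnosis of this issue, here is one way a deserializer can fall out of step with its input. A reader that buffers ahead over a shared stream can consume bytes that belong to the next value, so whatever reads next starts at the wrong offset. The sketch uses only `java.io` and `java.nio`. The `readValues` and `sample` helpers and the 6-byte buffer are hypothetical, chosen so the effect shows up on a 12-byte input. Whether anything like this happens in the Avro 1.5 / Hadoop deserialization path is an assumption that this report does not confirm.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Sketch: read-ahead buffering over a shared stream can desynchronize later reads. */
public class ReadAheadSketch {

    /** Three big-endian ints 1, 2, 3 laid out back to back (12 bytes). */
    static byte[] sample() {
        return ByteBuffer.allocate(12).putInt(1).putInt(2).putInt(3).array();
    }

    /**
     * Reads 'count' ints from one shared stream. When freshBufferPerValue is true,
     * each value goes through a new BufferedInputStream, which pulls more bytes from
     * the shared stream than the 4 it consumes; the surplus is lost to later reads.
     */
    static int[] readValues(byte[] data, int count, boolean freshBufferPerValue) {
        InputStream shared = new ByteArrayInputStream(data);
        int[] values = new int[count];
        try {
            for (int i = 0; i < count; i++) {
                InputStream src = freshBufferPerValue
                        ? new BufferedInputStream(shared, 6) // hypothetical tiny buffer: fetches 6, uses 4
                        : shared;                            // direct: consumes exactly 4 bytes
                values[i] = new DataInputStream(src).readInt();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return values;
    }

    public static void main(String[] args) {
        byte[] data = sample();
        System.out.println(Arrays.toString(readValues(data, 3, false))); // [1, 2, 3]
        System.out.println(Arrays.toString(readValues(data, 2, true)));  // [1, 131072]
    }
}
```

On the unbuffered path each read consumes exactly four bytes, so the values stay aligned. On the buffered path the second value is assembled from bytes 6 to 9 and comes out as garbage.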

      Attachments

        1. AVRO-792.patch
          5 kB
          Thiruvalluvan M. G.
        2. AVRO-792-2.patch
          5 kB
          Thiruvalluvan M. G.
        3. AVRO-792-3.patch
          5 kB
          Thiruvalluvan M. G.
        4. part-00000.avro
          75 kB
          ey-chih chow
        5. part-00000.avro
          75 kB
          ey-chih chow
        6. part-00001.avro
          110 kB
          ey-chih chow
        7. part-00001.avro
          110 kB
          ey-chih chow


          People

            Assignee: Unassigned
            Reporter: ey-chih chow
            Votes: 0
            Watchers: 3


              Time Tracking

                Original Estimate: 504h
                Remaining Estimate: 504h
                Time Spent: Not Specified