Apache Avro / AVRO-1953

ArrayIndexOutOfBoundsException in org.apache.avro.io.parsing.Symbol$Alternative.getSymbol

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.7.4
    • Fix Version/s: None
    • Component/s: java
    • Labels: None

    Description

      We are facing an issue where our Avro MapReduce job cannot process the Avro file in the reducer.

      Here is the schema of our data:

      {
        "namespace" : "our package name",
        "type" : "record",
        "name" : "Lists",
        "fields" : [
          {"name" : "account_id", "type" : "long"},
          {"name" : "list_id", "type" : "string"},
          {"name" : "sequence_id", "type" : ["int", "null"]},
          {"name" : "name", "type" : ["string", "null"]},
          {"name" : "state", "type" : ["string", "null"]},
          {"name" : "description", "type" : ["string", "null"]},
          {"name" : "dynamic_filtered_list", "type" : ["int", "null"]},
          {"name" : "filter_criteria", "type" : ["string", "null"]},
          {"name" : "created_at", "type" : ["long", "null"]},
          {"name" : "updated_at", "type" : ["long", "null"]},
          {"name" : "deleted_at", "type" : ["long", "null"]},
          {"name" : "favorite", "type" : ["int", "null"]},
          {"name" : "delta", "type" : ["boolean", "null"]},
          {"name" : "list_memberships", "type" : {
            "type" : "array", "items" : {
              "name" : "ListMembership", "type" : "record",
              "fields" : [
                {"name" : "channel_id", "type" : "string"},
                {"name" : "created_at", "type" : ["long", "null"]},
                {"name" : "created_source", "type" : ["string", "null"]},
                {"name" : "deleted_at", "type" : ["long", "null"]},
                {"name" : "sequence_id", "type" : ["int", "null"]}
              ]
            }
          }}
        ]
      }
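
      For illustration, a record of this schema can be built and written with the generic API roughly as sketched below. This is not our job code; the field values, the lists.avsc / lists.avro file names, and the class name are made up, and the nullable union fields are simply left unset so they resolve to the "null" branch:

      import java.io.File;
      import java.util.Collections;

      import org.apache.avro.Schema;
      import org.apache.avro.file.DataFileWriter;
      import org.apache.avro.generic.GenericData;
      import org.apache.avro.generic.GenericDatumWriter;
      import org.apache.avro.generic.GenericRecord;

      public class WriteLists {
        public static void main(String[] args) throws Exception {
          // Parse the schema shown above (stored in lists.avsc for this sketch).
          Schema lists = new Schema.Parser().parse(new File("lists.avsc"));
          Schema membership = lists.getField("list_memberships").schema().getElementType();

          // One element of the nested list_memberships array.
          GenericRecord m = new GenericData.Record(membership);
          m.put("channel_id", "channel-1");          // required (non-union) field
          // created_at, created_source, deleted_at, sequence_id stay null -> "null" branch

          GenericRecord rec = new GenericData.Record(lists);
          rec.put("account_id", 42L);                // required long
          rec.put("list_id", "list-1");              // required string
          rec.put("updated_at", 1464652800000L);     // ["long","null"] union, long branch
          rec.put("delta", Boolean.TRUE);            // ["boolean","null"] union
          rec.put("list_memberships", Collections.singletonList(m));

          // Write a one-record Avro data file with this schema.
          DataFileWriter<GenericRecord> writer =
              new DataFileWriter<GenericRecord>(new GenericDatumWriter<GenericRecord>(lists));
          writer.create(lists, new File("lists.avro"));
          writer.append(rec);
          writer.close();
        }
      }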

      Our MapReduce job computes the delta of the above dataset and uses our merge logic to merge the latest changes into the dataset.
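
      Conceptually the merge is like the reducer sketched below. This is illustrative only, not our job code: the Text key, the grouping, and the "latest updated_at wins" rule are stand-ins for our real key and merge rule.

      import java.io.IOException;

      import org.apache.avro.generic.GenericData;
      import org.apache.avro.generic.GenericRecord;
      import org.apache.avro.mapred.AvroKey;
      import org.apache.avro.mapred.AvroValue;
      import org.apache.hadoop.io.NullWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Reducer;

      // Illustrative merge reducer: for each list key, keep only the record with the
      // largest updated_at, so the latest delta wins.
      public class MergeListsReducer
          extends Reducer<Text, AvroValue<GenericRecord>, AvroKey<GenericRecord>, NullWritable> {

        @Override
        protected void reduce(Text key, Iterable<AvroValue<GenericRecord>> values, Context context)
            throws IOException, InterruptedException {
          GenericRecord latest = null;
          long latestTs = Long.MIN_VALUE;

          for (AvroValue<GenericRecord> value : values) {
            GenericRecord r = value.datum();
            Long ts = (Long) r.get("updated_at");    // nullable in the schema
            long t = (ts == null) ? Long.MIN_VALUE : ts;
            if (t >= latestTs) {
              latestTs = t;
              // The framework reuses the datum object between iterations, so keep a deep copy.
              latest = GenericData.get().deepCopy(r.getSchema(), r);
            }
          }
          if (latest != null) {
            context.write(new AvroKey<GenericRecord>(latest), NullWritable.get());
          }
        }
      }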

      The whole MR job runs daily and has worked fine for 18 months. In that time we have twice seen the merge MapReduce job fail with the error below. The failure happens in the reducer stage, which means the Avro data was read successfully in the mappers and sent to the reducers, where we sort the data by key and timestamp so the delta can be merged on the reducer side:

      java.lang.ArrayIndexOutOfBoundsException
          at org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:364)
          at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
          at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
          at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
          at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
          at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:177)
          at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148)
          at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139)
          at org.apache.avro.hadoop.io.AvroDeserializer.deserialize(AvroDeserializer.java:108)
          at org.apache.avro.hadoop.io.AvroDeserializer.deserialize(AvroDeserializer.java:48)
          at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:142)
          at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:117)
          at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.nextKey(WrappedReducer.java:297)
          at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:165)
          at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:652)
          at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
          at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
          at java.security.AccessController.doPrivileged(AccessController.java:366)
          at javax.security.auth.Subject.doAs(Subject.java:572)
          at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1502)
          at org.apache.hadoop.mapred.Child.main(Child.java:249)
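
      As far as I can tell from the Avro 1.7.4 source at those line numbers, ResolvingDecoder.doAction handles a union in the writer's schema by reading the branch index straight from the byte stream and using it to index the Alternative symbols built from the writer schema. If that index is out of range for the union, whether because the bytes are corrupt or because the bytes were not actually written with the schema the reader was told the writer used, it ends in exactly this ArrayIndexOutOfBoundsException. A minimal standalone sketch that should hit the same path (the schemas here are made up for the illustration and have nothing to do with our dataset):

      import java.io.ByteArrayOutputStream;

      import org.apache.avro.Schema;
      import org.apache.avro.generic.GenericData;
      import org.apache.avro.generic.GenericDatumReader;
      import org.apache.avro.generic.GenericDatumWriter;
      import org.apache.avro.generic.GenericRecord;
      import org.apache.avro.io.BinaryEncoder;
      import org.apache.avro.io.DecoderFactory;
      import org.apache.avro.io.EncoderFactory;

      public class UnionIndexOutOfRange {
        // Schema actually used to produce the bytes: a 3-branch union.
        static final String ACTUAL_WRITER =
            "{\"type\":\"record\",\"name\":\"R\",\"fields\":["
            + "{\"name\":\"f\",\"type\":[\"null\",\"string\",\"int\"]}]}";

        // Schema the reader is told the bytes were written with: only 2 branches.
        static final String CLAIMED_WRITER =
            "{\"type\":\"record\",\"name\":\"R\",\"fields\":["
            + "{\"name\":\"f\",\"type\":[\"null\",\"string\"]}]}";

        public static void main(String[] args) throws Exception {
          Schema actual = new Schema.Parser().parse(ACTUAL_WRITER);
          Schema claimed = new Schema.Parser().parse(CLAIMED_WRITER);

          GenericRecord rec = new GenericData.Record(actual);
          rec.put("f", 42);                          // selects union branch index 2

          ByteArrayOutputStream out = new ByteArrayOutputStream();
          BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
          new GenericDatumWriter<GenericRecord>(actual).write(rec, enc);
          enc.flush();

          // Decoding the same bytes while claiming the 2-branch writer schema makes the
          // resolving decoder read branch index 2 and index past the end of the Alternative
          // built from the claimed schema -> ArrayIndexOutOfBoundsException in
          // Symbol$Alternative.getSymbol, as in the trace above.
          GenericDatumReader<GenericRecord> reader =
              new GenericDatumReader<GenericRecord>(claimed, claimed);
          reader.read(null, DecoderFactory.get().binaryDecoder(out.toByteArray(), null));
        }
      }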

      The MapReduce job eventually fails in the reducer stage. I don't think our input data is corrupted, since it is read fine in the map stage. Every time we hit this error, we have to pull the whole huge dataset from the source again, rebuild the Avro files, and restart the daily merge, until after several months we hit this issue again for a reason we don't yet understand.


People

    • Assignee: Unassigned
    • Reporter: Yong Zhang (java8964)
    • Votes: 1
    • Watchers: 6
