[AVRO-986] Avro files generated from avro-c don't work with the Java mapred implementation. - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.6.2
Component/s: c, java
Labels:
- c
- hadoop
- java
- mapreduce
Environment:

avro-c 1.6.2-SNAPSHOT
avro-java 1.6.2-SNAPSHOT
hadoop 0.20.2

Tags:
mapreduce hadoop avro sync

Description

When a file generated by the Avro-C implementation is fed into Hadoop, the job fails with "Block size invalid or too large for this implementation: -49".

This is caused by the sync marker, specifically the copy of it that Avro-C writes into the header metadata.

The org.apache.avro.mapred.AvroRecordReader uses a FileSplit object to work out where it should start reading, but this class knows nothing about Avro's block layout: it simply divides the file into equal-sized chunks, the first of which starts at position 0.
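To make the splitting concrete, here is a simplified Java sketch of fixed-size byte-range splitting. It is a stand-in, not Hadoop's FileInputFormat code: the class `EqualSplits` and its `split` method are hypothetical, and the real implementation also applies minimum/maximum split sizes and handles the final remainder differently. The point is only that split boundaries are byte offsets chosen without regard to Avro blocks, and that the first split always starts at 0, inside the file header.

```java
import java.util.ArrayList;
import java.util.List;

public class EqualSplits {
    // Hypothetical byte range handed to one reader (start offset + length).
    static final class Split {
        final long start;
        final long length;

        Split(long start, long length) {
            this.start = start;
            this.length = length;
        }
    }

    // Cuts [0, fileLength) into consecutive chunks of splitSize bytes;
    // only the last chunk may be shorter. Avro block boundaries are ignored.
    static List<Split> split(long fileLength, long splitSize) {
        List<Split> splits = new ArrayList<>();
        for (long start = 0; start < fileLength; start += splitSize) {
            splits.add(new Split(start, Math.min(splitSize, fileLength - start)));
        }
        return splits;
    }

    public static void main(String[] args) {
        // A 250-byte file with 100-byte splits yields starts 0, 100 and 200.
        for (Split s : split(250, 100)) {
            System.out.println("start=" + s.start + " length=" + s.length);
        }
    }
}
```

Each reader then has to find the first real block at or after its start offset, which is why it scans for a sync marker.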

So org.apache.avro.mapred.AvroRecordReader gets 0 as the start of its chunk and calls:

AvroRecordReader.java

reader.sync(split.getStart());   // sync to start

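DataFileReader.sync() in org.apache.avro.file then seeks to position 0 and scans forward for a sync marker.
It finds one at position 32: the copy stored in the header metadata map under the key "avro.sync". The reader then treats the bytes that follow as the start of a data block, reads garbage where it expects the block header, and fails with the invalid block size shown above.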
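The following self-contained Java sketch reproduces this failure mode on a toy header. It does not use the Avro library: `SyncScanDemo`, `header`, and `syncAfter` are hypothetical helpers. The header follows the object-container layout (magic `Obj\1`, a metadata map whose keys and values are zig-zag-varint-length-prefixed bytes, then the 16-byte sync marker), and placing `avro.sync` as the first metadata entry is an assumption made for illustration. A naive forward scan from offset 0, like the one sync() performs, stops at the metadata copy rather than at the real marker that ends the header.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SyncScanDemo {
    // Avro's zig-zag varint encoding for a long.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Length-prefixed bytes, as used for metadata keys and values.
    static void writeBytes(ByteArrayOutputStream out, byte[] b) {
        writeLong(out, b.length);
        out.write(b, 0, b.length);
    }

    // Toy container header; optionally stores a copy of the marker under "avro.sync".
    static byte[] header(byte[] sync, boolean syncInMetadata) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(new byte[] {'O', 'b', 'j', 1}, 0, 4); // magic
        writeLong(out, syncInMetadata ? 2 : 1);          // entries in this map block
        if (syncInMetadata) {
            writeBytes(out, "avro.sync".getBytes(StandardCharsets.UTF_8));
            writeBytes(out, sync);                       // the problematic copy
        }
        writeBytes(out, "avro.codec".getBytes(StandardCharsets.UTF_8));
        writeBytes(out, "null".getBytes(StandardCharsets.UTF_8));
        writeLong(out, 0);                               // end of map
        out.write(sync, 0, sync.length);                 // real end-of-header marker
        return out.toByteArray();
    }

    // Offset just past the first occurrence of sync at or after start, or -1.
    static int syncAfter(byte[] data, int start, byte[] sync) {
        for (int i = start; i + sync.length <= data.length; i++) {
            if (Arrays.equals(Arrays.copyOfRange(data, i, i + sync.length), sync)) {
                return i + sync.length;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] sync = new byte[16];
        for (int i = 0; i < 16; i++) sync[i] = (byte) (0xA0 + i);
        byte[] withCopy = header(sync, true);
        byte[] without = header(sync, false);
        System.out.println("with avro.sync: first match ends at "
            + syncAfter(withCopy, 0, sync) + ", header ends at " + withCopy.length);
        System.out.println("without avro.sync: first match ends at "
            + syncAfter(without, 0, sync) + ", header ends at " + without.length);
    }
}
```

With the `avro.sync` entry present, the first match ends at offset 32 in this toy layout while the header actually ends at offset 65; without the entry, the first match ends exactly at the end of the header (offset 38).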

No other implementation stores the sync marker in the metadata map, and none reads it from there, not even the C implementation itself.

I suggest removing this entry from the header as the simplest solution.
Another solution would be to create an AvroFileSplit class in mapred that knows where the blocks are and provides the correct start positions in the first place.
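The second proposal, a split class that knows where the blocks are, could look roughly like the Java sketch below. Everything here is hypothetical: `BlockAlignedSplits` is not part of Avro or Hadoop, and it assumes the block start offsets are already known (for instance from a prior pass over the file, which is not shown). Because every split begins at a block boundary, a reader would not need to scan for a marker, so a copy of the marker inside the header could not be mistaken for a block boundary.

```java
import java.util.ArrayList;
import java.util.List;

public class BlockAlignedSplits {
    // Hypothetical: groups consecutive blocks into splits of roughly targetSize
    // bytes so that every split starts exactly at a block boundary.
    // blockStarts must be ascending; fileLength marks the end of the last block.
    // Returns {start, length} pairs.
    static List<long[]> split(long[] blockStarts, long fileLength, long targetSize) {
        List<long[]> splits = new ArrayList<>();
        int i = 0;
        while (i < blockStarts.length) {
            long start = blockStarts[i];
            int j = i + 1;
            // Add block j while it begins less than targetSize bytes after start,
            // so a split can overrun targetSize by up to one block.
            while (j < blockStarts.length && blockStarts[j] - start < targetSize) {
                j++;
            }
            long end = j < blockStarts.length ? blockStarts[j] : fileLength;
            splits.add(new long[] {start, end - start});
            i = j;
        }
        return splits;
    }

    public static void main(String[] args) {
        // Toy file: the header ends at 65, blocks start at 65, 140, 230 and 310,
        // and the file is 400 bytes long.
        long[] blockStarts = {65, 140, 230, 310};
        for (long[] s : split(blockStarts, 400, 150)) {
            System.out.println("start=" + s[0] + " length=" + s[1]);
        }
    }
}
```

With these toy offsets it produces two splits, starting at 65 and 230. Letting a split overrun the target size is the trade-off for never cutting a block in half.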

Attachments

- 0001-avromod-utility.patch (7 kB), Douglas Creager, 27/Dec/11 15:06
- 0001-Remove-sync-marker-from-metadata-in-header.patch (1 kB), Michael Cooper, 22/Dec/11 04:27
- AVRO-986-java.patch (2 kB), Doug Cutting, 20/Jan/12 18:02
- AVRO-986-java.patch (0.7 kB), Doug Cutting, 22/Dec/11 22:31
- quickstop.db (22 kB), Douglas Creager, 27/Dec/11 14:30

People

Assignee: Unassigned
Reporter: Michael Cooper
Votes: 1
Watchers: 0

Dates

Created: 22/Dec/11 04:08
Updated: 15/Feb/12 00:46
Resolved: 26/Jan/12 16:48