Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.7.5
None
None
Description
I have a 340 MB Avro data file containing records that are sorted and identified by an ID (duplicate records exist, so several records can share the same ID). Before the first record of each ID, a synchronization point is created with DataFileWriter.sync(). (I cannot, or do not want to, store the sync-point positions, and I do not want to use SortedKeyValueFile as the output format for the M/R job.)
There are at least 25,000 synchronization points in the 340 MB file.
Example layout:
Marker1_RecordA1_RecordA2_RecordA3_Marker2_RecordB1_RecordB2
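The writer-side rule described above (force a sync point whenever the ID changes) can be modelled in plain Java. This is a sketch, not Avro code: `BlockWriter`, `writeGrouped` and `LayoutRecorder` are hypothetical names, and in the actual job the `sync()` role is played by `DataFileWriter.sync()`. The recorder only prints the resulting layout in the notation of the example instead of writing bytes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GroupedSyncWriter {

    /** Hypothetical minimal writer: append a record, or force a sync point. */
    interface BlockWriter {
        void append(String id);
        void sync();
    }

    /** Forces a sync point before the first record of each ID; input must be sorted by ID. */
    static void writeGrouped(List<String> sortedIds, BlockWriter w) {
        String previous = null;
        for (String id : sortedIds) {
            if (!id.equals(previous)) {
                w.sync();          // a new ID group starts right after a sync point
                previous = id;
            }
            w.append(id);          // duplicates of the same ID stay in the same group
        }
    }

    /** Records the layout in the "Marker1_RecordA1_..." notation instead of writing bytes. */
    static final class LayoutRecorder implements BlockWriter {
        private final List<String> parts = new ArrayList<>();
        private int markers = 0;
        private int inGroup = 0;

        @Override public void sync() {
            parts.add("Marker" + (++markers));
            inGroup = 0;
        }

        @Override public void append(String id) {
            parts.add("Record" + id + (++inGroup));
        }

        String layout() { return String.join("_", parts); }
    }

    public static void main(String[] args) {
        LayoutRecorder rec = new LayoutRecorder();
        writeGrouped(Arrays.asList("A", "A", "A", "B", "B"), rec);
        System.out.println(rec.layout());
        // Marker1_RecordA1_RecordA2_RecordA3_Marker2_RecordB1_RecordB2
    }
}
```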
Because the records are sorted and marked, I retrieve them efficiently with a binary search over file positions. Most of the time the search succeeds, but occasionally the code throws the following exception:
------
org.apache.avro.AvroRuntimeException: java.io.IOException: Invalid sync!
    at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:210)
------
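For reference, the binary search described before the stack trace can be sketched as a lower-bound search over byte offsets. This is a plain-Java model, not Avro's API: the `SyncSeekable` interface and `InMemorySyncFile` class are hypothetical, and `sync(position)` is assumed to land on the first sync point at or after `position` (roughly what `DataFileReader.sync(long)` does). Searching for the first sync point whose leading ID is at least the target, rather than stopping at the first equal ID, also copes with the extra sync markers Avro's writer emits on its own when a block fills up, which can fall in the middle of an ID group.

```java
import java.util.Arrays;

public class SyncBinarySearch {

    /** Hypothetical reader abstraction; NOT Avro's API. */
    interface SyncSeekable {
        long length();                 // file length in bytes
        void sync(long position);      // move to the first sync point at or after position
        boolean hasNext();             // false if no sync point exists at or after it
        long currentSync();            // byte offset of the current sync point
        String firstKey();             // ID of the first record after the current sync point
    }

    /** In-memory stand-in: sorted sync offsets and the first ID following each one. */
    static final class InMemorySyncFile implements SyncSeekable {
        private final long[] syncs;
        private final String[] keys;
        private final long length;
        private int idx;

        InMemorySyncFile(long[] syncs, String[] keys, long length) {
            this.syncs = syncs;
            this.keys = keys;
            this.length = length;
        }

        @Override public long length() { return length; }

        @Override public void sync(long position) {
            int i = Arrays.binarySearch(syncs, position);
            idx = i >= 0 ? i : -i - 1;  // insertion point = first offset >= position
        }

        @Override public boolean hasNext() { return idx < syncs.length; }
        @Override public long currentSync() { return syncs[idx]; }
        @Override public String firstKey() { return keys[idx]; }
    }

    /** Smallest sync offset whose first record ID is >= target, or -1 if there is none. */
    static long lowerBound(SyncSeekable r, String target) {
        long lo = 0, hi = r.length(), result = -1;
        while (lo < hi) {
            long mid = lo + (hi - lo) / 2;
            r.sync(mid);
            if (!r.hasNext()) {        // mid lies after the last sync point
                hi = mid;
                continue;
            }
            long s = r.currentSync();  // s >= mid, and no sync point lies in [mid, s)
            if (r.firstKey().compareTo(target) >= 0) {
                result = s;            // candidate; keep looking for an earlier one
                hi = mid;
            } else {
                lo = s + 1;            // the target group can only start after s
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The sync point at 200 models an extra block boundary inside the "B" group.
        InMemorySyncFile f = new InMemorySyncFile(
                new long[] {0, 100, 200, 300, 400},
                new String[] {"A", "B", "B", "C", "D"}, 500);
        System.out.println(lowerBound(f, "B")); // 100
        System.out.println(lowerBound(f, "E")); // -1
    }
}
```

A caller would then check that the ID at the returned offset equals the target before reading the group.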
If I note the position passed to fileReader.sync(mid), catch the AvroRuntimeException, close and reopen the file, and call sync(mid) again with the same position, the exception does not occur.
Why does Avro throw the exception on the first attempt but not after the file has been reopened?
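The close-and-reopen workaround described above can be sketched as a small wrapper that retries a failed step once on a fresh reader. It only mirrors the workaround and does not explain the underlying `Invalid sync!` error. `Opener`, `Step`, `RetryingReader` and `FakeReader` are hypothetical names; the wrapper catches `RuntimeException`, which `AvroRuntimeException` extends, and `FakeReader` simulates a reader whose first instance fails.

```java
import java.io.IOException;

public class ReopenRetry {

    /** Hypothetical: opens a fresh reader over the data file. */
    interface Opener<R extends AutoCloseable> {
        R open() throws IOException;
    }

    /** Hypothetical: one lookup step, e.g. sync(mid) followed by reading a record. */
    interface Step<R, T> {
        T apply(R reader) throws IOException;
    }

    /** Holds the current reader and replaces it with a fresh one if a step fails. */
    static final class RetryingReader<R extends AutoCloseable> implements AutoCloseable {
        private final Opener<R> opener;
        private R reader;

        RetryingReader(Opener<R> opener) throws IOException {
            this.opener = opener;
            this.reader = opener.open();
        }

        <T> T run(Step<R, T> step) throws Exception {
            try {
                return step.apply(reader);
            } catch (RuntimeException e) {   // AvroRuntimeException is a RuntimeException
                reader.close();
                reader = opener.open();       // reopen the file
                return step.apply(reader);    // retry once; a second failure propagates
            }
        }

        @Override public void close() throws Exception {
            reader.close();
        }
    }

    /** Fake reader for the demo; a "broken" one fails like the reported Invalid sync!. */
    static final class FakeReader implements AutoCloseable {
        private final boolean broken;

        FakeReader(boolean broken) { this.broken = broken; }

        long sync(long mid) {
            if (broken) throw new IllegalStateException("Invalid sync!");
            return mid;
        }

        @Override public void close() { }
    }

    public static void main(String[] args) throws Exception {
        int[] opens = {0};
        // Only the first reader is broken; the reopened one succeeds.
        try (RetryingReader<FakeReader> r =
                     new RetryingReader<FakeReader>(() -> new FakeReader(opens[0]++ == 0))) {
            long result = r.run(fr -> fr.sync(42L));
            System.out.println(result + " after " + opens[0] + " opens"); // 42 after 2 opens
        }
    }
}
```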
Version 1.7.5 of the library throws this error. I am raising this as a major defect; please adjust the priority as you see fit.