  1. Apache Avro
  2. AVRO-1326

Files written with bzip2 codec cannot be read

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.7.4
    • Fix Version/s: 1.7.5
    • Component/s: java
    • Labels:
      None

      Description

      When attempting to read a file written using the bzip2 codec for compression, the following exception is thrown upon completion of the first encoded block:

      Exception in thread "main" org.apache.avro.AvroRuntimeException: java.io.IOException: Block read partially, the data may be corrupt
      at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:210)
      at BzipTests.main(BzipTests.java:28)
      Caused by: java.io.IOException: Block read partially, the data may be corrupt
      at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:194)
      ... 1 more

      Inspecting BZip2Codec indicates that the root cause is in the compress() method: it compresses the entire backing array of the supplied ByteBuffer, not just the valid portion of the buffer (the bytes between position and limit). On decompression, the resulting length is therefore larger than the recorded uncompressed block size.

      On line 51:
      outputStream.write(uncompressedData.array());

      should be:
      outputStream.write(uncompressedData.array(), uncompressedData.position(), uncompressedData.remaining());
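
The size mismatch can be illustrated without Avro or a bzip2 library. The following is a minimal sketch, not the actual BZip2Codec code: the class and method names are hypothetical, and a plain ByteArrayOutputStream stands in for the compressor stream. It also adds arrayOffset() to the start index, which is an assumption beyond the line proposed above; the two forms are equivalent whenever the buffer's array offset is 0.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Hypothetical illustration class; not part of Avro.
public class BufferWriteSketch {

    // Mirrors the reported behaviour: every byte of the backing array is
    // written, regardless of the buffer's position and limit.
    static int writeWholeArray(ByteBuffer buf) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] backing = buf.array();
        out.write(backing, 0, backing.length); // same bytes as write(backing)
        return out.size();
    }

    // Mirrors the proposed fix: only the bytes between position and limit
    // are written. arrayOffset() is an added assumption for sliced buffers.
    static int writeValidPortion(ByteBuffer buf) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(buf.array(), buf.arrayOffset() + buf.position(), buf.remaining());
        return out.size();
    }

    public static void main(String[] args) {
        // A 64-byte buffer holding only 5 valid bytes.
        ByteBuffer buf = ByteBuffer.allocate(64);
        buf.put(new byte[] {1, 2, 3, 4, 5});
        buf.flip(); // position = 0, limit = 5

        System.out.println(writeWholeArray(buf));   // prints 64
        System.out.println(writeValidPortion(buf)); // prints 5
    }
}
```

With the first form, 64 bytes would be compressed while the block records only 5 as its valid content, which is consistent with the "Block read partially" error reported on read.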

        Attachments

        1. AVRO-1326.patch
          1 kB
          Doug Cutting
        2. BzipTest.java
          1 kB
          Kevin Irwin

            People

            • Assignee: Doug Cutting (cutting)
            • Reporter: Kevin Irwin (kirwin)