  1. Avro
  2. AVRO-2109

Reset buffers in case of IOException

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.8.2
    • Fix Version/s: 1.7.8, 1.9.0, 1.8.3
    • Component/s: java
    • Labels:
      None

      Description

      If an IOException is thrown from DataFileWriter.writeBlock, the buffer and blockCount are not reset, so duplicated data is written out on close/flush.
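
      To make the failure mode concrete, here is a minimal sketch using only the JDK. It is not Avro's code: BlockWriterSketch, FailOnceSink and ResetDemo are hypothetical stand-ins, and the sink is assumed to report an error only after the bytes have already landed. With resetOnFailure set to false the block is replayed on close(); with true it is not.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical, heavily simplified stand-in for a block-buffering file writer. */
class BlockWriterSketch {
    private final OutputStream sink;
    private final boolean resetOnFailure;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private int blockCount = 0;

    BlockWriterSketch(OutputStream sink, boolean resetOnFailure) {
        this.sink = sink;
        this.resetOnFailure = resetOnFailure;
    }

    void append(byte[] record) {
        buffer.write(record, 0, record.length);
        blockCount++;
    }

    void writeBlock() throws IOException {
        if (blockCount == 0) {
            return; // nothing buffered, nothing to write
        }
        boolean written = false;
        try {
            buffer.writeTo(sink);
            sink.flush();
            written = true;
        } finally {
            // Always reset after success; after a failure only if requested.
            if (written || resetOnFailure) {
                buffer.reset();
                blockCount = 0;
            }
        }
    }

    void close() throws IOException {
        writeBlock(); // flushes whatever is still buffered
        sink.close();
    }
}

/** Assumed sink: stores every byte, but the first flush() reports a failure. */
class FailOnceSink extends OutputStream {
    final ByteArrayOutputStream file = new ByteArrayOutputStream();
    private boolean failed = false;

    @Override
    public void write(int b) {
        file.write(b);
    }

    @Override
    public void flush() throws IOException {
        if (!failed) {
            failed = true;
            throw new IOException("simulated timeout after the block was written");
        }
    }
}

class ResetDemo {
    /** Returns the content of the simulated file after a failed writeBlock() and a close(). */
    static String run(boolean resetOnFailure) {
        FailOnceSink sink = new FailOnceSink();
        BlockWriterSketch writer = new BlockWriterSketch(sink, resetOnFailure);
        writer.append("A".getBytes(StandardCharsets.UTF_8));
        try {
            writer.writeBlock();
        } catch (IOException e) {
            // The caller reacts to the failure by closing the writer below.
        }
        try {
            writer.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(sink.file.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // prints "AA": the block is written twice
        System.out.println(run(true));  // prints "A": the buffer was reset on failure
    }
}
```

      Doing the reset in a finally block keeps the writer's state consistent whether or not the caller decides to call close() afterwards.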

      This is really a conceptual question: should we reset the buffer after an exception or not? If an exception occurs while writing the file, we should expect the file to be corrupt, so the possible duplication of data should not matter.
      On the other hand, if the file is already corrupt, why would we try to write anything again at file close?
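
      One way to read the second question is as a fail-fast policy: once a write has failed, close() only releases the underlying stream and writes nothing more. The sketch below illustrates that alternative under stated assumptions; FailFastWriterSketch, TimeoutOnceSink and FailFastDemo are hypothetical names, and nothing here claims to be what Avro actually does.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical writer that refuses to write anything after the first I/O failure. */
class FailFastWriterSketch {
    private final OutputStream sink;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private boolean failed = false;

    FailFastWriterSketch(OutputStream sink) {
        this.sink = sink;
    }

    void append(byte[] record) throws IOException {
        if (failed) {
            throw new IOException("writer already failed");
        }
        buffer.write(record, 0, record.length);
    }

    void writeBlock() throws IOException {
        if (failed) {
            throw new IOException("writer already failed");
        }
        try {
            buffer.writeTo(sink);
            sink.flush();
            buffer.reset();
        } catch (IOException e) {
            failed = true; // remember the failure so close() does not replay the block
            throw e;
        }
    }

    void close() throws IOException {
        try {
            if (!failed) {
                writeBlock();
            }
        } finally {
            sink.close(); // always release the underlying stream
        }
    }
}

/** Assumed sink: stores every byte, but the first flush() reports a failure. */
class TimeoutOnceSink extends OutputStream {
    final ByteArrayOutputStream file = new ByteArrayOutputStream();
    private boolean failed = false;

    @Override
    public void write(int b) {
        file.write(b);
    }

    @Override
    public void flush() throws IOException {
        if (!failed) {
            failed = true;
            throw new IOException("simulated timeout");
        }
    }
}

class FailFastDemo {
    /** Returns the content of the simulated file after a failed writeBlock() and a close(). */
    static String run() {
        TimeoutOnceSink sink = new TimeoutOnceSink();
        FailFastWriterSketch writer = new FailFastWriterSketch(sink);
        try {
            writer.append("A".getBytes(StandardCharsets.UTF_8));
            writer.writeBlock();
        } catch (IOException e) {
            // Expected in this demo: the first flush fails.
        }
        try {
            writer.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(sink.file.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "A": close() wrote nothing after the failure
    }
}
```

      The trade-off is that a writer with a failed flag can never be reused, whereas resetting the buffer leaves that decision to the caller.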

      This issue comes from a Flume issue where the HDFS wait thread is interrupted because of a timeout while writing an Avro file. The current block has already been written properly, but because of the IOException caused by the thread interrupt we invoke close() on the writer, which writes the block again along with some other data (possibly a duplicated sync marker), and that corrupts the file.

      Sean Busbey, Nandor Kollar, Zoltan Ivanfi, any thoughts?

              People

              • Assignee: Gabor Szadovszky (gszadovszky)
              • Reporter: Gabor Szadovszky (gszadovszky)
              • Votes: 0
              • Watchers: 6
