  1. Hadoop HDFS
  2. HDFS-24

FSDataOutputStream should flush last partial CRC chunk


Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed

    Description

      The FSDataOutputStream.flush() API is supposed to flush all data to the underlying stream. However, for LocalFileSystem, flush() does not flush the last partial CRC chunk.

      One solution is described in HADOOP-2657: change FSOutputStream to implement Seekable, with a default seek() that throws an IOException, and use seek() in ChecksumFileSystem to rewind and overwrite the checksum of the last partial chunk. Callers would then fail only if they attempt to write more data after flushing on a ChecksumFileSystem whose output stream doesn't support seek. I don't think we will have any filesystems that both extend ChecksumFileSystem and can't support seek: only LocalFileSystem currently extends ChecksumFileSystem, and it does support seek. So flush() shouldn't ever fail for existing FileSystems, but seek() will fail for most output streams (probably all except local).
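
      The rewind-and-overwrite idea can be illustrated with a minimal in-memory sketch. This is not Hadoop code: the class PartialChunkFlushSketch, its methods, and the 8-byte chunk size are hypothetical, and a byte array stands in for a seekable checksum file. On flush(), the partial chunk's data and a provisional CRC are written, but the checksum position is left where it was, so the next write to the same chunk overwrites that CRC.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.zip.CRC32;

// Hypothetical in-memory model of the proposal; none of these names are Hadoop APIs.
public class PartialChunkFlushSketch {
    static final int BYTES_PER_CHECKSUM = 8; // deliberately tiny, for illustration
    static final int CRC_SIZE = 4;

    private final ByteArrayOutputStream data = new ByteArrayOutputStream();
    private byte[] sums = new byte[0];   // stands in for a seekable checksum file
    private int sumPos = 0;              // current write position in the checksum file
    private final byte[] chunk = new byte[BYTES_PER_CHECKSUM];
    private int chunkLen = 0;            // bytes buffered for the current chunk
    private int chunkFlushed = 0;        // of those, bytes already written to data

    public void write(byte[] b) {
        for (byte x : b) {
            chunk[chunkLen++] = x;
            if (chunkLen == BYTES_PER_CHECKSUM) {
                emitChunk();
                sumPos += CRC_SIZE;      // chunk complete: move past its CRC slot
                chunkLen = 0;
                chunkFlushed = 0;
            }
        }
    }

    // Flush the partial chunk. sumPos is not advanced, which models seeking
    // back so that the provisional CRC is overwritten by the next emit.
    public void flush() {
        if (chunkLen > 0) {
            emitChunk();
        }
    }

    private void emitChunk() {
        // Write only the bytes of this chunk that have not reached data yet.
        data.write(chunk, chunkFlushed, chunkLen - chunkFlushed);
        chunkFlushed = chunkLen;
        int crc = crc(Arrays.copyOf(chunk, chunkLen));
        byte[] encoded = ByteBuffer.allocate(CRC_SIZE).putInt(crc).array();
        if (sums.length < sumPos + CRC_SIZE) {
            sums = Arrays.copyOf(sums, sumPos + CRC_SIZE);
        }
        System.arraycopy(encoded, 0, sums, sumPos, CRC_SIZE);
    }

    public byte[] dataBytes() { return data.toByteArray(); }

    public int checksumCount() { return sums.length / CRC_SIZE; }

    public int storedCrc(int chunkIndex) {
        return ByteBuffer.wrap(sums, chunkIndex * CRC_SIZE, CRC_SIZE).getInt();
    }

    static int crc(byte[] b) {
        CRC32 c = new CRC32();
        c.update(b, 0, b.length);
        return (int) c.getValue();
    }
}
```

      In this model, flushing after writing 5 bytes stores one CRC covering those 5 bytes; writing 3 more bytes completes the chunk and replaces that CRC in place rather than appending a second one. A stream that cannot seek has no way to perform the in-place replacement, which is the case where the proposal would let a later write fail.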

            People

              Assignee: Unassigned
              Reporter: Dhruba Borthakur (dhruba)
              Votes: 0
              Watchers: 4