HDFS-1057 (sub-task of HDFS-1060: Append/flush should support concurrent "tailer" use case)

Concurrent readers hit ChecksumExceptions if following a writer to very end of file

    Details

    • Hadoop Flags:
      Reviewed

      Description

      BlockReceiver.receivePacket calls replicaInfo.setBytesOnDisk before calling flush(). A concurrent reader can therefore race with the writer: it sees the new length while those bytes are still in BlockReceiver's buffers, so the client may hit checksum errors or EOFs. Additionally, the last checksum chunk of the file is made accessible to readers even though it is not yet stable.
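The ordering problem described above can be illustrated with a small standalone Java sketch. This is not the HDFS code or the attached patch. The class VisibleLengthSketch and its methods are hypothetical, a ByteArrayOutputStream stands in for the block file, and an AtomicLong stands in for the length published by replicaInfo.setBytesOnDisk. Each method returns how many bytes a reader arriving at the moment of publication would be promised but could not yet find on "disk". The unstable last checksum chunk is not modeled here.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of the race; none of these names come from HDFS.
class VisibleLengthSketch {
    // Stands in for the block file on disk.
    private final ByteArrayOutputStream disk = new ByteArrayOutputStream();
    // Stands in for the writer's buffered stream (64 KB buffer, assumed size).
    private final BufferedOutputStream out = new BufferedOutputStream(disk, 64 * 1024);
    // Stands in for the length that readers are allowed to see.
    private final AtomicLong bytesOnDisk = new AtomicLong();
    private long received = 0;

    // Order from the report: publish the new length, then flush.
    // Returns the bytes promised to readers but not yet on "disk"
    // at the instant the length becomes visible.
    public long receivePacketPublishFirst(byte[] packet) throws IOException {
        out.write(packet);
        received += packet.length;
        bytesOnDisk.set(received);                  // readers may now trust this length
        long missing = bytesOnDisk.get() - disk.size();
        out.flush();                                // data reaches "disk" only now
        return missing;
    }

    // Alternative order: flush first, then publish the new length.
    public long receivePacketFlushFirst(byte[] packet) throws IOException {
        out.write(packet);
        received += packet.length;
        out.flush();                                // data is on "disk" first
        bytesOnDisk.set(received);                  // published length is now backed by data
        return bytesOnDisk.get() - disk.size();
    }

    public static void main(String[] args) throws IOException {
        byte[] packet = new byte[512];
        // prints 512: the whole packet is still in the buffer when the length is published
        System.out.println(new VisibleLengthSketch().receivePacketPublishFirst(packet));
        // prints 0: the published length never exceeds what is on "disk"
        System.out.println(new VisibleLengthSketch().receivePacketFlushFirst(packet));
    }
}
```

In this model the gap is measured on a single thread so the result is deterministic. In a real DataNode the same window would only be observed when a reader thread happens to run between the two steps.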

      1. hdfs-1057-trunk-6.txt
        29 kB
        sam rash
      2. hdfs-1057-trunk-5.txt
        30 kB
        sam rash
      3. hdfs-1057-trunk-4.txt
        29 kB
        sam rash
      4. hdfs-1057-trunk-3.txt
        25 kB
        sam rash
      5. hdfs-1057-trunk-2.txt
        26 kB
        sam rash
      6. hdfs-1057-trunk-1.txt
        27 kB
        sam rash
      7. HDFS-1057-0.20-append.patch
        35 kB
        Nicolas Spiegelberg
      8. HDFS-1057.20-security.1.patch
        34 kB
        Jitendra Nath Pandey
      9. conurrent-reader-patch-3.txt
        34 kB
        sam rash
      10. conurrent-reader-patch-2.txt
        31 kB
        sam rash
      11. conurrent-reader-patch-1.txt
        30 kB
        sam rash

          Activity

          Todd Lipcon created issue
          Todd Lipcon made changes -
          Field Original Value New Value
          Summary: "BlockReceiver records block length in replicaInfo before flushing" → "Concurrent readers hit ChecksumExceptions if following a writer to very end of file"
          Description: "In BlockReceiver.receivePacket, it calls replicaInfo.setBytesOnDisk before calling flush(). Therefore, if there is a concurrent reader, it's possible to race here - the reader will see the new length while those bytes are still in the buffers of BlockReceiver. Thus the client will potentially see checksum errors or EOFs." → "In BlockReceiver.receivePacket, it calls replicaInfo.setBytesOnDisk before calling flush(). Therefore, if there is a concurrent reader, it's possible to race here - the reader will see the new length while those bytes are still in the buffers of BlockReceiver. Thus the client will potentially see checksum errors or EOFs. Additionally, the last checksum chunk of the file is made accessible to readers even though it is not stable."
          Todd Lipcon made changes -
          Parent HDFS-1060 [ 12459815 ]
          Issue Type Bug [ 1 ] Sub-task [ 7 ]
          Todd Lipcon made changes -
          Link This issue is related to HDFS-1103 [ HDFS-1103 ]
          Hairong Kuang made changes -
          Priority Critical [ 2 ] Blocker [ 1 ]
          dhruba borthakur made changes -
          Assignee sam rash [ rash37 ]
          sam rash made changes -
          Attachment conurrent-reader-patch-1.txt [ 12443217 ]
          sam rash made changes -
          Attachment conurrent-reader-patch-1.txt [ 12443220 ]
          sam rash made changes -
          Attachment conurrent-reader-patch-1.txt [ 12443217 ]
          sam rash made changes -
          Attachment conurrent-reader-patch-2.txt [ 12443362 ]
          sam rash made changes -
          Attachment conurrent-reader-patch-3.txt [ 12443524 ]
          sam rash made changes -
          Attachment hdfs-1057-trunk-1.txt [ 12445829 ]
          Nicolas Spiegelberg made changes -
          Affects Version/s 0.20-append [ 12315103 ]
          sam rash made changes -
          Attachment hdfs-1057-trunk-2.txt [ 12446449 ]
          sam rash made changes -
          Attachment hdfs-1057-trunk-3.txt [ 12446524 ]
          dhruba borthakur made changes -
          Fix Version/s 0.20-append [ 12315103 ]
          sam rash made changes -
          Attachment hdfs-1057-trunk-4.txt [ 12447774 ]
          Nicolas Spiegelberg made changes -
          Attachment HDFS-1057-0.20-append.patch [ 12447982 ]
          sam rash made changes -
          Attachment hdfs-1057-trunk-5.txt [ 12448081 ]
          Hairong Kuang made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hadoop Flags [Reviewed]
          sam rash made changes -
          Attachment hdfs-1057-trunk-6.txt [ 12448323 ]
          sam rash made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          sam rash made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hairong Kuang made changes -
          Fix Version/s 0.21.0 [ 12314046 ]
          Fix Version/s 0.22.0 [ 12314241 ]
          dhruba borthakur made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Tsz Wo Nicholas Sze made changes -
          Link This issue is related to HDFS-1310 [ HDFS-1310 ]
          Tom White made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Tsz Wo Nicholas Sze made changes -
          Link This issue is related to HDFS-1679 [ HDFS-1679 ]
          Matt Foley made changes -
          Link This issue is related to HDFS-1401 [ HDFS-1401 ]
          Tsz Wo Nicholas Sze made changes -
          Link This issue is related to HDFS-1885 [ HDFS-1885 ]
          Todd Lipcon made changes -
          Link This issue relates to HADOOP-7146 [ HADOOP-7146 ]
          Jitendra Nath Pandey made changes -
          Attachment HDFS-1057.20-security.1.patch [ 12492813 ]
          Suresh Srinivas made changes -
          Fix Version/s 0.20.205.0 [ 12316392 ]
          Jeff Hammerbacher made changes -
          Link This issue relates to HDFS-3719 [ HDFS-3719 ]

            People

            • Assignee:
              sam rash
              Reporter:
              Todd Lipcon
            • Votes:
              1
              Watchers:
              12
