Hadoop HDFS
  1. Hadoop HDFS
  2. HDFS-1529

Incorrect handling of interrupts in waitForAckedSeqno can cause deadlock

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.22.0
    • Fix Version/s: 0.22.0
    • Component/s: hdfs-client
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      In HDFS-895 the handling of interrupts during hflush/close was changed to preserve interrupt status. This ends up creating an infinite loop in waitForAckedSeqno if the waiting thread gets interrupted, since Object.wait() has a strange semantic that it doesn't give up the lock even momentarily if the thread is already in interrupted state at the beginning of the call.

      We should decide what the correct behavior is here - if a thread is interrupted while it's calling hflush() or close() should we (a) throw an exception, perhaps InterruptedIOException (b) ignore, or (c) wait for the flush to finish but preserve interrupt status on exit?

      1. Test.java
        0.8 kB
        Todd Lipcon
      2. hdfs-1529.txt
        6 kB
        Todd Lipcon
      3. hdfs-1529.txt
        8 kB
        Todd Lipcon
      4. hdfs-1529.txt
        8 kB
        Todd Lipcon

        Activity

        Hide
        Todd Lipcon added a comment -

        Here's a little test program I wrote to show the weird semantics of Object.wait() not letting another thread lock the object if current thread has interrupt flag set.

        Show
        Todd Lipcon added a comment - Here's a little test program I wrote to show the weird semantics of Object.wait() not letting another thread lock the object if current thread has interrupt flag set.
        Hide
        Todd Lipcon added a comment -

        Here's a potential fix for this. close() and hflush() now may throw InterruptedIOException if the thread has interrupt status. It's a bit tricky because they won't always throw it - if they weren't going to block, they will return immediately (with interrupt status preserved)

        Show
        Todd Lipcon added a comment - Here's a potential fix for this. close() and hflush() now may throw InterruptedIOException if the thread has interrupt status. It's a bit tricky because they won't always throw it - if they weren't going to block, they will return immediately (with interrupt status preserved)
        Hide
        Todd Lipcon added a comment -

        Fixed test cases which were missing a finally

        { cluster.shutdown() }

        Show
        Todd Lipcon added a comment - Fixed test cases which were missing a finally { cluster.shutdown() }
        Hide
        Todd Lipcon added a comment -

        I ran unit tests, only got known failures

        Show
        Todd Lipcon added a comment - I ran unit tests, only got known failures
        Hide
        Hairong Kuang added a comment -

        1. waitAndQueueCurrentPacket handles InterruptedException differently. Which inconsistent state you try to avoid?
        2. I prefer to set isClosed to be true if close() gets interrupted. This is safer to avoid the possible errors like buffered data get pushed to pipeline twice.

        Show
        Hairong Kuang added a comment - 1. waitAndQueueCurrentPacket handles InterruptedException differently. Which inconsistent state you try to avoid? 2. I prefer to set isClosed to be true if close() gets interrupted. This is safer to avoid the possible errors like buffered data get pushed to pipeline twice.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12465653/hdfs-1529.txt
        against trunk revision 1051644.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        -1 javac. The patch appears to cause tar ant target to fail.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these core unit tests:
        org.apache.hadoop.cli.TestHDFSCLI
        org.apache.hadoop.hdfs.server.namenode.TestStorageRestore
        org.apache.hadoop.hdfs.TestFileConcurrentReader
        org.apache.hadoop.hdfs.TestHDFSTrash

        -1 contrib tests. The patch failed contrib unit tests.

        +1 system test framework. The patch passed system test framework compile.

        Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/12//testReport/
        Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/12//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/12//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12465653/hdfs-1529.txt against trunk revision 1051644. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The patch appears to cause tar ant target to fail. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: org.apache.hadoop.cli.TestHDFSCLI org.apache.hadoop.hdfs.server.namenode.TestStorageRestore org.apache.hadoop.hdfs.TestFileConcurrentReader org.apache.hadoop.hdfs.TestHDFSTrash -1 contrib tests. The patch failed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/12//testReport/ Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/12//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/12//console This message is automatically generated.
        Hide
        Todd Lipcon added a comment -

        Thanks for reviewing. I clarified the comment about the specific condition we're working around, and put close() back the way it was.

        Show
        Todd Lipcon added a comment - Thanks for reviewing. I clarified the comment about the specific condition we're working around, and put close() back the way it was.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12466846/hdfs-1529.txt
        against trunk revision 1051669.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these core unit tests:
        org.apache.hadoop.hdfs.server.namenode.TestStorageRestore

        -1 contrib tests. The patch failed contrib unit tests.

        +1 system test framework. The patch passed system test framework compile.

        Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/41//testReport/
        Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/41//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/41//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12466846/hdfs-1529.txt against trunk revision 1051669. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: org.apache.hadoop.hdfs.server.namenode.TestStorageRestore -1 contrib tests. The patch failed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/41//testReport/ Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/41//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/41//console This message is automatically generated.
        Hide
        Hairong Kuang added a comment -

        > even though we slightly overrun the MAX_PACKETS iength.
        Can you simply throwing away the current packet and throwing an InterruptedIOException? Since the write is interrupted, you do not need to guarantee that the packet gets delivered, right? By overruning the MAX_PACKETS length, you may make the client hit OOM and jeopardize all the queued packets.

        Show
        Hairong Kuang added a comment - > even though we slightly overrun the MAX_PACKETS iength. Can you simply throwing away the current packet and throwing an InterruptedIOException? Since the write is interrupted, you do not need to guarantee that the packet gets delivered, right? By overruning the MAX_PACKETS length, you may make the client hit OOM and jeopardize all the queued packets.
        Hide
        Todd Lipcon added a comment -

        We can't do that since it would screw up the state for other threads. Example:

        Thread A writes 512 bytes
        Writes B writes 512 bytes
        Thread A calls hflush, and gets interrupted, throwing InterruptedIOException and throwing away the packet including B's data

        Now B has never seen an exception, but its data was also lost.

        OOME seems very unlikely, we'd need to have many thousands of threads all hitting this very rare condition at the same time, and the buffer sizes we're talking about are fairly small.

        Show
        Todd Lipcon added a comment - We can't do that since it would screw up the state for other threads. Example: Thread A writes 512 bytes Writes B writes 512 bytes Thread A calls hflush, and gets interrupted, throwing InterruptedIOException and throwing away the packet including B's data Now B has never seen an exception, but its data was also lost. OOME seems very unlikely, we'd need to have many thousands of threads all hitting this very rare condition at the same time, and the buffer sizes we're talking about are fairly small.
        Hide
        Hairong Kuang added a comment -

        You have a valid case. We should not throw away current packet. But I am still concerned about the solution. Suppose we really use a buffer instead of a queue of packets and the buffer is full. Problematically you are not able to queue additional packet, right? Shall we check if the packet is full or not and does the queuing in the beginning of writeChunk before writing the chunk to the current packet?

        Show
        Hairong Kuang added a comment - You have a valid case. We should not throw away current packet. But I am still concerned about the solution. Suppose we really use a buffer instead of a queue of packets and the buffer is full. Problematically you are not able to queue additional packet, right? Shall we check if the packet is full or not and does the queuing in the beginning of writeChunk before writing the chunk to the current packet?
        Hide
        Todd Lipcon added a comment -

        Potentially, but now you're talking about a significant change to the invariants of DFSOutputStream, which seems a little more involved, and also would also require a lot of changes to the way hflush works. For example, if we left an hflush "partial chunk" packet in currentPacket then we'd have to do some tricky maneuvering to restore the correct state of the various offsets, right?

        Since interrupt is almost always used to get something to shut down, it seems unlikely we'd do it so many times quickly in a row that we'd overrun memory by allocating these 64KB buffers. Usually there would be one interrupt and then a close.

        Show
        Todd Lipcon added a comment - Potentially, but now you're talking about a significant change to the invariants of DFSOutputStream, which seems a little more involved, and also would also require a lot of changes to the way hflush works. For example, if we left an hflush "partial chunk" packet in currentPacket then we'd have to do some tricky maneuvering to restore the correct state of the various offsets, right? Since interrupt is almost always used to get something to shut down, it seems unlikely we'd do it so many times quickly in a row that we'd overrun memory by allocating these 64KB buffers. Usually there would be one interrupt and then a close.
        Hide
        Nigel Daley added a comment -

        Blocker for 0.22

        Show
        Nigel Daley added a comment - Blocker for 0.22
        Hide
        Todd Lipcon added a comment -

        Hairong, what do you think about the current fix for now, and if we see any issues we can redesign things as described above?

        As it stands this bug is responsible for a lot of hung hudson servers.

        Show
        Todd Lipcon added a comment - Hairong, what do you think about the current fix for now, and if we see any issues we can redesign things as described above? As it stands this bug is responsible for a lot of hung hudson servers.
        Hide
        Hairong Kuang added a comment -

        Took a look at the patch again. One more question, why the two methods handle InterruptedException differently? Could waitForAckedSeqno does the same as waitAndQueueCurrentPacke?

        Show
        Hairong Kuang added a comment - Took a look at the patch again. One more question, why the two methods handle InterruptedException differently? Could waitForAckedSeqno does the same as waitAndQueueCurrentPacke?
        Hide
        Hairong Kuang added a comment -

        +1. Let's commit this fix for now.

        Show
        Hairong Kuang added a comment - +1. Let's commit this fix for now.
        Hide
        Todd Lipcon added a comment -

        Hi Hairong. The difference between the handling in the two different methods is this:

        In waitForAckedSeqno, the user thread is just blocked on an hflush() or flush before close - In this case throwing an exception will not leave the stream in an inconsistent state, and matches what one might expect: interrupting a thread blocked on IO will throw cause the thread to throw an exception. In waitAndQueueCurrentPacket() if we return early from that wait without an exception, we still leave the stream in a "correct" state. Keeping the interruption state flagged means the user will see the exception on the next operation.

        Since you gave +1 I'll commit as is, and if we want to clean this up we can do it separately.

        Show
        Todd Lipcon added a comment - Hi Hairong. The difference between the handling in the two different methods is this: In waitForAckedSeqno, the user thread is just blocked on an hflush() or flush before close - In this case throwing an exception will not leave the stream in an inconsistent state, and matches what one might expect: interrupting a thread blocked on IO will throw cause the thread to throw an exception. In waitAndQueueCurrentPacket() if we return early from that wait without an exception, we still leave the stream in a "correct" state. Keeping the interruption state flagged means the user will see the exception on the next operation. Since you gave +1 I'll commit as is, and if we want to clean this up we can do it separately.
        Hide
        Todd Lipcon added a comment -

        Committed to branch and trunk

        Show
        Todd Lipcon added a comment - Committed to branch and trunk
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk-Commit #539 (See https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/539/)

        Show
        Hudson added a comment - Integrated in Hadoop-Hdfs-trunk-Commit #539 (See https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/539/ )
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #643 (See https://builds.apache.org/hudson/job/Hadoop-Hdfs-trunk/643/)

        Show
        Hudson added a comment - Integrated in Hadoop-Hdfs-trunk #643 (See https://builds.apache.org/hudson/job/Hadoop-Hdfs-trunk/643/ )
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-22-branch #35 (See https://builds.apache.org/hudson/job/Hadoop-Hdfs-22-branch/35/)

        Show
        Hudson added a comment - Integrated in Hadoop-Hdfs-22-branch #35 (See https://builds.apache.org/hudson/job/Hadoop-Hdfs-22-branch/35/ )

          People

          • Assignee:
            Todd Lipcon
            Reporter:
            Todd Lipcon
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development