Uploaded image for project: 'Hadoop Common'
  1. Hadoop Common
  2. HADOOP-11333

Fix deadlock in DomainSocketWatcher when the notification pipe is full

    Details

    • Target Version/s:

      Description

      I found some of our DataNodes will run "exceeds the limit of concurrent xciever", the limit is 4K.

      After check the stack, I suspect that org.apache.hadoop.net.unix.DomainSocket.writeArray0 which called by DomainSocketWatcher.kick stuck:

      "DataXceiver for client unix:/var/run/hadoop-hdfs/dn Waiting for operation #1" daemon prio=10 tid=0x00007f55c5576000 nid=0x385d waiting on condition [0x00007f558d5d4000]
      java.lang.Thread.State: WAITING (parking)
      at sun.misc.Unsafe.park(Native Method)

      • parking to wait for <0x0000000740df9c90> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
        at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
        at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
        at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:286)
        at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:283)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:413)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:172)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:92)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)

        "DataXceiver for client unix:/var/run/hadoop-hdfs/dn Waiting for operation #1" daemon prio=10 tid=0x00007f7de034c800 nid=0x7b7 runnable [0x00007f7db06c5000]
        java.lang.Thread.State: RUNNABLE
        at org.apache.hadoop.net.unix.DomainSocket.writeArray0(Native Method)
        at org.apache.hadoop.net.unix.DomainSocket.access$300(DomainSocket.java:45)
        at org.apache.hadoop.net.unix.DomainSocket$DomainOutputStream.write(DomainSocket.java:589)
        at org.apache.hadoop.net.unix.DomainSocketWatcher.kick(DomainSocketWatcher.java:350)
        at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:303)
        at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:283)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:413)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:172)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:92)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
        at java.lang.Thread.run(Thread.java:745)

      "DataXceiver for client unix:/var/run/hadoop-hdfs/dn Waiting for operation #1" daemon prio=10 tid=0x00007f55c5574000 nid=0x377a waiting on condition [0x00007f558d7d6000]
      java.lang.Thread.State: WAITING (parking)
      at sun.misc.Unsafe.park(Native Method)

      • parking to wait for <0x0000000740df9cb0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
        at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:306)
        at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:283)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:413)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:172)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:92)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
        at java.lang.Thread.run(Thread.java:745)

      "Thread-163852" daemon prio=10 tid=0x00007f55c811c800 nid=0x6757 runnable [0x00007f55aef6e000]
      java.lang.Thread.State: RUNNABLE
      at org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0(Native Method)
      at org.apache.hadoop.net.unix.DomainSocketWatcher.access$800(DomainSocketWatcher.java:52)
      at org.apache.hadoop.net.unix.DomainSocketWatcher$1.run(DomainSocketWatcher.java:457)
      at java.lang.Thread.run(Thread.java:745)

      1. HADOOP-11333-1.patch
        2 kB
        yunjiong zhao
      2. HADOOP-11333.patch
        1 kB
        yunjiong zhao
      3. 11241025
        1.97 MB
        yunjiong zhao
      4. 11241023
        1.96 MB
        yunjiong zhao
      5. 11241021
        1.94 MB
        yunjiong zhao

        Issue Links

          Activity

          Hide
          zhaoyunjiong yunjiong zhao added a comment -

          Upload more stack trace files.

          Show
          zhaoyunjiong yunjiong zhao added a comment - Upload more stack trace files.
          Hide
          cmccabe Colin P. McCabe added a comment - - edited

          Sorry, I'm not clear on what the problem is.

          Show
          cmccabe Colin P. McCabe added a comment - - edited Sorry, I'm not clear on what the problem is.
          Hide
          zhaoyunjiong yunjiong zhao added a comment -

          The previous description is not right.
          The stuck thread happened at org.apache.hadoop.net.unix.DomainSocket.writeArray0 as below shows.

          $ grep -B2 -A10 DomainSocket.writeArray 1124102*
          11241021-"DataXceiver for client unix:/var/run/hadoop-hdfs/dn Waiting for operation #1" daemon prio=10 tid=0x00007f7de034c800 nid=0x7b7 runnable [0x00007f7db06c5000]
          11241021- java.lang.Thread.State: RUNNABLE
          11241021: at org.apache.hadoop.net.unix.DomainSocket.writeArray0(Native Method)
          11241021- at org.apache.hadoop.net.unix.DomainSocket.access$300(DomainSocket.java:45)
          11241021- at org.apache.hadoop.net.unix.DomainSocket$DomainOutputStream.write(DomainSocket.java:589)
          11241021- at org.apache.hadoop.net.unix.DomainSocketWatcher.kick(DomainSocketWatcher.java:350)
          11241021- at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:303)
          11241021- at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:283)
          11241021- at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:413)
          11241021- at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:172)
          11241021- at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:92)
          11241021- at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
          11241021- at java.lang.Thread.run(Thread.java:745)

          --
          11241023-"DataXceiver for client unix:/var/run/hadoop-hdfs/dn Waiting for operation #1" daemon prio=10 tid=0x00007f7de034c800 nid=0x7b7 runnable [0x00007f7db06c5000]
          11241023- java.lang.Thread.State: RUNNABLE
          11241023: at org.apache.hadoop.net.unix.DomainSocket.writeArray0(Native Method)
          11241023- at org.apache.hadoop.net.unix.DomainSocket.access$300(DomainSocket.java:45)
          11241023- at org.apache.hadoop.net.unix.DomainSocket$DomainOutputStream.write(DomainSocket.java:589)
          11241023- at org.apache.hadoop.net.unix.DomainSocketWatcher.kick(DomainSocketWatcher.java:350)
          11241023- at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:303)
          11241023- at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:283)
          11241023- at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:413)
          11241023- at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:172)
          11241023- at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:92)
          11241023- at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
          11241023- at java.lang.Thread.run(Thread.java:745)

          --
          11241025-"DataXceiver for client unix:/var/run/hadoop-hdfs/dn Waiting for operation #1" daemon prio=10 tid=0x00007f7de034c800 nid=0x7b7 runnable [0x00007f7db06c5000]
          11241025- java.lang.Thread.State: RUNNABLE
          11241025: at org.apache.hadoop.net.unix.DomainSocket.writeArray0(Native Method)
          11241025- at org.apache.hadoop.net.unix.DomainSocket.access$300(DomainSocket.java:45)
          11241025- at org.apache.hadoop.net.unix.DomainSocket$DomainOutputStream.write(DomainSocket.java:589)
          11241025- at org.apache.hadoop.net.unix.DomainSocketWatcher.kick(DomainSocketWatcher.java:350)
          11241025- at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:303)
          11241025- at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:283)
          11241025- at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:413)
          11241025- at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:172)
          11241025- at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:92)
          11241025- at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
          11241025- at java.lang.Thread.run(Thread.java:745)

          Show
          zhaoyunjiong yunjiong zhao added a comment - The previous description is not right. The stuck thread happened at org.apache.hadoop.net.unix.DomainSocket.writeArray0 as below shows. $ grep -B2 -A10 DomainSocket.writeArray 1124102* 11241021-"DataXceiver for client unix:/var/run/hadoop-hdfs/dn Waiting for operation #1 " daemon prio=10 tid=0x00007f7de034c800 nid=0x7b7 runnable [0x00007f7db06c5000] 11241021- java.lang.Thread.State: RUNNABLE 11241021: at org.apache.hadoop.net.unix.DomainSocket.writeArray0(Native Method) 11241021- at org.apache.hadoop.net.unix.DomainSocket.access$300(DomainSocket.java:45) 11241021- at org.apache.hadoop.net.unix.DomainSocket$DomainOutputStream.write(DomainSocket.java:589) 11241021- at org.apache.hadoop.net.unix.DomainSocketWatcher.kick(DomainSocketWatcher.java:350) 11241021- at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:303) 11241021- at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:283) 11241021- at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:413) 11241021- at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:172) 11241021- at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:92) 11241021- at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232) 11241021- at java.lang.Thread.run(Thread.java:745) – -- 11241023-"DataXceiver for client unix:/var/run/hadoop-hdfs/dn Waiting for operation #1 " daemon prio=10 tid=0x00007f7de034c800 nid=0x7b7 runnable [0x00007f7db06c5000] 11241023- java.lang.Thread.State: RUNNABLE 11241023: at org.apache.hadoop.net.unix.DomainSocket.writeArray0(Native Method) 11241023- at org.apache.hadoop.net.unix.DomainSocket.access$300(DomainSocket.java:45) 11241023- at org.apache.hadoop.net.unix.DomainSocket$DomainOutputStream.write(DomainSocket.java:589) 11241023- at org.apache.hadoop.net.unix.DomainSocketWatcher.kick(DomainSocketWatcher.java:350) 11241023- at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:303) 11241023- at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:283) 11241023- at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:413) 11241023- at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:172) 11241023- at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:92) 11241023- at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232) 11241023- at java.lang.Thread.run(Thread.java:745) – -- 11241025-"DataXceiver for client unix:/var/run/hadoop-hdfs/dn Waiting for operation #1 " daemon prio=10 tid=0x00007f7de034c800 nid=0x7b7 runnable [0x00007f7db06c5000] 11241025- java.lang.Thread.State: RUNNABLE 11241025: at org.apache.hadoop.net.unix.DomainSocket.writeArray0(Native Method) 11241025- at org.apache.hadoop.net.unix.DomainSocket.access$300(DomainSocket.java:45) 11241025- at org.apache.hadoop.net.unix.DomainSocket$DomainOutputStream.write(DomainSocket.java:589) 11241025- at org.apache.hadoop.net.unix.DomainSocketWatcher.kick(DomainSocketWatcher.java:350) 11241025- at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:303) 11241025- at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:283) 11241025- at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:413) 11241025- at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:172) 11241025- at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:92) 11241025- at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232) 11241025- at java.lang.Thread.run(Thread.java:745)
          Hide
          zhaoyunjiong yunjiong zhao added a comment -

          The problem here is in our machine we can only send 299 bytes to domain socket.
          When it try to send the 300 byte, it will block, and the DomainSocketWatcher.add(DomainSocket sock, Handler handler) have the lock, so watcherThread.run can't get the lock and clear the buffer, it's a live lock.

          I'm not sure which configuration controls the bufferSize of 299 for now.
          Now I suspect net.core.netdev_budget, which is 300 at our machines.
          I'll upload a patch to control the send bytes to prevent live lock later.

          By the way, should I move this to HADOOP COMMON project?

          Show
          zhaoyunjiong yunjiong zhao added a comment - The problem here is in our machine we can only send 299 bytes to domain socket. When it try to send the 300 byte, it will block, and the DomainSocketWatcher.add(DomainSocket sock, Handler handler) have the lock, so watcherThread.run can't get the lock and clear the buffer, it's a live lock. I'm not sure which configuration controls the bufferSize of 299 for now. Now I suspect net.core.netdev_budget, which is 300 at our machines. I'll upload a patch to control the send bytes to prevent live lock later. By the way, should I move this to HADOOP COMMON project?
          Hide
          zhaoyunjiong yunjiong zhao added a comment -

          To prevent the live lock, this patch will only send 1 byte when have multiple kick() calls.
          No need add new test case.

          Show
          zhaoyunjiong yunjiong zhao added a comment - To prevent the live lock, this patch will only send 1 byte when have multiple kick() calls. No need add new test case.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12683536/HADOOP-11333.patch
          against trunk revision 61a2510.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5119//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5119//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12683536/HADOOP-11333.patch against trunk revision 61a2510. +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-common-project/hadoop-common. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5119//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5119//console This message is automatically generated.
          Hide
          cmccabe Colin P. McCabe added a comment -

          Thanks, yunjiong zhao... I understand the problem now.

          Yes, this code is in hadoop-common so we should have the JIRA there. Thanks for moving it.

          The patch looks good to me. If I understand correctly, the write should not block when the pipe is at less than its pipe capacity. This patch only relies on a pipe capacity of 1 byte, which is well below the minimum POSIX specifies.

          Just two comments:

          • can you move the kicked = false to NotificationHandler#handle? This is a non-static inner class, so it should have access to this variable. I think it's more appropriate to put this there, since that is the function which is handling the kick.
          • let's add a JavaDoc comment to the declaration of boolean kicked. Maybe something like:

            True if we have written a byte to the notification socket. We should not write anything else to the socket until the notification handler has had a chance to run. Otherwise, our thread might block, causing deadlock. See HADOOP-11333 for details.

          Show
          cmccabe Colin P. McCabe added a comment - Thanks, yunjiong zhao ... I understand the problem now. Yes, this code is in hadoop-common so we should have the JIRA there. Thanks for moving it. The patch looks good to me. If I understand correctly, the write should not block when the pipe is at less than its pipe capacity. This patch only relies on a pipe capacity of 1 byte, which is well below the minimum POSIX specifies. Just two comments: can you move the kicked = false to NotificationHandler#handle ? This is a non-static inner class, so it should have access to this variable. I think it's more appropriate to put this there, since that is the function which is handling the kick. let's add a JavaDoc comment to the declaration of boolean kicked . Maybe something like: True if we have written a byte to the notification socket. We should not write anything else to the socket until the notification handler has had a chance to run. Otherwise, our thread might block, causing deadlock. See HADOOP-11333 for details.
          Hide
          zhaoyunjiong yunjiong zhao added a comment -

          Thanks Colin Patrick McCabe for your time.
          Update patch as recommended.

          Show
          zhaoyunjiong yunjiong zhao added a comment - Thanks Colin Patrick McCabe for your time. Update patch as recommended.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12683979/HADOOP-11333-1.patch
          against trunk revision c1f2bb2.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common:

          org.apache.hadoop.ha.TestZKFailoverControllerStress

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5127//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5127//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12683979/HADOOP-11333-1.patch against trunk revision c1f2bb2. +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-common-project/hadoop-common: org.apache.hadoop.ha.TestZKFailoverControllerStress +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5127//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5127//console This message is automatically generated.
          Hide
          zhaoyunjiong yunjiong zhao added a comment -

          By the way, for a workaround, increase net.core.wmem_default can increase the number of message send without block.

          Show
          zhaoyunjiong yunjiong zhao added a comment - By the way, for a workaround, increase net.core.wmem_default can increase the number of message send without block.
          Hide
          cmccabe Colin P. McCabe added a comment -

          +1. Thanks, yunjiong zhao.

          Show
          cmccabe Colin P. McCabe added a comment - +1. Thanks, yunjiong zhao .
          Hide
          cmccabe Colin P. McCabe added a comment -

          Committed to 2.7.

          Note that this bug is likely to show up only on OSes where the pipe buffer size is small. So it probably won't show up on Linux.

          Show
          cmccabe Colin P. McCabe added a comment - Committed to 2.7. Note that this bug is likely to show up only on OSes where the pipe buffer size is small. So it probably won't show up on Linux.
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #6617 (See https://builds.apache.org/job/Hadoop-trunk-Commit/6617/)
          HADOOP-11333. Fix deadlock in DomainSocketWatcher when the notification pipe is full (zhaoyunjiong via cmccabe) (cmccabe: rev 86e3993def01223f92b8d1dd35f6c1f8ab6033f5)

          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/unix/DomainSocketWatcher.java
          • hadoop-common-project/hadoop-common/CHANGES.txt
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-trunk-Commit #6617 (See https://builds.apache.org/job/Hadoop-trunk-Commit/6617/ ) HADOOP-11333 . Fix deadlock in DomainSocketWatcher when the notification pipe is full (zhaoyunjiong via cmccabe) (cmccabe: rev 86e3993def01223f92b8d1dd35f6c1f8ab6033f5) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/unix/DomainSocketWatcher.java hadoop-common-project/hadoop-common/CHANGES.txt
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #23 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/23/)
          HADOOP-11333. Fix deadlock in DomainSocketWatcher when the notification pipe is full (zhaoyunjiong via cmccabe) (cmccabe: rev 86e3993def01223f92b8d1dd35f6c1f8ab6033f5)

          • hadoop-common-project/hadoop-common/CHANGES.txt
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/unix/DomainSocketWatcher.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #23 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/23/ ) HADOOP-11333 . Fix deadlock in DomainSocketWatcher when the notification pipe is full (zhaoyunjiong via cmccabe) (cmccabe: rev 86e3993def01223f92b8d1dd35f6c1f8ab6033f5) hadoop-common-project/hadoop-common/CHANGES.txt hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/unix/DomainSocketWatcher.java
          Hide
          hudson Hudson added a comment -

          ABORTED: Integrated in Hadoop-Hdfs-trunk #1951 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1951/)
          HADOOP-11333. Fix deadlock in DomainSocketWatcher when the notification pipe is full (zhaoyunjiong via cmccabe) (cmccabe: rev 86e3993def01223f92b8d1dd35f6c1f8ab6033f5)

          • hadoop-common-project/hadoop-common/CHANGES.txt
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/unix/DomainSocketWatcher.java
          Show
          hudson Hudson added a comment - ABORTED: Integrated in Hadoop-Hdfs-trunk #1951 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1951/ ) HADOOP-11333 . Fix deadlock in DomainSocketWatcher when the notification pipe is full (zhaoyunjiong via cmccabe) (cmccabe: rev 86e3993def01223f92b8d1dd35f6c1f8ab6033f5) hadoop-common-project/hadoop-common/CHANGES.txt hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/unix/DomainSocketWatcher.java
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Yarn-trunk #761 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/761/)
          HADOOP-11333. Fix deadlock in DomainSocketWatcher when the notification pipe is full (zhaoyunjiong via cmccabe) (cmccabe: rev 86e3993def01223f92b8d1dd35f6c1f8ab6033f5)

          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/unix/DomainSocketWatcher.java
          • hadoop-common-project/hadoop-common/CHANGES.txt
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Hadoop-Yarn-trunk #761 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/761/ ) HADOOP-11333 . Fix deadlock in DomainSocketWatcher when the notification pipe is full (zhaoyunjiong via cmccabe) (cmccabe: rev 86e3993def01223f92b8d1dd35f6c1f8ab6033f5) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/unix/DomainSocketWatcher.java hadoop-common-project/hadoop-common/CHANGES.txt
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #23 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/23/)
          HADOOP-11333. Fix deadlock in DomainSocketWatcher when the notification pipe is full (zhaoyunjiong via cmccabe) (cmccabe: rev 86e3993def01223f92b8d1dd35f6c1f8ab6033f5)

          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/unix/DomainSocketWatcher.java
          • hadoop-common-project/hadoop-common/CHANGES.txt
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #23 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/23/ ) HADOOP-11333 . Fix deadlock in DomainSocketWatcher when the notification pipe is full (zhaoyunjiong via cmccabe) (cmccabe: rev 86e3993def01223f92b8d1dd35f6c1f8ab6033f5) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/unix/DomainSocketWatcher.java hadoop-common-project/hadoop-common/CHANGES.txt
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk #1975 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1975/)
          HADOOP-11333. Fix deadlock in DomainSocketWatcher when the notification pipe is full (zhaoyunjiong via cmccabe) (cmccabe: rev 86e3993def01223f92b8d1dd35f6c1f8ab6033f5)

          • hadoop-common-project/hadoop-common/CHANGES.txt
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/unix/DomainSocketWatcher.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Mapreduce-trunk #1975 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1975/ ) HADOOP-11333 . Fix deadlock in DomainSocketWatcher when the notification pipe is full (zhaoyunjiong via cmccabe) (cmccabe: rev 86e3993def01223f92b8d1dd35f6c1f8ab6033f5) hadoop-common-project/hadoop-common/CHANGES.txt hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/unix/DomainSocketWatcher.java
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #23 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/23/)
          HADOOP-11333. Fix deadlock in DomainSocketWatcher when the notification pipe is full (zhaoyunjiong via cmccabe) (cmccabe: rev 86e3993def01223f92b8d1dd35f6c1f8ab6033f5)

          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/unix/DomainSocketWatcher.java
          • hadoop-common-project/hadoop-common/CHANGES.txt
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #23 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/23/ ) HADOOP-11333 . Fix deadlock in DomainSocketWatcher when the notification pipe is full (zhaoyunjiong via cmccabe) (cmccabe: rev 86e3993def01223f92b8d1dd35f6c1f8ab6033f5) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/unix/DomainSocketWatcher.java hadoop-common-project/hadoop-common/CHANGES.txt
          Hide
          vinayrpet Vinayakumar B added a comment -

          Cherry-picked to 2.6.1

          Show
          vinayrpet Vinayakumar B added a comment - Cherry-picked to 2.6.1
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #8303 (See https://builds.apache.org/job/Hadoop-trunk-Commit/8303/)
          HADOOP-11333. Fix deadlock in DomainSocketWatcher when the notification pipe is full (zhaoyunjiong via cmccabe) (vinayakumarb: rev 05ed69058f22ebeccc58faf0be491c269e950526)

          • hadoop-common-project/hadoop-common/CHANGES.txt
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-trunk-Commit #8303 (See https://builds.apache.org/job/Hadoop-trunk-Commit/8303/ ) HADOOP-11333 . Fix deadlock in DomainSocketWatcher when the notification pipe is full (zhaoyunjiong via cmccabe) (vinayakumarb: rev 05ed69058f22ebeccc58faf0be491c269e950526) hadoop-common-project/hadoop-common/CHANGES.txt
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #287 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/287/)
          HADOOP-11333. Fix deadlock in DomainSocketWatcher when the notification pipe is full (zhaoyunjiong via cmccabe) (vinayakumarb: rev 05ed69058f22ebeccc58faf0be491c269e950526)

          • hadoop-common-project/hadoop-common/CHANGES.txt
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #287 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/287/ ) HADOOP-11333 . Fix deadlock in DomainSocketWatcher when the notification pipe is full (zhaoyunjiong via cmccabe) (vinayakumarb: rev 05ed69058f22ebeccc58faf0be491c269e950526) hadoop-common-project/hadoop-common/CHANGES.txt
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk #1017 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/1017/)
          HADOOP-11333. Fix deadlock in DomainSocketWatcher when the notification pipe is full (zhaoyunjiong via cmccabe) (vinayakumarb: rev 05ed69058f22ebeccc58faf0be491c269e950526)

          • hadoop-common-project/hadoop-common/CHANGES.txt
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Yarn-trunk #1017 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/1017/ ) HADOOP-11333 . Fix deadlock in DomainSocketWatcher when the notification pipe is full (zhaoyunjiong via cmccabe) (vinayakumarb: rev 05ed69058f22ebeccc58faf0be491c269e950526) hadoop-common-project/hadoop-common/CHANGES.txt
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #284 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/284/)
          HADOOP-11333. Fix deadlock in DomainSocketWatcher when the notification pipe is full (zhaoyunjiong via cmccabe) (vinayakumarb: rev 05ed69058f22ebeccc58faf0be491c269e950526)

          • hadoop-common-project/hadoop-common/CHANGES.txt
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #284 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/284/ ) HADOOP-11333 . Fix deadlock in DomainSocketWatcher when the notification pipe is full (zhaoyunjiong via cmccabe) (vinayakumarb: rev 05ed69058f22ebeccc58faf0be491c269e950526) hadoop-common-project/hadoop-common/CHANGES.txt
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk #2233 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2233/)
          HADOOP-11333. Fix deadlock in DomainSocketWatcher when the notification pipe is full (zhaoyunjiong via cmccabe) (vinayakumarb: rev 05ed69058f22ebeccc58faf0be491c269e950526)

          • hadoop-common-project/hadoop-common/CHANGES.txt
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Mapreduce-trunk #2233 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2233/ ) HADOOP-11333 . Fix deadlock in DomainSocketWatcher when the notification pipe is full (zhaoyunjiong via cmccabe) (vinayakumarb: rev 05ed69058f22ebeccc58faf0be491c269e950526) hadoop-common-project/hadoop-common/CHANGES.txt
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk #2214 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2214/)
          HADOOP-11333. Fix deadlock in DomainSocketWatcher when the notification pipe is full (zhaoyunjiong via cmccabe) (vinayakumarb: rev 05ed69058f22ebeccc58faf0be491c269e950526)

          • hadoop-common-project/hadoop-common/CHANGES.txt
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Hdfs-trunk #2214 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2214/ ) HADOOP-11333 . Fix deadlock in DomainSocketWatcher when the notification pipe is full (zhaoyunjiong via cmccabe) (vinayakumarb: rev 05ed69058f22ebeccc58faf0be491c269e950526) hadoop-common-project/hadoop-common/CHANGES.txt
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #276 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/276/)
          HADOOP-11333. Fix deadlock in DomainSocketWatcher when the notification pipe is full (zhaoyunjiong via cmccabe) (vinayakumarb: rev 05ed69058f22ebeccc58faf0be491c269e950526)

          • hadoop-common-project/hadoop-common/CHANGES.txt
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #276 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/276/ ) HADOOP-11333 . Fix deadlock in DomainSocketWatcher when the notification pipe is full (zhaoyunjiong via cmccabe) (vinayakumarb: rev 05ed69058f22ebeccc58faf0be491c269e950526) hadoop-common-project/hadoop-common/CHANGES.txt
          Hide
          vinodkv Vinod Kumar Vavilapalli added a comment -

          This wasn't originally in 2.6.1, must have been committed to 2.6, which was already 2.6.2. I just committed this to 2.6.1 .

          Ran compilation before the push.

          Show
          vinodkv Vinod Kumar Vavilapalli added a comment - This wasn't originally in 2.6.1, must have been committed to 2.6, which was already 2.6.2. I just committed this to 2.6.1 . Ran compilation before the push.

            People

            • Assignee:
              zhaoyunjiong yunjiong zhao
              Reporter:
              zhaoyunjiong yunjiong zhao
            • Votes:
              0 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development