Uploaded image for project: 'Hadoop Common'
  1. Hadoop Common
  2. HADOOP-11604

Prevent ConcurrentModificationException while closing domain sockets during shutdown of DomainSocketWatcher thread.

    Details

    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      Our product cluster hit the Xceiver limit even w/ HADOOP-10404 & HADOOP-11333, i found it was caused by DomainSocketWatcher.watcherThread gone. Attached is a possible fix, please review, thanks

      1. HADOOP-11604-002.txt
        2 kB
        Liang Xie
      2. HADOOP-11604-001.txt
        1 kB
        Liang Xie
      3. HADOOP-11604.004.patch
        9 kB
        Chris Nauroth
      4. HADOOP-11604.003.patch
        4 kB
        Chris Nauroth

        Issue Links

          Activity

          Hide
          vinodkv Vinod Kumar Vavilapalli added a comment -

          Pulled this into 2.6.1 after Akira Ajisaka verified that the patch applies cleanly. Ran compilation and TestDomainSocketWatcher before the push.

          Show
          vinodkv Vinod Kumar Vavilapalli added a comment - Pulled this into 2.6.1 after Akira Ajisaka verified that the patch applies cleanly. Ran compilation and TestDomainSocketWatcher before the push.
          Hide
          cmccabe Colin P. McCabe added a comment -

          thanks, Chris Nauroth.

          Show
          cmccabe Colin P. McCabe added a comment - thanks, Chris Nauroth .
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk #2062 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2062/)
          HADOOP-11604. Prevent ConcurrentModificationException while closing domain sockets during shutdown of DomainSocketWatcher thread. Contributed by Chris Nauroth. (cnauroth: rev 3c5ff0759c4f4e10c97c6d9036add00edb8be2b5)

          • hadoop-common-project/hadoop-common/CHANGES.txt
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/unix/DomainSocketWatcher.java
          • hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/net/unix/TestDomainSocketWatcher.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Mapreduce-trunk #2062 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2062/ ) HADOOP-11604 . Prevent ConcurrentModificationException while closing domain sockets during shutdown of DomainSocketWatcher thread. Contributed by Chris Nauroth. (cnauroth: rev 3c5ff0759c4f4e10c97c6d9036add00edb8be2b5) hadoop-common-project/hadoop-common/CHANGES.txt hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/unix/DomainSocketWatcher.java hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/net/unix/TestDomainSocketWatcher.java
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #112 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/112/)
          HADOOP-11604. Prevent ConcurrentModificationException while closing domain sockets during shutdown of DomainSocketWatcher thread. Contributed by Chris Nauroth. (cnauroth: rev 3c5ff0759c4f4e10c97c6d9036add00edb8be2b5)

          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/unix/DomainSocketWatcher.java
          • hadoop-common-project/hadoop-common/CHANGES.txt
          • hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/net/unix/TestDomainSocketWatcher.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #112 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/112/ ) HADOOP-11604 . Prevent ConcurrentModificationException while closing domain sockets during shutdown of DomainSocketWatcher thread. Contributed by Chris Nauroth. (cnauroth: rev 3c5ff0759c4f4e10c97c6d9036add00edb8be2b5) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/unix/DomainSocketWatcher.java hadoop-common-project/hadoop-common/CHANGES.txt hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/net/unix/TestDomainSocketWatcher.java
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #102 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/102/)
          HADOOP-11604. Prevent ConcurrentModificationException while closing domain sockets during shutdown of DomainSocketWatcher thread. Contributed by Chris Nauroth. (cnauroth: rev 3c5ff0759c4f4e10c97c6d9036add00edb8be2b5)

          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/unix/DomainSocketWatcher.java
          • hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/net/unix/TestDomainSocketWatcher.java
          • hadoop-common-project/hadoop-common/CHANGES.txt
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #102 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/102/ ) HADOOP-11604 . Prevent ConcurrentModificationException while closing domain sockets during shutdown of DomainSocketWatcher thread. Contributed by Chris Nauroth. (cnauroth: rev 3c5ff0759c4f4e10c97c6d9036add00edb8be2b5) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/unix/DomainSocketWatcher.java hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/net/unix/TestDomainSocketWatcher.java hadoop-common-project/hadoop-common/CHANGES.txt
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk #2043 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2043/)
          HADOOP-11604. Prevent ConcurrentModificationException while closing domain sockets during shutdown of DomainSocketWatcher thread. Contributed by Chris Nauroth. (cnauroth: rev 3c5ff0759c4f4e10c97c6d9036add00edb8be2b5)

          • hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/net/unix/TestDomainSocketWatcher.java
          • hadoop-common-project/hadoop-common/CHANGES.txt
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/unix/DomainSocketWatcher.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Hdfs-trunk #2043 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2043/ ) HADOOP-11604 . Prevent ConcurrentModificationException while closing domain sockets during shutdown of DomainSocketWatcher thread. Contributed by Chris Nauroth. (cnauroth: rev 3c5ff0759c4f4e10c97c6d9036add00edb8be2b5) hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/net/unix/TestDomainSocketWatcher.java hadoop-common-project/hadoop-common/CHANGES.txt hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/unix/DomainSocketWatcher.java
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Yarn-trunk #845 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/845/)
          HADOOP-11604. Prevent ConcurrentModificationException while closing domain sockets during shutdown of DomainSocketWatcher thread. Contributed by Chris Nauroth. (cnauroth: rev 3c5ff0759c4f4e10c97c6d9036add00edb8be2b5)

          • hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/net/unix/TestDomainSocketWatcher.java
          • hadoop-common-project/hadoop-common/CHANGES.txt
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/unix/DomainSocketWatcher.java
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Hadoop-Yarn-trunk #845 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/845/ ) HADOOP-11604 . Prevent ConcurrentModificationException while closing domain sockets during shutdown of DomainSocketWatcher thread. Contributed by Chris Nauroth. (cnauroth: rev 3c5ff0759c4f4e10c97c6d9036add00edb8be2b5) hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/net/unix/TestDomainSocketWatcher.java hadoop-common-project/hadoop-common/CHANGES.txt hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/unix/DomainSocketWatcher.java
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #111 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/111/)
          HADOOP-11604. Prevent ConcurrentModificationException while closing domain sockets during shutdown of DomainSocketWatcher thread. Contributed by Chris Nauroth. (cnauroth: rev 3c5ff0759c4f4e10c97c6d9036add00edb8be2b5)

          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/unix/DomainSocketWatcher.java
          • hadoop-common-project/hadoop-common/CHANGES.txt
          • hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/net/unix/TestDomainSocketWatcher.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #111 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/111/ ) HADOOP-11604 . Prevent ConcurrentModificationException while closing domain sockets during shutdown of DomainSocketWatcher thread. Contributed by Chris Nauroth. (cnauroth: rev 3c5ff0759c4f4e10c97c6d9036add00edb8be2b5) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/unix/DomainSocketWatcher.java hadoop-common-project/hadoop-common/CHANGES.txt hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/net/unix/TestDomainSocketWatcher.java
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #7168 (See https://builds.apache.org/job/Hadoop-trunk-Commit/7168/)
          HADOOP-11604. Prevent ConcurrentModificationException while closing domain sockets during shutdown of DomainSocketWatcher thread. Contributed by Chris Nauroth. (cnauroth: rev 3c5ff0759c4f4e10c97c6d9036add00edb8be2b5)

          • hadoop-common-project/hadoop-common/CHANGES.txt
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/unix/DomainSocketWatcher.java
          • hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/net/unix/TestDomainSocketWatcher.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-trunk-Commit #7168 (See https://builds.apache.org/job/Hadoop-trunk-Commit/7168/ ) HADOOP-11604 . Prevent ConcurrentModificationException while closing domain sockets during shutdown of DomainSocketWatcher thread. Contributed by Chris Nauroth. (cnauroth: rev 3c5ff0759c4f4e10c97c6d9036add00edb8be2b5) hadoop-common-project/hadoop-common/CHANGES.txt hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/unix/DomainSocketWatcher.java hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/net/unix/TestDomainSocketWatcher.java
          Hide
          cnauroth Chris Nauroth added a comment -

          I have committed this to trunk and branch-2. Liang, thank you for the bug report. Colin, thanks again for the code review.

          Show
          cnauroth Chris Nauroth added a comment - I have committed this to trunk and branch-2. Liang, thank you for the bug report. Colin, thanks again for the code review.
          Hide
          cnauroth Chris Nauroth added a comment -

          Thanks for the review, Colin. I'm updating the title to state more accurately that this fixes one particular bug, perhaps different from the original reported issue's root cause.

          Show
          cnauroth Chris Nauroth added a comment - Thanks for the review, Colin. I'm updating the title to state more accurately that this fixes one particular bug, perhaps different from the original reported issue's root cause.
          Hide
          cmccabe Colin P. McCabe added a comment -

          Thanks, guys. I am +1 on Chris Nauroth's patch 004. I think it is a good improvement.

          I agree that it does not fix the root cause, here, though. We still don't know why the thread exited. However, this is a good step forward that will hopefully allow us to get more information later if necessary, as Chris commented.

          Show
          cmccabe Colin P. McCabe added a comment - Thanks, guys. I am +1 on Chris Nauroth 's patch 004. I think it is a good improvement. I agree that it does not fix the root cause, here, though. We still don't know why the thread exited. However, this is a good step forward that will hopefully allow us to get more information later if necessary, as Chris commented.
          Hide
          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12699918/HADOOP-11604.004.patch
          against trunk revision ce5bf92.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5749//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5749//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12699918/HADOOP-11604.004.patch against trunk revision ce5bf92. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-common-project/hadoop-common. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5749//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5749//console This message is automatically generated.
          Hide
          cnauroth Chris Nauroth added a comment -

          Liang Xie, Happy Spring Festival to you too!

          I'm attaching patch v004. This adds a test, which was a bit tricky to write. It was pretty easy to force the ConcurrentModificationException. I've added testCloseSocketOnWatcherClose to do this. However, the exception gets thrown on a separate thread from JUnit's execution, so it wouldn't actually cause the test to fail. I've modified the test suite so that for every test, we track whether or not the thread terminated with an unexpected exception. The new test fails without my changes in DomainSocketWatcher, and then it passes after I apply the fix.

          It occurs to me that we still probably don't know the real root cause of what happened in Liang's cluster. Why did the thread exit prematurely? This ConcurrentModificationException thrown from the finally block would have masked any exception thrown from the try body. Probably the best we can do at this point is to get in this fix, and then watch for additional reports of the problem afterwards.

          Show
          cnauroth Chris Nauroth added a comment - Liang Xie , Happy Spring Festival to you too! I'm attaching patch v004. This adds a test, which was a bit tricky to write. It was pretty easy to force the ConcurrentModificationException . I've added testCloseSocketOnWatcherClose to do this. However, the exception gets thrown on a separate thread from JUnit's execution, so it wouldn't actually cause the test to fail. I've modified the test suite so that for every test, we track whether or not the thread terminated with an unexpected exception. The new test fails without my changes in DomainSocketWatcher , and then it passes after I apply the fix. It occurs to me that we still probably don't know the real root cause of what happened in Liang's cluster. Why did the thread exit prematurely? This ConcurrentModificationException thrown from the finally block would have masked any exception thrown from the try body. Probably the best we can do at this point is to get in this fix, and then watch for additional reports of the problem afterwards.
          Hide
          xieliang007 Liang Xie added a comment -

          Chris Nauroth, np i am happy to see this issue is clear and will be resolve soon and feel free to assign it to you. i am taking a vacation and just skim through the latest patch and lgtm so far. All guys, Happy Spring Festival!

          Show
          xieliang007 Liang Xie added a comment - Chris Nauroth , np i am happy to see this issue is clear and will be resolve soon and feel free to assign it to you. i am taking a vacation and just skim through the latest patch and lgtm so far. All guys, Happy Spring Festival!
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12699717/HADOOP-11604.003.patch
          against trunk revision f0f2992.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5744//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5744//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12699717/HADOOP-11604.003.patch against trunk revision f0f2992. +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-common-project/hadoop-common. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5744//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5744//console This message is automatically generated.
          Hide
          cnauroth Chris Nauroth added a comment -

          Liang Xie, thanks for looking into the .out file and sharing the stack trace. This makes sense now.

          I'm attaching a patch that avoids mutating the TreeMap during the iteration in the finally block. It's not important to remove as we iterate, because after the loop, we clear the whole map anyway and let it drop out of scope. I don't have a unit test for this. I still need to look into whether or not that's feasible. My intuition is that it won't be possible to repro the problem reliably in a unit test, because throwing ConcurrentModificationException is a best-effort check, not a guarantee.

          For the logging problem, I decided to go with the approach of setting an uncaught exception handler. The run method is already very deeply nested, and I think wrapping the whole thing in another try-catch for logging Throwable would hurt readability.

          Liang, I hope I'm not intruding by posting a patch on an issue assigned to you. I basically had to write this patch though while researching, so I'd rather share it than toss the work away.

          Show
          cnauroth Chris Nauroth added a comment - Liang Xie , thanks for looking into the .out file and sharing the stack trace. This makes sense now. I'm attaching a patch that avoids mutating the TreeMap during the iteration in the finally block. It's not important to remove as we iterate, because after the loop, we clear the whole map anyway and let it drop out of scope. I don't have a unit test for this. I still need to look into whether or not that's feasible. My intuition is that it won't be possible to repro the problem reliably in a unit test, because throwing ConcurrentModificationException is a best-effort check, not a guarantee. For the logging problem, I decided to go with the approach of setting an uncaught exception handler. The run method is already very deeply nested, and I think wrapping the whole thing in another try-catch for logging Throwable would hurt readability. Liang, I hope I'm not intruding by posting a patch on an issue assigned to you. I basically had to write this patch though while researching, so I'd rather share it than toss the work away.
          Hide
          xieliang007 Liang Xie added a comment -

          Thanks for all the valuable comments. After checking the out file, i saw the ConcurrentModificationException be thrown at inside the finally block:

                  for (Entry entry : entries.values()) {      <<<< HERE
                    sendCallback("close", entries, fdSet, entry.getDomainSocket().fd);
                  }
                  entries.clear();
          

          the log is sth like:

          Exception in thread "Thread-25" java.util.ConcurrentModificationException
                  at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1100)
                  at java.util.TreeMap$ValueIterator.next(TreeMap.java:1145)
                  at org.apache.hadoop.net.unix.DomainSocketWatcher$1.run(DomainSocketWatcher.java:484)
                  at java.lang.Thread.run(Thread.java:662)
          

          so the root cause in our case should be the non thread-safe pattern: foreach

          {treemap.remove}

          .

          Show
          xieliang007 Liang Xie added a comment - Thanks for all the valuable comments. After checking the out file, i saw the ConcurrentModificationException be thrown at inside the finally block: for (Entry entry : entries.values()) { <<<< HERE sendCallback( "close" , entries, fdSet, entry.getDomainSocket().fd); } entries.clear(); the log is sth like: Exception in thread " Thread -25" java.util.ConcurrentModificationException at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1100) at java.util.TreeMap$ValueIterator.next(TreeMap.java:1145) at org.apache.hadoop.net.unix.DomainSocketWatcher$1.run(DomainSocketWatcher.java:484) at java.lang. Thread .run( Thread .java:662) so the root cause in our case should be the non thread-safe pattern: foreach {treemap.remove} .
          Hide
          cmccabe Colin P. McCabe added a comment -

          Catching Throwable and logging it would be nice.

          Show
          cmccabe Colin P. McCabe added a comment - Catching Throwable and logging it would be nice.
          Hide
          cnauroth Chris Nauroth added a comment -

          I may have also seen this thread exit prematurely, even after the recent fixes. I don't have complete information though, so I can't say for sure.

          One potential problem is that the thread does not explicitly handle logging for unchecked exceptions, thrown either from the try body or the finally body. Without an uncaught exception handler, the default behavior would be to log the stack trace to the console, which is likely getting redirected to a .out file instead of the normal DataNode daemon log. An operator might not think to check the .out log. Liang, do you still have a .out file from this incident? If not, then a good first step might be to patch the logging so that we either set an uncaught exception handler to do the logging, or just catch and log Throwable explicitly.

          I agree that we need to find root cause before attempting a fix.

          Show
          cnauroth Chris Nauroth added a comment - I may have also seen this thread exit prematurely, even after the recent fixes. I don't have complete information though, so I can't say for sure. One potential problem is that the thread does not explicitly handle logging for unchecked exceptions, thrown either from the try body or the finally body. Without an uncaught exception handler, the default behavior would be to log the stack trace to the console, which is likely getting redirected to a .out file instead of the normal DataNode daemon log. An operator might not think to check the .out log. Liang, do you still have a .out file from this incident? If not, then a good first step might be to patch the logging so that we either set an uncaught exception handler to do the logging, or just catch and log Throwable explicitly. I agree that we need to find root cause before attempting a fix.
          Hide
          cmccabe Colin P. McCabe added a comment -

          This isn't quite right. If the DSW thread terminated unexpectedly, sockets will be leaked permanently.

          We should figure out why the DSW thread is terminating and fix that. Do you have any information about this?

          Show
          cmccabe Colin P. McCabe added a comment - This isn't quite right. If the DSW thread terminated unexpectedly, sockets will be leaked permanently. We should figure out why the DSW thread is terminating and fix that. Do you have any information about this?
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12699264/HADOOP-11604-002.txt
          against trunk revision 2f0f756.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common:

          org.apache.hadoop.net.unix.TestDomainSocketWatcher

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5722//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5722//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12699264/HADOOP-11604-002.txt against trunk revision 2f0f756. +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-common-project/hadoop-common: org.apache.hadoop.net.unix.TestDomainSocketWatcher Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5722//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5722//console This message is automatically generated.
          Hide
          xieliang007 Liang Xie added a comment -

          Thanks for review, uploaded a new patch to address comment.
          xceiver reached the upper limit because no thread to signal processedCond after the watcherThread die. then all xceiver threads hung always at processedCond.await

          Show
          xieliang007 Liang Xie added a comment - Thanks for review, uploaded a new patch to address comment. xceiver reached the upper limit because no thread to signal processedCond after the watcherThread die. then all xceiver threads hung always at processedCond.await
          Hide
          ozawa Tsuyoshi Ozawa added a comment -

          Liang Xie thank you for taking this JIRA.

                    Thread.sleep(100);
          

          There is a unhandled exception here. Could you fix it?

          Show
          ozawa Tsuyoshi Ozawa added a comment - Liang Xie thank you for taking this JIRA. Thread .sleep(100); There is a unhandled exception here. Could you fix it?
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12699082/HADOOP-11604-001.txt
          against trunk revision ab0b958.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 javac. The patch appears to cause the build to fail.

          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5713//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12699082/HADOOP-11604-001.txt against trunk revision ab0b958. +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javac . The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5713//console This message is automatically generated.

            People

            • Assignee:
              cnauroth Chris Nauroth
              Reporter:
              xieliang007 Liang Xie
            • Votes:
              0 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development