Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- 2.6.0
Description
In our cluster, an application hung while doing a short-circuit read of a local HDFS block. Looking into the log, we found that the DataNode's DomainSocketWatcher.watcherThread had exited with the following log:
ERROR org.apache.hadoop.net.unix.DomainSocketWatcher: Thread[Thread-25,5,main] terminating on unexpected exception
java.lang.NullPointerException
        at org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:463)
        at java.lang.Thread.run(Thread.java:662)
Line 463 is the following code snippet:
try {
  for (int fd : fdSet.getAndClearReadableFds()) {
    sendCallbackAndRemove("getAndClearReadableFds", entries, fdSet, fd);
  }
getAndClearReadableFds is a native method that mallocs an int array. Since our memory is very tight, it looks like the malloc failed and a NULL pointer was returned.
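For illustration only (a minimal sketch, not the Hadoop code): when the native method returns null, the enhanced for loop above fails immediately, because it must read the array's length before running the body. That is exactly the NullPointerException seen at DomainSocketWatcher.java:463.

import java.util.Arrays;

public class NullArrayIterationDemo {
  // Stand-in for the native method; returns null the way a failed malloc would.
  static int[] getAndClearReadableFds() {
    return null;
  }

  public static void main(String[] args) {
    // The for-each loop needs the array length first, so a null array throws
    // NullPointerException before the loop body ever runs.
    for (int fd : getAndClearReadableFds()) {
      System.out.println("readable fd: " + fd);
    }
    System.out.println(Arrays.toString(getAndClearReadableFds())); // never reached
  }
}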
Worse, other threads then blocked with stacks like this:
"DataXceiver for client unix:/home/work/app/hdfs/c3prc-micloud/datanode/dn_socket [Waiting for operation #1]" daemon prio=10 tid=0x00007f0c9c086d90 nid=0x8fc3 waiting on condition [0x00007f09b9856000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00000007b0174808> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:323) at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:322) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:403) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:214) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:95) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235) at java.lang.Thread.run(Thread.java:662)
IMO, we should exit the DN so that users know something has gone wrong and can fix it.
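To make the hang mechanism concrete, here is a minimal, self-contained sketch (hypothetical names, not the Hadoop code) of the pattern: callers of add() park on a condition that only the watcher thread signals, so once the watcher thread dies from an uncaught exception, the callers park forever, exactly as in the DataXceiver stack above.

import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the DomainSocketWatcher pattern: callers of add()
// wait on a condition that only the watcher thread signals.
public class WatcherHangDemo {
  private final ReentrantLock lock = new ReentrantLock();
  private final Condition processed = lock.newCondition();
  private final Queue<Integer> toAdd = new ArrayDeque<>();

  // Simplified analogue of DomainSocketWatcher#add: queue the fd and wait
  // until the watcher thread has drained it.
  void add(int fd) throws InterruptedException {
    lock.lock();
    try {
      toAdd.add(fd);
      while (toAdd.contains(fd)) {
        processed.await();   // only the watcher thread ever signals this
      }
    } finally {
      lock.unlock();
    }
  }

  void startWatcher() {
    Thread watcher = new Thread(() -> {
      // Simulates the NullPointerException at DomainSocketWatcher.java:463:
      // the thread dies without draining toAdd or signalling any waiter.
      throw new NullPointerException("simulated getAndClearReadableFds failure");
    });
    watcher.setDaemon(true);
    watcher.start();
  }

  public static void main(String[] args) throws InterruptedException {
    WatcherHangDemo demo = new WatcherHangDemo();
    demo.startWatcher();
    Thread.sleep(100);   // let the watcher thread die first
    demo.add(42);        // parks forever, like the DataXceiver threads above
  }
}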
Attachments
- HDFS-8429-001.patch (3 kB, zhouyingchao)
- HDFS-8429-002.patch (5 kB, zhouyingchao)
- HDFS-8429-003.patch (6 kB, zhouyingchao)
Activity
It's very surprising that you managed to run out of memory there. The number of readable file descriptors is unlikely to be more than a few thousand at most, and therefore you failed to allocate a < 4kb memory region. Either that, or you somehow got a very large number of readable file descriptors all at once for some reason. We could cache this memory to avoid doing repeated malloc/free cycles, but I am surprised that this is an issue. I suspect that even if this is fixed, you will have many other unrelated issues that will make the system unusable if memory is truly exhausted.
JNIEXPORT jobject JNICALL
Java_org_apache_hadoop_net_unix_DomainSocketWatcher_00024FdSet_getAndClearReadableFds(
    JNIEnv *env, jobject obj)
{
  ...
done:
  free(carr);
  if (jthr) {
    (*env)->DeleteLocalRef(env, jarr);
    jarr = NULL;
  }
  return jarr;
}
This should be calling (*env)->Throw(env, jthr), so the failure surfaces as a Java exception rather than a bare null return.
Colin, thank you for the great comments. In this case, I think the bottom line is that the death of the watcher thread should not block other threads, and the client side should be signalled to fall back to other read paths as quickly as possible.
I created a patch trying to resolve the blocking. Besides that, I also changed the native getAndClearReadableFds method to throw an exception, as Colin suggested. Please feel free to post your thoughts and comments. Thank you.
Tested the following cases: TestDomainSocket, TestDomainSocketWatcher, TestParallelShortCircuitRead, TestFsDatasetCacheRevocation, TestScrLazyPersistFiles, TestParallelShortCircuitReadNoChecksum, TestDFSInputStream, TestBlockReaderFactory, TestParallelUnixDomainRead, TestParallelShortCircuitReadUnCached, TestBlockReaderLocalLegacy, TestPeerCache, TestShortCircuitCache, TestShortCircuitLocalRead, TestBlockReaderLocal, TestParallelShortCircuitLegacyRead, TestTracingShortCircuitLocalRead, TestEnhancedByteBufferAccess
-1 overall |
Vote | Subsystem | Runtime | Comment |
---|---|---|---|
0 | pre-patch | 14m 39s | Pre-patch trunk compilation is healthy. |
+1 | @author | 0m 0s | The patch does not contain any @author tags. |
-1 | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
+1 | javac | 7m 31s | There were no new javac warning messages. |
+1 | javadoc | 9m 42s | There were no new javadoc warning messages. |
+1 | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
-1 | checkstyle | 1m 5s | The applied patch generated 2 new checkstyle issues (total was 19, now 21). |
+1 | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
+1 | install | 1m 34s | mvn install still works. |
+1 | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. |
+1 | findbugs | 1m 40s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
+1 | common tests | 22m 53s | Tests passed in hadoop-common. |
60m 2s |
Subsystem | Report/Notes |
---|---|
Patch URL | http://issues.apache.org/jira/secure/attachment/12734105/HDFS-8429-001.patch |
Optional Tests | javadoc javac unit findbugs checkstyle |
git revision | trunk / ce53c8e |
checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11061/artifact/patchprocess/diffcheckstylehadoop-common.txt |
hadoop-common test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11061/artifact/patchprocess/testrun_hadoop-common.txt |
Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11061/testReport/ |
Java | 1.7.0_55 |
uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11061/console |
This message was automatically generated.
Thanks. We shouldn't need to modify DomainSocketWatcher#add and DomainSocketWatcher#remove, since the finally block clears toAdd and toRemove prior to calling signalAll. The modifications to the finally block look reasonable. I think we are going to need a unit test for this as well. Perhaps simply send an interrupt to a thread in the unit test.
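As an illustration of the cleanup Colin is referring to, here is a minimal, hypothetical sketch (invented names, not the actual patch code): no matter how the watcher loop exits, its finally block drains toAdd and toRemove and calls signalAll, so threads parked in add()/remove() (as in the hang sketch earlier) wake up and return instead of waiting forever.

import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical watcher loop: whatever kills the loop, the finally block clears
// the pending queues and wakes every waiter.
public class WatcherCleanupSketch implements Runnable {
  private final ReentrantLock lock = new ReentrantLock();
  private final Condition processedCond = lock.newCondition();
  private final Queue<Integer> toAdd = new ArrayDeque<>();
  private final Queue<Integer> toRemove = new ArrayDeque<>();
  private volatile boolean closed = false;

  @Override
  public void run() {
    try {
      while (!closed) {
        poll();  // placeholder for the real epoll/getAndClearReadableFds work; may throw
      }
    } catch (Throwable t) {
      System.err.println(this + ": terminating on unexpected exception: " + t);
    } finally {
      lock.lock();
      try {
        toAdd.clear();              // nothing left pending
        toRemove.clear();
        closed = true;              // tell callers the watcher is gone
        processedCond.signalAll();  // wake everyone parked in add()/remove()
      } finally {
        lock.unlock();
      }
    }
  }

  private void poll() {
    // Simulate the failure mode from the description.
    throw new NullPointerException("simulated getAndClearReadableFds failure");
  }
}

This also illustrates why add() and remove() themselves should not need changes: once their entries disappear from the pending queues and the condition is signalled, their wait loops fall through.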
Yes, the modification of DomainSocketWatcher#add and DomainSocketWatcher#remove is not needed. I changed the patch accordingly and added a unit test case as suggested. The test case code is largely borrowed from testStress() in the same file. Thank you.
Tested cases include TestParallelShortCircuitLegacyRead, TestParallelShortCircuitRead, TestParallelShortCircuitReadNoChecksum, TestParallelShortCircuitReadUnCached, TestShortCircuitCache, TestShortCircuitLocalRead, TestShortCircuitShm, TemporarySocketDirectory, TestDomainSocket, TestDomainSocketWatcher
-1 overall |
Vote | Subsystem | Runtime | Comment |
---|---|---|---|
0 | pre-patch | 14m 48s | Pre-patch trunk compilation is healthy. |
+1 | @author | 0m 0s | The patch does not contain any @author tags. |
+1 | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
+1 | javac | 7m 33s | There were no new javac warning messages. |
+1 | javadoc | 9m 33s | There were no new javadoc warning messages. |
+1 | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
-1 | checkstyle | 1m 7s | The applied patch generated 1 new checkstyle issues (total was 19, now 20). |
+1 | whitespace | 0m 1s | The patch has no lines that end in whitespace. |
+1 | install | 1m 34s | mvn install still works. |
+1 | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. |
+1 | findbugs | 1m 41s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
+1 | common tests | 23m 50s | Tests passed in hadoop-common. |
61m 6s |
Subsystem | Report/Notes |
---|---|
Patch URL | http://issues.apache.org/jira/secure/attachment/12734342/HDFS-8429-002.patch |
Optional Tests | javadoc javac unit findbugs checkstyle |
git revision | trunk / fb6b38d |
checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11075/artifact/patchprocess/diffcheckstylehadoop-common.txt |
hadoop-common test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11075/artifact/patchprocess/testrun_hadoop-common.txt |
Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11075/testReport/ |
Java | 1.7.0_55 |
uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11075/console |
This message was automatically generated.
One thing that I am concerned about is that we could double-close something in toAdd if the code fails in this block:
if (!(toAdd.isEmpty() && toRemove.isEmpty())) {
  // Handle pending additions (before pending removes).
  for (Iterator<Entry> iter = toAdd.iterator(); iter.hasNext(); ) {
    Entry entry = iter.next();
    DomainSocket sock = entry.getDomainSocket();
    Entry prevEntry = entries.put(sock.fd, entry);
    Preconditions.checkState(prevEntry == null,
        this + ": tried to watch a file descriptor that we " +
        "were already watching: " + sock);
    if (LOG.isTraceEnabled()) {
      LOG.trace(this + ": adding fd " + sock.fd);
    }
    fdSet.add(sock.fd);
    iter.remove();
  }
To prevent that, let's move the iter.remove() in this block up to right after Entry entry = iter.next();.
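To make the requested change concrete, here is the same loop with iter.remove() moved up as suggested (a sketch of the intent, not a verbatim excerpt from the committed patch):

for (Iterator<Entry> iter = toAdd.iterator(); iter.hasNext(); ) {
  Entry entry = iter.next();
  iter.remove();  // remove immediately so a later failure cannot re-process (and double-close) this entry
  DomainSocket sock = entry.getDomainSocket();
  Entry prevEntry = entries.put(sock.fd, entry);
  Preconditions.checkState(prevEntry == null,
      this + ": tried to watch a file descriptor that we " +
      "were already watching: " + sock);
  if (LOG.isTraceEnabled()) {
    LOG.trace(this + ": adding fd " + sock.fd);
  }
  fdSet.add(sock.fd);
}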
+1 once that's done
Tested cases include TestParallelShortCircuitLegacyRead, TestParallelShortCircuitRead, TestParallelShortCircuitReadNoChecksum, TestParallelShortCircuitReadUnCached, TestShortCircuitCache, TestShortCircuitLocalRead, TestShortCircuitShm, TemporarySocketDirectory, TestDomainSocket, TestDomainSocketWatcher
Colin, thank you for pointing out this issue. I've changed and uploaded the patch accordingly.
-1 overall |
Vote | Subsystem | Runtime | Comment |
---|---|---|---|
0 | pre-patch | 15m 4s | Pre-patch trunk compilation is healthy. |
+1 | @author | 0m 0s | The patch does not contain any @author tags. |
+1 | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
+1 | javac | 7m 42s | There were no new javac warning messages. |
+1 | javadoc | 9m 43s | There were no new javadoc warning messages. |
+1 | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
-1 | checkstyle | 1m 6s | The applied patch generated 1 new checkstyle issues (total was 19, now 20). |
+1 | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
+1 | install | 1m 36s | mvn install still works. |
+1 | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. |
+1 | findbugs | 1m 42s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
+1 | common tests | 22m 48s | Tests passed in hadoop-common. |
60m 41s |
Subsystem | Report/Notes |
---|---|
Patch URL | http://issues.apache.org/jira/secure/attachment/12735500/HDFS-8429-003.patch |
Optional Tests | javadoc javac unit findbugs checkstyle |
git revision | trunk / cdbd66b |
checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11136/artifact/patchprocess/diffcheckstylehadoop-common.txt |
hadoop-common test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11136/artifact/patchprocess/testrun_hadoop-common.txt |
Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11136/testReport/ |
Java | 1.7.0_55 |
uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11136/console |
This message was automatically generated.
FAILURE: Integrated in Hadoop-trunk-Commit #7919 (See https://builds.apache.org/job/Hadoop-trunk-Commit/7919/)
HDFS-8429. Avoid stuck threads if there is an error in DomainSocketWatcher that stops the thread. (zhouyingchao via cmccabe) (cmccabe: rev 246cefa089156a50bf086b8b1e4d4324d66dc58c)
- hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/unix/DomainSocketWatcher.java
- hadoop-common-project/hadoop-common/CHANGES.txt
- hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/unix/DomainSocketWatcher.c
- hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/net/unix/TestDomainSocketWatcher.java
FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #200 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/200/)
HDFS-8429. Avoid stuck threads if there is an error in DomainSocketWatcher that stops the thread. (zhouyingchao via cmccabe) (cmccabe: rev 246cefa089156a50bf086b8b1e4d4324d66dc58c)
- hadoop-common-project/hadoop-common/CHANGES.txt
- hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/unix/DomainSocketWatcher.c
- hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/unix/DomainSocketWatcher.java
- hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/net/unix/TestDomainSocketWatcher.java
SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #212 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/212/)
HDFS-8429. Avoid stuck threads if there is an error in DomainSocketWatcher that stops the thread. (zhouyingchao via cmccabe) (cmccabe: rev 246cefa089156a50bf086b8b1e4d4324d66dc58c)
- hadoop-common-project/hadoop-common/CHANGES.txt
- hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/net/unix/TestDomainSocketWatcher.java
- hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/unix/DomainSocketWatcher.c
- hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/unix/DomainSocketWatcher.java
SUCCESS: Integrated in Hadoop-Yarn-trunk #942 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/942/)
HDFS-8429. Avoid stuck threads if there is an error in DomainSocketWatcher that stops the thread. (zhouyingchao via cmccabe) (cmccabe: rev 246cefa089156a50bf086b8b1e4d4324d66dc58c)
- hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/unix/DomainSocketWatcher.c
- hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/net/unix/TestDomainSocketWatcher.java
- hadoop-common-project/hadoop-common/CHANGES.txt
- hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/unix/DomainSocketWatcher.java
FAILURE: Integrated in Hadoop-Hdfs-trunk #2140 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2140/)
HDFS-8429. Avoid stuck threads if there is an error in DomainSocketWatcher that stops the thread. (zhouyingchao via cmccabe) (cmccabe: rev 246cefa089156a50bf086b8b1e4d4324d66dc58c)
- hadoop-common-project/hadoop-common/CHANGES.txt
- hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/unix/DomainSocketWatcher.java
- hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/unix/DomainSocketWatcher.c
- hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/net/unix/TestDomainSocketWatcher.java
SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #210 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/210/)
HDFS-8429. Avoid stuck threads if there is an error in DomainSocketWatcher that stops the thread. (zhouyingchao via cmccabe) (cmccabe: rev 246cefa089156a50bf086b8b1e4d4324d66dc58c)
- hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/unix/DomainSocketWatcher.java
- hadoop-common-project/hadoop-common/CHANGES.txt
- hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/net/unix/TestDomainSocketWatcher.java
- hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/unix/DomainSocketWatcher.c
cmccabe, should we stop the DN in this condition?