Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
None
-
None
-
None
Description
We have a jstack collection that spans 13 minutes. One frame per ~1.5 minutes. And for each of the frame, I observed the following:
Found one Java-level deadlock: ============================= "IPC Parameter Sending Thread #294": waiting to lock monitor 0x00007f68f21f3188 (object 0x0000000621745390, a java.lang.Object), which is held by UNKNOWN_owner_addr=0x00007f68332e2800 Java stack information for the threads listed above: =================================================== "IPC Parameter Sending Thread #294": at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:268) - waiting to lock <0x0000000621745390> (a java.lang.Object) at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:461) - locked <0x0000000621745380> (a java.lang.Object) at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63) at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142) at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159) at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117) at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) - locked <0x0000000621749850> (a java.io.BufferedOutputStream) at java.io.DataOutputStream.flush(DataOutputStream.java:123) at org.apache.hadoop.ipc.Client$Connection$3.run(Client.java:1072) - locked <0x000000062174b878> (a java.io.DataOutputStream) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Found one Java-level deadlock: ============================= "IPC Client (297602875) connection to x.y.z.p:8020 from impala": waiting to lock monitor 0x00007f68f21f3188 (object 0x0000000621745390, a java.lang.Object), which is held by UNKNOWN_owner_addr=0x00007f68332e2800 Java stack information for the threads listed above: =================================================== "IPC Client (297602875) connection to x.y.z.p:8020 from impala": at sun.nio.ch.SocketChannelImpl.readerCleanup(SocketChannelImpl.java:279) - waiting to lock <0x0000000621745390> (a java.lang.Object) at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:390) - locked <0x0000000621745370> (a java.lang.Object) at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57) at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131) at java.io.FilterInputStream.read(FilterInputStream.java:133) at java.io.FilterInputStream.read(FilterInputStream.java:133) at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:553) at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) at java.io.BufferedInputStream.read(BufferedInputStream.java:265) - locked <0x00000006217476f0> (a java.io.BufferedInputStream) at java.io.DataInputStream.readInt(DataInputStream.java:387) at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1113) at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1006) Found 2 deadlocks.
This happens with jdk1.8.0_162 on 2.6.32-696.18.7.el6.x86_64.
The code appears to match https://github.com/tuxjdk/jdk8u/blob/master/jdk/src/share/classes/sun/nio/ch/SocketChannelImpl.java.
The first thread is blocked at:
[The second thread is blocked at:
https://github.com/tuxjdk/jdk8u/blob/master/jdk/src/share/classes/sun/nio/ch/SocketChannelImpl.java#L279
There are two issues here:
- There seems to be a real deadlock because the stacks remain the same even if the first an last jstack frames captured is 13 minutes apart.
- Java deadlock report seems to be problematic, two threads that have deadlock should not be blocked on the same lock, but they appear to be in this case: the same SocketChannelImpl's stateLock.
I found a relevant jdk jira https://bugs.openjdk.java.net/browse/JDK-8007476, it explains where two deadlocks are reported and they are really for the same deadlock.
I don't see a similar report about this issue in jdk jira database, and I'm thinking about filing a jdk jira for that, but would like to throw some discussion here before that.
Issue#1 is important, because the client is hanging, which indicate a real problem; however, without a correct report before issue#2 is fixed, it's not clear how the deadlock really looks like.
Thanks.
Attachments
Attachments
Issue Links
- is related to
-
IMPALA-7482 Deadlock with unknown lock holder in JVM in java.security.Provider.getService()
- Resolved
- relates to
-
HADOOP-15530 RPC could stuck at senderFuture.get()
- Open