Uploaded image for project: 'Hadoop Common'
  1. Hadoop Common
  2. HADOOP-15359

IPC client hang in kerberized cluster due to JDK deadlock

Add voteVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 2.6.0, 2.8.0, 3.0.0
    • None
    • ipc
    • None

    Description

      In a recent internal testing, we have found a DFS client hang. Further inspecting jstack shows the following:

      "IPC Client (552936351) connection toHOSTNAME:8020 from PRINCIPAL" #7468 daemon prio=5 os_prio=0 tid=0x00007f6bb306c000 nid=0x1c76e waiting for monitor entry [0x00007f6bc2bd6000]
         java.lang.Thread.State: BLOCKED (on object monitor)
              at java.security.Provider.getService(Provider.java:1035)
              - waiting to lock <0x0000000080277040> (a sun.security.provider.Sun)
              at sun.security.jca.ProviderList$ServiceList.tryGet(ProviderList.java:444)
              at sun.security.jca.ProviderList$ServiceList.access$200(ProviderList.java:376)
              at sun.security.jca.ProviderList$ServiceList$1.hasNext(ProviderList.java:486)
              at javax.crypto.Cipher.getInstance(Cipher.java:513)
              at sun.security.krb5.internal.crypto.dk.Des3DkCrypto.getCipher(Des3DkCrypto.java:202)
              at sun.security.krb5.internal.crypto.dk.DkCrypto.dr(DkCrypto.java:484)
              at sun.security.krb5.internal.crypto.dk.DkCrypto.dk(DkCrypto.java:447)
              at sun.security.krb5.internal.crypto.dk.DkCrypto.calculateChecksum(DkCrypto.java:413)
              at sun.security.krb5.internal.crypto.Des3.calculateChecksum(Des3.java:59)
              at sun.security.jgss.krb5.CipherHelper.calculateChecksum(CipherHelper.java:231)
              at sun.security.jgss.krb5.MessageToken.getChecksum(MessageToken.java:466)
              at sun.security.jgss.krb5.MessageToken.verifySignAndSeqNumber(MessageToken.java:374)
              at sun.security.jgss.krb5.WrapToken.getDataFromBuffer(WrapToken.java:284)
              at sun.security.jgss.krb5.WrapToken.getData(WrapToken.java:209)
              at sun.security.jgss.krb5.WrapToken.getData(WrapToken.java:182)
              at sun.security.jgss.krb5.Krb5Context.unwrap(Krb5Context.java:1053)
              at sun.security.jgss.GSSContextImpl.unwrap(GSSContextImpl.java:403)
              at com.sun.security.sasl.gsskerb.GssKrb5Base.unwrap(GssKrb5Base.java:77)
              at org.apache.hadoop.security.SaslRpcClient$WrappedInputStream.readNextRpcPacket(SaslRpcClient.java:617)
              at org.apache.hadoop.security.SaslRpcClient$WrappedInputStream.read(SaslRpcClient.java:583)
              - locked <0x0000000083444878> (a java.nio.HeapByteBuffer)
              at java.io.FilterInputStream.read(FilterInputStream.java:133)
              at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:553)
              at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
              at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
              - locked <0x00000000834448c0> (a java.io.BufferedInputStream)
              at java.io.DataInputStream.readInt(DataInputStream.java:387)
              at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1113)
              at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1006)
      

      and at the end of jstack:

      Found one Java-level deadlock:
      =============================
      "IPC Parameter Sending Thread #29":
        waiting to lock monitor 0x0000000017ff49f8 (object 0x0000000080277040, a sun.security.provider.Sun),
        which is held by UNKNOWN_owner_addr=0x0000000050607000
      
      Java stack information for the threads listed above:
      ===================================================
      "IPC Parameter Sending Thread #29":
              at java.security.Provider.getService(Provider.java:1035)
              - waiting to lock <0x0000000080277040> (a sun.security.provider.Sun)
              at sun.security.jca.ProviderList$ServiceList.tryGet(ProviderList.java:437)
              at sun.security.jca.ProviderList$ServiceList.access$200(ProviderList.java:376)
              at sun.security.jca.ProviderList$ServiceList$1.hasNext(ProviderList.java:486)
              at javax.crypto.SecretKeyFactory.nextSpi(SecretKeyFactory.java:293)
              - locked <0x00000000834386b8> (a java.lang.Object)
              at javax.crypto.SecretKeyFactory.<init>(SecretKeyFactory.java:121)
              at javax.crypto.SecretKeyFactory.getInstance(SecretKeyFactory.java:160)
              at sun.security.krb5.internal.crypto.dk.Des3DkCrypto.getCipher(Des3DkCrypto.java:187)
              at sun.security.krb5.internal.crypto.dk.DkCrypto.dr(DkCrypto.java:484)
              at sun.security.krb5.internal.crypto.dk.DkCrypto.dk(DkCrypto.java:447)
              at sun.security.krb5.internal.crypto.dk.DkCrypto.calculateChecksum(DkCrypto.java:413)
              at sun.security.krb5.internal.crypto.Des3.calculateChecksum(Des3.java:59)
              at sun.security.jgss.krb5.CipherHelper.calculateChecksum(CipherHelper.java:231)
              at sun.security.jgss.krb5.MessageToken.getChecksum(MessageToken.java:466)
              at sun.security.jgss.krb5.MessageToken.genSignAndSeqNumber(MessageToken.java:315)
              at sun.security.jgss.krb5.WrapToken.<init>(WrapToken.java:422)
              at sun.security.jgss.krb5.Krb5Context.wrap(Krb5Context.java:922)
              at sun.security.jgss.GSSContextImpl.wrap(GSSContextImpl.java:385)
              at com.sun.security.sasl.gsskerb.GssKrb5Base.wrap(GssKrb5Base.java:103)
              at org.apache.hadoop.security.SaslRpcClient$WrappedOutputStream.write(SaslRpcClient.java:636)
              at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
              at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
              - locked <0x00000000834389a8> (a java.io.BufferedOutputStream)
              at java.io.DataOutputStream.flush(DataOutputStream.java:123)
              at org.apache.hadoop.ipc.Client$Connection$3.run(Client.java:1072)
              - locked <0x00000000834389c0> (a java.io.DataOutputStream)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)
      

      After some research, the closest I found is https://bugs.openjdk.java.net/browse/JDK-7092821 .
      Filing this jira for discussions.

      JDK version used was 1.8.0_144.

      Attachments

        1. 1.jstack
          82 kB
          Xiao Chen
        2. 2.jstack
          82 kB
          Xiao Chen

        Issue Links

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            Unassigned Unassigned
            xiaochen Xiao Chen

            Dates

              Created:
              Updated:

              Slack

                Issue deployment