Hadoop Common / HADOOP-6609

Deadlock in DFSClient#getBlockLocations even with the security disabled

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: io
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Here is the stack trace:
      "IPC Client (47) connection to XX" daemon prio=10 tid=0x00002aaae0369c00 nid=0x655b waiting for monitor entry [0x000000004181f000..0x000000004181fb80]
        java.lang.Thread.State: BLOCKED (on object monitor)
          at org.apache.hadoop.io.UTF8.readChars(UTF8.java:210)
          - waiting to lock <0x00002aaab3eaee50> (a org.apache.hadoop.io.DataOutputBuffer)
          at org.apache.hadoop.io.UTF8.readString(UTF8.java:203)
          at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:179)
          at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:66)
          at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:638)
          at org.apache.hadoop.ipc.Client$Connection.run(Client.java:573)

      "IPC Client (47) connection to /0.0.0.0:50030 from job_201002262308_0007" daemon prio=10 tid=0x00002aaae0272800 nid=0x6556 waiting for monitor entry [0x000000004131a000..0x000000004131ad00]
        java.lang.Thread.State: BLOCKED (on object monitor)
          at org.apache.hadoop.io.UTF8.readChars(UTF8.java:210)
          - waiting to lock <0x00002aaab3eaee50> (a org.apache.hadoop.io.DataOutputBuffer)
          at org.apache.hadoop.io.UTF8.readString(UTF8.java:203)
          at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:179)
          at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:66)
          at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:638)
          at org.apache.hadoop.ipc.Client$Connection.run(Client.java:573)

      "main" prio=10 tid=0x0000000046c17800 nid=0x6544 in Object.wait() [0x0000000040207000..0x0000000040209ec0]
        java.lang.Thread.State: WAITING (on object monitor)
          at java.lang.Object.wait(Native Method)
          - waiting on <0x00002aaacee6bc38> (a org.apache.hadoop.ipc.Client$Call)
          at java.lang.Object.wait(Object.java:485)
          at org.apache.hadoop.ipc.Client.call(Client.java:854)
          - locked <0x00002aaacee6bc38> (a org.apache.hadoop.ipc.Client$Call)
          at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:223)
          at $Proxy2.getBlockLocations(Unknown Source)
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
          at java.lang.reflect.Method.invoke(Method.java:597)
          at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
          at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
          at $Proxy2.getBlockLocations(Unknown Source)
          at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:333)
          at org.apache.hadoop.hdfs.DFSClient.access$2(DFSClient.java:330)
          at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.getBlockAt(DFSClient.java:1606)
          - locked <0x00002aaacecb8258> (a org.apache.hadoop.hdfs.DFSClient$DFSInputStream)
          at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1704)
          - locked <0x00002aaacecb8258> (a org.apache.hadoop.hdfs.DFSClient$DFSInputStream)
          at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1856)
          - locked <0x00002aaacecb8258> (a org.apache.hadoop.hdfs.DFSClient$DFSInputStream)
          at java.io.DataInputStream.readFully(DataInputStream.java:178)
          at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:63)
          at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:101)
          at org.apache.hadoop.io.UTF8.readChars(UTF8.java:211)
          - locked <0x00002aaab3eaee50> (a org.apache.hadoop.io.DataOutputBuffer)
          at org.apache.hadoop.io.UTF8.readString(UTF8.java:203)
          at org.apache.hadoop.mapred.FileSplit.readFields(FileSplit.java:90)
          at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
          at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:1)
          at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:341)
          at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:357)
          at org.apache.hadoop.mapred.MapTask.run(MapTask.java:317)
          at org.apache.hadoop.mapred.Child$4.run(Child.java:211)
          at java.security.AccessController.doPrivileged(Native Method)
          at javax.security.auth.Subject.doAs(Subject.java:396)
          at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:700)
          at org.apache.hadoop.mapred.Child.main(Child.java:205)
      1. c-6609.patch
        3 kB
        Owen O'Malley
      2. c-6609.patch
        3 kB
        Owen O'Malley

        Activity

        Hairong Kuang added a comment -

        The main thread holds the lock to UTF8#OBUF, which is a static field, and waits for the reply. The RPC client needs to acquire the lock to UTF8#OBUF in order to receive the response from the NameNode server. Therefore, deadlock occurs.

        My first question is why use UTF8, which is a deprecated class in Hadoop? If Text is used instead, we won't get into this deadlock.
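
The lock dependency described in this comment can be reproduced in isolation. The following is a minimal sketch, not Hadoop code: `SHARED_BUF` is a hypothetical stand-in for the static UTF8#OBUF, the calling thread plays the role of "main" (it holds the buffer lock while waiting for a reply), and the `receiver` thread plays the role of the IPC connection thread (it needs the same lock before it can deliver the reply). A timeout is used so that the sketch terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class SharedBufferDeadlockSketch {
    // Hypothetical stand-in for the shared static buffer (UTF8#OBUF in the report).
    private static final Object SHARED_BUF = new Object();

    /**
     * Holds the shared lock while waiting for a reply that can only be
     * produced by a thread needing the same lock. Returns true if the
     * reply arrived within the timeout, false otherwise.
     */
    static boolean replyArrivesWhileHoldingLock(long timeoutMs) {
        CountDownLatch lockHeld = new CountDownLatch(1);
        CountDownLatch reply = new CountDownLatch(1);

        Thread receiver = new Thread(() -> {
            try {
                lockHeld.await(); // start only once the caller owns the lock
            } catch (InterruptedException e) {
                return;
            }
            synchronized (SHARED_BUF) { // blocked for as long as the caller holds the lock
                reply.countDown();
            }
        }, "receiver");
        receiver.setDaemon(true);
        receiver.start();

        synchronized (SHARED_BUF) {
            lockHeld.countDown();
            try {
                // The caller waits for the reply without releasing SHARED_BUF,
                // so the receiver can never enter its synchronized block.
                return reply.await(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

Calling `replyArrivesWhileHoldingLock(200)` returns `false`: the wait times out because the receiver cannot acquire the lock until the caller leaves its synchronized block. Without the timeout, as in the reported stack trace, both threads would wait indefinitely.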

        Konstantin Boudnik added a comment -

        This problem happens on a vanilla cluster with security turned off. However, the code being executed belongs to secured Hadoop. Thus, I'm changing the title of the JIRA.

        Owen O'Malley added a comment -

        I propose that we make the OBUF in UTF8 a thread local variable to remove the locking on it.

        Owen O'Malley added a comment -

        This file changes the locking structure by replacing the static OBUF field in UTF8 with a thread local.
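
As an illustration of the approach, here is a minimal sketch of a per-thread scratch buffer. It is not the attached patch: the class and method names are hypothetical, and `java.io.ByteArrayOutputStream` stands in for Hadoop's `DataOutputBuffer` so that the example compiles without Hadoop on the classpath. Because each thread gets its own buffer, no `synchronized` block is needed around its use.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class ThreadLocalBufferSketch {
    // Each thread lazily creates and then reuses its own buffer,
    // so threads never contend for a shared monitor.
    private static final ThreadLocal<ByteArrayOutputStream> BUF =
        ThreadLocal.withInitial(ByteArrayOutputStream::new);

    /** Copies len bytes into the calling thread's buffer and decodes them as UTF-8. */
    static String decode(byte[] bytes, int off, int len) {
        ByteArrayOutputStream buf = BUF.get();
        buf.reset();                 // discard whatever the previous call left behind
        buf.write(bytes, off, len);  // no lock: the buffer is private to this thread
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

The trade-off is one buffer per thread instead of one per process, which costs a little memory but removes the lock that the main thread was holding across the RPC call.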

        Owen O'Malley added a comment -

        This patch is without the prefix.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12437775/c-6609.patch
        against trunk revision 918624.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        -1 patch. The patch command could not apply the patch.

        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/404/console

        This message is automatically generated.

        Devaraj Das added a comment -

        +1

        Owen O'Malley added a comment -

        This is the patch for trunk.

        Konstantin Boudnik added a comment -

        I have run the job that used to time out because of the deadlock a few times, and it is running OK now. All the data is being written properly and correctly.

        Thanks for the fix, Owen.

        +1 on the patch.

        Konstantin Boudnik added a comment -

        And BTW: I haven't run the full test suite to verify the patch, just one specific cluster test.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12437820/c-6609.patch
        against trunk revision 918624.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/407/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/407/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/407/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/407/console

        This message is automatically generated.

        Owen O'Malley added a comment -

        Changing synchronization is impossible to write a unit test for. :)

        I just committed this.

        Hudson added a comment -

        Integrated in Hadoop-Common-trunk-Commit #195 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk-Commit/195/)
        Fixed deadlock in RPC by replacing shared static DataOutputBuffer in the UTF8 class with a thread local variable. (omalley)

        Hudson added a comment -

        Integrated in Hadoop-Common-trunk #266 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk/266/)
        Fixed deadlock in RPC by replacing shared static DataOutputBuffer in the UTF8 class with a thread local variable. (omalley)


          People

          • Assignee:
            Owen O'Malley
            Reporter:
            Hairong Kuang
          • Votes:
            0
            Watchers:
            3
