Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1.0, 1.2.0
    • Fix Version/s: 1.1.1
    • Component/s: namenode
    • Labels:
      None

      Description

      Jitendra found out the following problem:
      1. Handler: acquires the FSNamesystem lock, then waits on the SafeModeInfo lock in SafeModeInfo.isOn().
      2. SafeModeMonitor: calls SafeModeInfo.canLeave(), which is synchronized, so it holds the SafeModeInfo lock. canLeave() in turn triggers the call sequence needEnter() -> getNumLiveDataNodes() -> getNumberOfDatanodes() -> getDatanodeListForReport(), and getDatanodeListForReport() is synchronized on the FSNamesystem lock.
      Each thread therefore holds the lock the other is waiting for, and the two threads deadlock.
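      The lock-order inversion described above can be illustrated with a minimal Java sketch. It assumes two plain monitors, the hypothetical fields `namesystem` and `safeMode`, as stand-ins for the FSNamesystem and SafeModeInfo locks. It shows one general remedy, always acquiring the two locks in the same order, and is not the patch that was committed for this issue.

```java
// Hypothetical stand-ins: "namesystem" plays the FSNamesystem monitor and
// "safeMode" plays the SafeModeInfo monitor. These are not the HDFS classes.
class LockOrderSketch {
    private final Object namesystem = new Object();
    private final Object safeMode = new Object();
    private int liveDatanodes = 3;   // guarded by namesystem
    private long safeBlocks = 0;     // guarded by safeMode

    // Handler path (like blockReport -> incrementSafeBlockCount):
    // namesystem lock first, then the safeMode lock.
    void handlerPath() {
        synchronized (namesystem) {
            synchronized (safeMode) {
                safeBlocks++;
            }
        }
    }

    // In the report, the monitor takes safeMode first and then needs namesystem,
    // the reverse of handlerPath(); that inversion is what can deadlock.
    // This version takes the locks in the same order as handlerPath().
    boolean monitorPath(long threshold) {
        synchronized (namesystem) {
            synchronized (safeMode) {
                return safeBlocks >= threshold && liveDatanodes > 0;
            }
        }
    }

    /** Runs both paths concurrently; returns true if both threads finish before the timeout. */
    static boolean runConcurrently(int iterations, long timeoutMillis) throws InterruptedException {
        LockOrderSketch s = new LockOrderSketch();
        Thread handler = new Thread(() -> {
            for (int i = 0; i < iterations; i++) s.handlerPath();
        }, "handler");
        Thread monitor = new Thread(() -> {
            for (int i = 0; i < iterations; i++) s.monitorPath(iterations);
        }, "monitor");
        // Daemon threads so a hung run cannot keep the JVM alive.
        handler.setDaemon(true);
        monitor.setDaemon(true);
        handler.start();
        monitor.start();
        handler.join(timeoutMillis);
        monitor.join(timeoutMillis);
        return !handler.isAlive() && !monitor.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runConcurrently(100_000, 5_000)); // prints true
    }
}
```

      Swapping the nesting in `monitorPath()` reproduces the shape of the bug, and `runConcurrently` can then hang until its timeout instead of returning true.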

          Activity

          Brandon Li added a comment -

          One example is a deadlock between the SafeModeMonitor thread and a blockReport RPC handler:

          Thread 16142: (state = BLOCKED)
          - org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDatanodeListForReport(org.apache.hadoop.hdfs.protocol.FSConstants$DatanodeReportType) @bci=0, line=4208 (Interpreted frame)
          - org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNumberOfDatanodes(org.apache.hadoop.hdfs.protocol.FSConstants$DatanodeReportType) @bci=2, line=4202 (Interpreted frame)
          - org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNumLiveDataNodes() @bci=4, line=4198 (Interpreted frame)
          - org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeInfo.needEnter() @bci=17, line=4886 (Interpreted frame)
          - org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeInfo.canLeave() @bci=38, line=4878 (Interpreted frame)
          - org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeMonitor.run() @bci=27, line=5074 (Interpreted frame)
          - java.lang.Thread.run() @bci=11, line=662 (Interpreted frame)
          
          
          Thread 16126: (state = BLOCKED)
          - org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeInfo.incrementSafeBlockCount(short) @bci=0, line=4938 (Interpreted frame)
          - org.apache.hadoop.hdfs.server.namenode.FSNamesystem.incrementSafeBlockCount(int) @bci=14, line=5141 (Interpreted frame)
          - org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addStoredBlock(org.apache.hadoop.hdfs.protocol.Block, org.apache.hadoop.hdfs.server.namenode.DatanodeDescriptor, org.apache.hadoop.hdfs.server.namenode.DatanodeDescriptor) @bci=1134, line=3749 (Interpreted frame)
          - org.apache.hadoop.hdfs.server.namenode.FSNamesystem.processReport(org.apache.hadoop.hdfs.protocol.DatanodeID, org.apache.hadoop.hdfs.protocol.BlockListAsLongs) @bci=316, line=3548 (Interpreted frame)
          - org.apache.hadoop.hdfs.server.namenode.NameNode.blockReport(org.apache.hadoop.hdfs.server.protocol.DatanodeRegistration, long[]) @bci=70, line=978 (Interpreted frame)
          - sun.reflect.NativeMethodAccessorImpl.invoke0(java.lang.reflect.Method, java.lang.Object, java.lang.Object[]) @bci=0 (Interpreted frame)
          - sun.reflect.NativeMethodAccessorImpl.invoke(java.lang.Object, java.lang.Object[]) @bci=87, line=39 (Interpreted frame)
          - sun.reflect.DelegatingMethodAccessorImpl.invoke(java.lang.Object, java.lang.Object[]) @bci=6, line=25 (Interpreted frame)
          - java.lang.reflect.Method.invoke(java.lang.Object, java.lang.Object[]) @bci=161, line=597 (Interpreted frame)
          - org.apache.hadoop.ipc.RPC$Server.call(java.lang.Class, org.apache.hadoop.io.Writable, long) @bci=74, line=578 (Interpreted frame)
          - org.apache.hadoop.ipc.Server$Handler$1.run() @bci=31, line=1388 (Interpreted frame)
          - org.apache.hadoop.ipc.Server$Handler$1.run() @bci=1, line=1384 (Interpreted frame)
          - java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction, java.security.AccessControlContext) @bci=0 (Interpreted frame)
          - javax.security.auth.Subject.doAs(javax.security.auth.Subject, java.security.PrivilegedExceptionAction) @bci=42, line=396 (Interpreted frame)
          - org.apache.hadoop.security.UserGroupInformation.doAs(java.security.PrivilegedExceptionAction) @bci=14, line=1122 (Interpreted frame)
          - org.apache.hadoop.ipc.Server$Handler.run() @bci=205, line=1382 (Interpreted frame)
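          The two BLOCKED threads above form a cycle: SafeModeMonitor holds the SafeModeInfo lock and waits for FSNamesystem, while the handler, which already holds the FSNamesystem lock (per the description), waits for SafeModeInfo. The sketch below, again using hypothetical stand-in monitors rather than the real NameNode code, forces that interleaving with a CountDownLatch and then asks the JVM to find the cycle through the standard `ThreadMXBean.findMonitorDeadlockedThreads()` call. It is only an illustration of how such a deadlock might be made reproducible and detected.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Hypothetical reproduction of the lock cycle in the thread dump.
class DeadlockDetectSketch {
    static final Object namesystem = new Object(); // stands in for FSNamesystem
    static final Object safeMode = new Object();   // stands in for SafeModeInfo

    /** Starts two daemon threads that deadlock; returns how many deadlocked threads the JVM reports. */
    static int reproduceAndDetect() throws InterruptedException {
        // Both threads take their first lock before either tries the second,
        // so the deadlock happens on every run instead of only occasionally.
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        Thread monitor = new Thread(() -> {
            synchronized (safeMode) {            // like canLeave()
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (namesystem) { }    // like getDatanodeListForReport()
            }
        }, "SafeModeMonitor");
        Thread handler = new Thread(() -> {
            synchronized (namesystem) {          // handler already holds the namesystem lock
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (safeMode) { }      // like incrementSafeBlockCount()
            }
        }, "IPC Handler");
        // Daemon threads so the deadlocked pair does not keep the JVM alive.
        monitor.setDaemon(true);
        handler.setDaemon(true);
        monitor.start();
        handler.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int attempt = 0; attempt < 100; attempt++) {
            long[] ids = mx.findMonitorDeadlockedThreads(); // null until a cycle exists
            if (ids != null) {
                for (ThreadInfo info : mx.getThreadInfo(ids)) {
                    System.out.println(info.getThreadName() + " waiting on " + info.getLockName()
                            + " held by " + info.getLockOwnerName());
                }
                return ids.length;
            }
            Thread.sleep(50);
        }
        return 0;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("deadlocked threads: " + reproduceAndDetect()); // prints "deadlocked threads: 2" last
    }
}
```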
          
          Brandon Li added a comment -

          test-patch result:

          -1 overall.  
              +1 @author.  The patch does not contain any @author tags.
              -1 tests included.  The patch doesn't appear to include any new or modified tests.
                                  Please justify why no tests are needed for this patch.
              +1 javadoc.  The javadoc tool did not generate any warning messages.
              +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
              -1 findbugs.  The patch appears to introduce 195 new Findbugs (version 2.0.0) warnings.
          

          The patch doesn't introduce new findbugs warnings.
          It's hard to write a unit test for this, and the problem is not easily reproducible either.

          Jitendra Nath Pandey added a comment -

          +1. Looks good to me.
          It seems the problem doesn't exist in the trunk.

          Jitendra Nath Pandey added a comment -

          Committed to branch-1. Thanks to Brandon.

          Matt Foley added a comment -

          Included in branch-1.1.

          Matt Foley added a comment -

          Closed upon release of 1.1.1.


            People

            • Assignee:
              Brandon Li
              Reporter:
              Tsz Wo Nicholas Sze
            • Votes:
              0
              Watchers:
              10
