Hadoop Common
  1. Hadoop Common
  2. HADOOP-8770

NN should not RPC to self to find trash defaults (causes deadlock)

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.0.2-alpha
    • Fix Version/s: 2.0.2-alpha
    • Component/s: trash
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      When transitioning a SBN to active, I ran into the following situation:

      • the TrashPolicy first gets loaded by an IPC Server Handler thread. The initialize function then tries to make an RPC to the same node to find out the defaults.
      • This is happening inside the NN write lock (since it's part of the active initialization). Hence, all of the other handler threads are already blocked waiting to get the NN lock.
      • Since no handler threads are free, the RPC blocks forever and the NN never enters active state.

      We need to have a general policy that the NN should never make RPCs to itself for any reason, due to potential for deadlocks like this.

      1. hdfs-3876.txt
        3 kB
        Eli Collins
      2. hdfs-3876.txt
        10 kB
        Eli Collins
      3. hdfs-3876.txt
        11 kB
        Eli Collins
      4. hdfs-3876.txt
        11 kB
        Eli Collins

        Issue Links

          Activity

          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1188 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1188/)
          HADOOP-8770. NN should not RPC to self to find trash defaults. Contributed by Eli Collins (Revision 1381319)

          Result = ABORTED
          eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1381319
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Trash.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/TrashPolicyDefault.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestTrash.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSShell.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestHDFSTrash.java
          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #1188 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1188/ ) HADOOP-8770 . NN should not RPC to self to find trash defaults. Contributed by Eli Collins (Revision 1381319) Result = ABORTED eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1381319 Files : /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Trash.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/TrashPolicyDefault.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestTrash.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSShell.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestHDFSTrash.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1157 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1157/)
          HADOOP-8770. NN should not RPC to self to find trash defaults. Contributed by Eli Collins (Revision 1381319)

          Result = FAILURE
          eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1381319
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Trash.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/TrashPolicyDefault.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestTrash.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSShell.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestHDFSTrash.java
          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-trunk #1157 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1157/ ) HADOOP-8770 . NN should not RPC to self to find trash defaults. Contributed by Eli Collins (Revision 1381319) Result = FAILURE eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1381319 Files : /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Trash.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/TrashPolicyDefault.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestTrash.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSShell.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestHDFSTrash.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #2711 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2711/)
          HADOOP-8770. NN should not RPC to self to find trash defaults. Contributed by Eli Collins (Revision 1381319)

          Result = FAILURE
          eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1381319
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Trash.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/TrashPolicyDefault.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestTrash.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSShell.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestHDFSTrash.java
          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk-Commit #2711 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2711/ ) HADOOP-8770 . NN should not RPC to self to find trash defaults. Contributed by Eli Collins (Revision 1381319) Result = FAILURE eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1381319 Files : /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Trash.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/TrashPolicyDefault.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestTrash.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSShell.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestHDFSTrash.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #2750 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2750/)
          HADOOP-8770. NN should not RPC to self to find trash defaults. Contributed by Eli Collins (Revision 1381319)

          Result = SUCCESS
          eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1381319
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Trash.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/TrashPolicyDefault.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestTrash.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSShell.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestHDFSTrash.java
          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-trunk-Commit #2750 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2750/ ) HADOOP-8770 . NN should not RPC to self to find trash defaults. Contributed by Eli Collins (Revision 1381319) Result = SUCCESS eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1381319 Files : /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Trash.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/TrashPolicyDefault.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestTrash.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSShell.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestHDFSTrash.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #2687 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2687/)
          HADOOP-8770. NN should not RPC to self to find trash defaults. Contributed by Eli Collins (Revision 1381319)

          Result = SUCCESS
          eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1381319
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Trash.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/TrashPolicyDefault.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestTrash.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSShell.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestHDFSTrash.java
          Show
          Hudson added a comment - Integrated in Hadoop-Common-trunk-Commit #2687 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2687/ ) HADOOP-8770 . NN should not RPC to self to find trash defaults. Contributed by Eli Collins (Revision 1381319) Result = SUCCESS eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1381319 Files : /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Trash.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/TrashPolicyDefault.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestTrash.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSShell.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestHDFSTrash.java
          Hide
          Eli Collins added a comment -

          Thanks for the review Todd. I've committed this and merged to branch-2.

          Show
          Eli Collins added a comment - Thanks for the review Todd. I've committed this and merged to branch-2.
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12543870/hdfs-3876.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3148//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3148//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12543870/hdfs-3876.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3148//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3148//console This message is automatically generated.
          Hide
          Todd Lipcon added a comment -

          +1 pending Jenkins report

          Show
          Todd Lipcon added a comment - +1 pending Jenkins report
          Hide
          Eli Collins added a comment -

          #1 Done (and the test now just checks if the file was trashed since we can't get the mutated conf)
          #2 Yup, I actually filed HADOOP-8690 for this a month ago. Throws an IOE now.
          #3 Only if the server conf is set, and in that case the client conf is overridden, I fixed the test so the client conf is explicitly disabled if !clientTrash.

          Show
          Eli Collins added a comment - #1 Done (and the test now just checks if the file was trashed since we can't get the mutated conf) #2 Yup, I actually filed HADOOP-8690 for this a month ago. Throws an IOE now. #3 Only if the server conf is set, and in that case the client conf is overridden, I fixed the test so the client conf is explicitly disabled if !clientTrash.
          Hide
          Todd Lipcon added a comment -
          • When you clobber the trash interval in the configuration, you sohuld do it on a copy, rather than modifying the config that the user passed in.
          +      // If we can not determine that trash is enabled server side then
          +      // bail rather than potentially deleting a file when trash is enabled.
          +      System.err.println("Failed to determine server trash configuration: "
          +          + e.getMessage());
          +      return false;
          

          This doesn't seem to be what happens. See the TODO in Delete.java:

                // TODO: if the user wants the trash to be used but there is any
                // problem (ie. creating the trash dir, moving the item to be deleted,
                // etc), then the path will just be deleted because moveToTrash returns
                // false and it falls thru to fs.delete.  this doesn't seem right
          

          if you actually want it to bail, it should probably throw an exception – or return false but remove this comment, and separately address the TODO mentioned above.


          +    Configuration clientConf = new Configuration(serverConf);
          +    if (clientTrash) {
          +      clientConf.setLong(FS_TRASH_INTERVAL_KEY, 200);
          +    }
          

          Since you instantiate the client conf from serverConf, won't you end up with client trash enabled even if clientTrash is false?

          Show
          Todd Lipcon added a comment - When you clobber the trash interval in the configuration, you sohuld do it on a copy, rather than modifying the config that the user passed in. + // If we can not determine that trash is enabled server side then + // bail rather than potentially deleting a file when trash is enabled. + System .err.println( "Failed to determine server trash configuration: " + + e.getMessage()); + return false ; This doesn't seem to be what happens. See the TODO in Delete.java : // TODO: if the user wants the trash to be used but there is any // problem (ie. creating the trash dir, moving the item to be deleted, // etc), then the path will just be deleted because moveToTrash returns // false and it falls thru to fs.delete. this doesn't seem right if you actually want it to bail, it should probably throw an exception – or return false but remove this comment, and separately address the TODO mentioned above. + Configuration clientConf = new Configuration(serverConf); + if (clientTrash) { + clientConf.setLong(FS_TRASH_INTERVAL_KEY, 200); + } Since you instantiate the client conf from serverConf , won't you end up with client trash enabled even if clientTrash is false?
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12543455/hdfs-3876.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3138//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3138//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12543455/hdfs-3876.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3138//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3138//console This message is automatically generated.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12543455/hdfs-3876.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3137//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3137//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12543455/hdfs-3876.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3137//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3137//console This message is automatically generated.
          Hide
          Eli Collins added a comment -

          TestViewFsTrash failed because the test deletes "/" and we now get server defaults on the path (to get the trash configuration), which fails for viewfs for "/" because "/" is not associated with a file system and we fail due to a NotInMoutPointException.

          Updated patch, catch Exception rather than IOE when obtaining server defaults so we fail the delete when we fail to get server defaults (rather than potentially ignoring the server trash configuration for transient errors getting server defaults) and updated TestTrash to not fail the test when we fail due to obtaining server defaults (which is what should happen in the viewfs case).

          Show
          Eli Collins added a comment - TestViewFsTrash failed because the test deletes "/" and we now get server defaults on the path (to get the trash configuration), which fails for viewfs for "/" because "/" is not associated with a file system and we fail due to a NotInMoutPointException. Updated patch, catch Exception rather than IOE when obtaining server defaults so we fail the delete when we fail to get server defaults (rather than potentially ignoring the server trash configuration for transient errors getting server defaults) and updated TestTrash to not fail the test when we fail due to obtaining server defaults (which is what should happen in the viewfs case).
          Hide
          Eli Collins added a comment -

          The TestViewFsTrash failure is related, will upload a new patch.

          Show
          Eli Collins added a comment - The TestViewFsTrash failure is related, will upload a new patch.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12543377/hdfs-3876.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.fs.viewfs.TestViewFsTrash
          org.apache.hadoop.ha.TestZKFailoverController
          org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS
          org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics
          org.apache.hadoop.hdfs.TestPersistBlocks

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3136//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3136//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12543377/hdfs-3876.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.fs.viewfs.TestViewFsTrash org.apache.hadoop.ha.TestZKFailoverController org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics org.apache.hadoop.hdfs.TestPersistBlocks +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3136//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3136//console This message is automatically generated.
          Hide
          Eli Collins added a comment -

          Filed HDFS-3881 to detect and disallow self-IPC.

          Filed HDFS-3882 to make the NN emptier not use the client to delete files. We could easily do this via a new hdfs-specific policy class (or just moving trash to the NN entirely per HADOOP-8598).

          Show
          Eli Collins added a comment - Filed HDFS-3881 to detect and disallow self-IPC. Filed HDFS-3882 to make the NN emptier not use the client to delete files. We could easily do this via a new hdfs-specific policy class (or just moving trash to the NN entirely per HADOOP-8598 ).
          Hide
          Eli Collins added a comment -

          Patch with test. Testing via the shell now rather than Trash directly since the handling is moved to the shell, and it's better to test at the shell level anyway.

          Show
          Eli Collins added a comment - Patch with test. Testing via the shell now rather than Trash directly since the handling is moved to the shell, and it's better to test at the shell level anyway.
          Hide
          Aaron T. Myers added a comment -

          Makes sense. I was just responding to the comment that "We need to have a general policy that the NN should never make RPCs to itself for any reason, due to potential for deadlocks like this."

          Show
          Aaron T. Myers added a comment - Makes sense. I was just responding to the comment that "We need to have a general policy that the NN should never make RPCs to itself for any reason, due to potential for deadlocks like this."
          Hide
          Todd Lipcon added a comment -

          Somewhat related: I believe that the trash emptier thread in the NN also makes RPCs to the NN to delete the appropriate files, in addition to configuring itself. Should we do something about that as well?

          Would be a nice improvement but it can't cause a deadlock, since it's not being done from an IPC handler thread. The issue here is that we have a handler thread making an IPC back to itself. So it's possible that, when you don't have enough threads, the IPC blocks forever because the client itself is holding the thread up.

          Show
          Todd Lipcon added a comment - Somewhat related: I believe that the trash emptier thread in the NN also makes RPCs to the NN to delete the appropriate files, in addition to configuring itself. Should we do something about that as well? Would be a nice improvement but it can't cause a deadlock, since it's not being done from an IPC handler thread. The issue here is that we have a handler thread making an IPC back to itself. So it's possible that, when you don't have enough threads, the IPC blocks forever because the client itself is holding the thread up.
          Hide
          Aaron T. Myers added a comment -

          Somewhat related: I believe that the trash emptier thread in the NN also makes RPCs to the NN to delete the appropriate files, in addition to configuring itself. Should we do something about that as well?

          Show
          Aaron T. Myers added a comment - Somewhat related: I believe that the trash emptier thread in the NN also makes RPCs to the NN to delete the appropriate files, in addition to configuring itself. Should we do something about that as well?
          Hide
          Eli Collins added a comment -

          I considered these approaches:

          1. Modify the current check to not use the the server defaults if called from the NN context. We could try to detect this or plumb a boolean through the various constructors. Both are a pain (because this code lives in common and there are a fair number of paths and it would be easy to miss a new one).

          2. In FsShell get server defaults at shell initialization time and clobber the client config with the server config if set. This is simple and has the benefit of only retrieving the server defaults once rather than per path item but because operations can span NNs and each NN may have a different trash config this option doesn't work.

          3. Approach #2 but wait to get the server config when we figure out the right NN for the path.

          I prefer the last one, here's a patch that implements it, I still need to update the tests to match.

          Show
          Eli Collins added a comment - I considered these approaches: 1. Modify the current check to not use the the server defaults if called from the NN context. We could try to detect this or plumb a boolean through the various constructors. Both are a pain (because this code lives in common and there are a fair number of paths and it would be easy to miss a new one). 2. In FsShell get server defaults at shell initialization time and clobber the client config with the server config if set. This is simple and has the benefit of only retrieving the server defaults once rather than per path item but because operations can span NNs and each NN may have a different trash config this option doesn't work. 3. Approach #2 but wait to get the server config when we figure out the right NN for the path. I prefer the last one, here's a patch that implements it, I still need to update the tests to match.
          Hide
          Eli Collins added a comment -

          I'll try to get a patch up tonight, if it's blocking you I can revert it.

          Show
          Eli Collins added a comment - I'll try to get a patch up tonight, if it's blocking you I can revert it.
          Hide
          Todd Lipcon added a comment -

          This was caused by HADOOP-8689. If we can't fix it quickly we should revert that patch.

          Show
          Todd Lipcon added a comment - This was caused by HADOOP-8689 . If we can't fix it quickly we should revert that patch.

            People

            • Assignee:
              Eli Collins
              Reporter:
              Todd Lipcon
            • Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development