Hadoop HDFS / HDFS-3972

Trash emptier fails in secure HA cluster

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 3.0.0, 2.0.1-alpha
    • Fix Version/s: 2.0.2-alpha
    • Component/s: namenode
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      In a secure HA cluster, we're seeing the following issue on the NN when the trash emptier tries to run:

      WARN org.apache.hadoop.fs.TrashPolicyDefault: Trash can't list homes: java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "xxxxx"; destination host is: "xxxx":8020; Sleeping.

      The issue seems to be that the trash emptier thread sends RPCs back to itself, but isn't wrapped in a doAs. Credit goes to Stephen Chu for discovering this.

      1. hdfs-3972.txt
        5 kB
        Todd Lipcon

        Activity

        Todd Lipcon added a comment -

        edit: updated Summary and Description to indicate this only happens when HA is enabled on the secure cluster

        Todd Lipcon added a comment -

        The root cause of the issue is that, in HA, the TrashEmptier is started from the context of an RPC, rather than from the NN startup code. So it was running as the administrator who called transitionToActive (a remote, non-keytab user) rather than as the NN login user.
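
The identity mix-up described above can be modelled with a small JDK-only sketch. This is a toy model, not Hadoop code: Hadoop tracks the caller through UserGroupInformation and a JAAS Subject, whereas here an InheritableThreadLocal stands in for "a new thread inherits the identity of the thread that created it". The class name, the user names, and the runAs helper are all hypothetical and exist only for illustration.

```java
import java.util.concurrent.Callable;

/** Toy model of the bug; none of this is Hadoop API. */
final class TrashEmptierIdentityDemo {
    // Hypothetical principal names, for illustration only.
    static final String LOGIN_USER = "nn-login-user";
    static final String ADMIN_USER = "remote-admin";

    // Stand-in for the security context: a child thread inherits the value
    // that its creating thread holds at the time the child is constructed.
    private static final InheritableThreadLocal<String> CURRENT_USER =
        new InheritableThreadLocal<String>() {
            @Override
            protected String initialValue() {
                return LOGIN_USER; // process starts as the login user
            }
        };

    /** Hypothetical analogue of a doAs: run the action under the given identity. */
    static <T> T runAs(String user, Callable<T> action) throws Exception {
        String previous = CURRENT_USER.get();
        CURRENT_USER.set(user);
        try {
            return action.call();
        } finally {
            CURRENT_USER.set(previous); // restore the caller's identity
        }
    }

    /** Starts an "emptier" thread and reports which identity it observes. */
    static String startEmptier() throws InterruptedException {
        final String[] seen = new String[1];
        Thread emptier = new Thread(() -> seen[0] = CURRENT_USER.get(), "Trash Emptier");
        emptier.start();
        emptier.join(); // join() makes the write to seen[0] visible here
        return seen[0];
    }

    /** Emptier created directly inside the admin's RPC context. */
    static String transitionToActiveUnpatched() throws Exception {
        return runAs(ADMIN_USER, TrashEmptierIdentityDemo::startEmptier);
    }

    /** Emptier creation wrapped so that it runs as the login user. */
    static String transitionToActivePatched() throws Exception {
        Callable<String> asLoginUser =
            () -> runAs(LOGIN_USER, TrashEmptierIdentityDemo::startEmptier);
        return runAs(ADMIN_USER, asLoginUser);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("unpatched emptier runs as: " + transitionToActiveUnpatched());
        System.out.println("patched emptier runs as:   " + transitionToActivePatched());
    }
}
```

In this model the unpatched path reports the remote admin's identity and the patched path reports the login user's, which mirrors why the real emptier had no Kerberos TGT to use when it was started from the transitionToActive RPC.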

        Attached patch fixes the issue. It doesn't include a new unit test, since the issue only occurs on secure clusters, which the automated tests can't exercise.

        I tested manually as follows:

        • configured a secure HA cluster
        • set fs.trash.interval and fs.trash.checkpoint.interval to 1 in core-site.xml
        • started the NN and transitioned it to active
          • without the patch, I saw the error mentioned above; with the patch, no error
        • created and deleted a file, which got moved to trash
        • waited a minute, and saw that the trash checkpointer moved the file correctly
          • without the patch, I saw the error again; with the patch, it worked
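
For anyone repeating the trash settings from the steps above, a core-site.xml fragment along these lines should be enough. It is a sketch that assumes the standard Hadoop property layout; both values are in minutes, so 1 keeps the wait short for a manual test.

```xml
<!-- Fragment for core-site.xml; place inside the existing <configuration> element. -->
<property>
  <name>fs.trash.interval</name>
  <value>1</value> <!-- minutes before a trash checkpoint is deleted -->
</property>
<property>
  <name>fs.trash.checkpoint.interval</name>
  <value>1</value> <!-- minutes between trash checkpoints -->
</property>
```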
        Eli Collins added a comment -

        +1 pending jenkins, looks great

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12546574/hdfs-3972.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

        org.apache.hadoop.ha.TestZKFailoverController
        org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics
        org.apache.hadoop.hdfs.web.TestWebHDFS
        org.apache.hadoop.hdfs.TestPersistBlocks

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3231//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3231//console

        This message is automatically generated.

        Eli Collins added a comment -

        Test failures are unrelated:

        HDFS-3811 - TestPersistBlocks#TestRestartDfsWithFlush appears to be flaky
        HDFS-2434 - TestNameNodeMetrics.testCorruptBlock fails intermittently
        HDFS-3948 - TestWebHDFS#testNamenodeRestart is racy
        HADOOP-8591 - TestZKFailoverController tests time out

        Eli Collins added a comment -

        I've committed this and merged to branch-2 and branch-2.0.2-alpha.

        Hudson added a comment -

        Integrated in Hadoop-Common-trunk-Commit #2777 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2777/)
        HDFS-3972. Trash emptier fails in secure HA cluster. Contributed by Todd Lipcon (Revision 1390729)

        Result = SUCCESS
        eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1390729
        Files :

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/TrashPolicyDefault.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SecurityUtil.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk-Commit #2840 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2840/)
        HDFS-3972. Trash emptier fails in secure HA cluster. Contributed by Todd Lipcon (Revision 1390729)

        Result = SUCCESS
        eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1390729
        Files :

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/TrashPolicyDefault.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SecurityUtil.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #2799 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2799/)
        HDFS-3972. Trash emptier fails in secure HA cluster. Contributed by Todd Lipcon (Revision 1390729)

        Result = ABORTED
        eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1390729
        Files :

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/TrashPolicyDefault.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SecurityUtil.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #1178 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1178/)
        HDFS-3972. Trash emptier fails in secure HA cluster. Contributed by Todd Lipcon (Revision 1390729)

        Result = SUCCESS
        eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1390729
        Files :

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/TrashPolicyDefault.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SecurityUtil.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #1209 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1209/)
        HDFS-3972. Trash emptier fails in secure HA cluster. Contributed by Todd Lipcon (Revision 1390729)

        Result = SUCCESS
        eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1390729
        Files :

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/TrashPolicyDefault.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SecurityUtil.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java

          People

          • Assignee:
            Todd Lipcon
            Reporter:
            Todd Lipcon
          • Votes:
            0
            Watchers:
            5
