Hadoop HDFS
  1. Hadoop HDFS
  2. HDFS-4532

RPC call queue may fill due to current user lookup

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: 0.23.0, 2.0.0-alpha, 3.0.0
    • Fix Version/s: 3.0.0, 0.23.7, 2.1.0-beta
    • Component/s: namenode
    • Labels:
      None

      Description

      Audit logging need to invoke UGI.getCurrentUser to log the user performing an operation. This is a synch'ed method, so audit logging is effectively serializing the completion of concurrent read ops. Under very heavy load, RPC throughput drops dramatically, the call queue fills with thousands of waiting operations, and clients begin timing out. Jstack shows all but one call handler blocked on UGI.getCurrentUser.

      1. HDFS-4532.branch-23.patch
        13 kB
        Daryn Sharp
      2. HDFS-4532.patch
        19 kB
        Daryn Sharp
      3. HDFS-4532.patch
        19 kB
        Daryn Sharp

        Issue Links

          Activity

          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1335 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1335/)
          HDFS-4532. RPC call queue may fill due to current user lookup (daryn) (Revision 1452435)

          Result = SUCCESS
          daryn : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1452435
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-trunk #1335 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1335/ ) HDFS-4532 . RPC call queue may fill due to current user lookup (daryn) (Revision 1452435) Result = SUCCESS daryn : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1452435 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-0.23-Build #544 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/544/)
          HDFS-4532. RPC call queue may fill due to current user lookup (daryn) (Revision 1452440)

          Result = FAILURE
          daryn : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1452440
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-0.23-Build #544 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/544/ ) HDFS-4532 . RPC call queue may fill due to current user lookup (daryn) (Revision 1452440) Result = FAILURE daryn : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1452440 Files : /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Yarn-trunk #146 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/146/)
          HDFS-4532. RPC call queue may fill due to current user lookup (daryn) (Revision 1452435)

          Result = SUCCESS
          daryn : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1452435
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          Show
          Hudson added a comment - Integrated in Hadoop-Yarn-trunk #146 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/146/ ) HDFS-4532 . RPC call queue may fill due to current user lookup (daryn) (Revision 1452435) Result = SUCCESS daryn : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1452435 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          Hide
          Daryn Sharp added a comment -

          Thanks Kihwal & Suresh. I have committed to trunk/2/23.

          Show
          Daryn Sharp added a comment - Thanks Kihwal & Suresh. I have committed to trunk/2/23.
          Hide
          Hudson added a comment -

          Integrated in Hadoop-trunk-Commit #3410 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3410/)
          HDFS-4532. RPC call queue may fill due to current user lookup (daryn) (Revision 1452435)

          Result = SUCCESS
          daryn : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1452435
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          Show
          Hudson added a comment - Integrated in Hadoop-trunk-Commit #3410 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3410/ ) HDFS-4532 . RPC call queue may fill due to current user lookup (daryn) (Revision 1452435) Result = SUCCESS daryn : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1452435 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          Hide
          Suresh Srinivas added a comment -

          +1 from me as well.

          Show
          Suresh Srinivas added a comment - +1 from me as well.
          Hide
          Daryn Sharp added a comment -

          Filed HDFS-4546. Thanks for the review.

          Show
          Daryn Sharp added a comment - Filed HDFS-4546 . Thanks for the review.
          Hide
          Kihwal Lee added a comment -

          +1 for the latest patch.

          Good questions, and a cursory scan of the code appears that hftp is not handled correctly. The servlets attempt to get the rpc proxy from the servlet context, and if it does it directly invokes methods on the server-side proxy. This patch doesn't aggravate that issue.

          Please file a jira for this.

          Show
          Kihwal Lee added a comment - +1 for the latest patch. Good questions, and a cursory scan of the code appears that hftp is not handled correctly. The servlets attempt to get the rpc proxy from the servlet context, and if it does it directly invokes methods on the server-side proxy. This patch doesn't aggravate that issue. Please file a jira for this.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12571643/HDFS-4532.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4026//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4026//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12571643/HDFS-4532.patch against trunk revision . +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4026//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4026//console This message is automatically generated.
          Hide
          Daryn Sharp added a comment -

          Removed the tab gremlin.

          Show
          Daryn Sharp added a comment - Removed the tab gremlin.
          Hide
          Daryn Sharp added a comment -

          @Kihwal:
          Good questions, and a cursory scan of the code appears that hftp is not handled correctly. The servlets attempt to get the rpc proxy from the servlet context, and if it does it directly invokes methods on the server-side proxy. This patch doesn't aggravate that issue.

          @Suresh:
          Oops, will fix the indent. I fully agree on the standardizations, and had similar thoughts while creating the patch.

          Show
          Daryn Sharp added a comment - @Kihwal: Good questions, and a cursory scan of the code appears that hftp is not handled correctly. The servlets attempt to get the rpc proxy from the servlet context, and if it does it directly invokes methods on the server-side proxy. This patch doesn't aggravate that issue. @Suresh: Oops, will fix the indent. I fully agree on the standardizations, and had similar thoughts while creating the patch.
          Hide
          Suresh Srinivas added a comment -

          Other than the same concern as Kihwal:

          1. Minor - indentation in logAuditEvent()
          2. In a future jira, it would be good to standardize on "method" calls "methodInt". Only "methodInt" performs required locking. Permission checker is initialization, logSync, logging of edits is done in "method". I will follow up on this.
          Show
          Suresh Srinivas added a comment - Other than the same concern as Kihwal: Minor - indentation in logAuditEvent() In a future jira, it would be good to standardize on "method" calls "methodInt". Only "methodInt" performs required locking. Permission checker is initialization, logSync, logging of edits is done in "method". I will follow up on this.
          Hide
          Kihwal Lee added a comment -

          The patch itself looks good, but there is one thing I am not sure about in the existing code.
          Do isExternalInvocation() and FSNamesystem.getRemoteIp() work correctly when Hftp is used?

          Show
          Kihwal Lee added a comment - The patch itself looks good, but there is one thing I am not sure about in the existing code. Do isExternalInvocation() and FSNamesystem.getRemoteIp() work correctly when Hftp is used?
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12571507/HDFS-4532.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4018//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4018//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12571507/HDFS-4532.patch against trunk revision . +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4018//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4018//console This message is automatically generated.
          Hide
          Daryn Sharp added a comment -

          Uses the Server.getRemoteUser call in place of UGI.getCurrentUser for logging edits.

          Streamlined the audit logging so the same redundant conditions and parameters aren't checked in every method. Notably every audit logging call doesn't have to pass the ugi.

          I also found a couple places where the file stat for the audit log was being done outside the namespace lock - which can lead to race conditions where the wrong stat is logged because another operation has taken place between lock release and audit logging.

          Show
          Daryn Sharp added a comment - Uses the Server.getRemoteUser call in place of UGI.getCurrentUser for logging edits. Streamlined the audit logging so the same redundant conditions and parameters aren't checked in every method. Notably every audit logging call doesn't have to pass the ugi. I also found a couple places where the file stat for the audit log was being done outside the namespace lock - which can lead to race conditions where the wrong stat is logged because another operation has taken place between lock release and audit logging.

            People

            • Assignee:
              Daryn Sharp
              Reporter:
              Daryn Sharp
            • Votes:
              0 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development