Hadoop HDFS
  1. Hadoop HDFS
  2. HDFS-4532

RPC call queue may fill due to current user lookup

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: 0.23.0, 2.0.0-alpha, 3.0.0
    • Fix Version/s: 0.23.7, 2.1.0-beta
    • Component/s: namenode
    • Labels:
      None

      Description

      Audit logging need to invoke UGI.getCurrentUser to log the user performing an operation. This is a synch'ed method, so audit logging is effectively serializing the completion of concurrent read ops. Under very heavy load, RPC throughput drops dramatically, the call queue fills with thousands of waiting operations, and clients begin timing out. Jstack shows all but one call handler blocked on UGI.getCurrentUser.

      1. HDFS-4532.branch-23.patch
        13 kB
        Daryn Sharp
      2. HDFS-4532.patch
        19 kB
        Daryn Sharp
      3. HDFS-4532.patch
        19 kB
        Daryn Sharp

        Issue Links

          Activity

          Daryn Sharp created issue -
          Daryn Sharp made changes -
          Field Original Value New Value
          Link This issue depends on HADOOP-9336 [ HADOOP-9336 ]
          Hide
          Daryn Sharp added a comment -

          Uses the Server.getRemoteUser call in place of UGI.getCurrentUser for logging edits.

          Streamlined the audit logging so the same redundant conditions and parameters aren't checked in every method. Notably every audit logging call doesn't have to pass the ugi.

          I also found a couple places where the file stat for the audit log was being done outside the namespace lock - which can lead to race conditions where the wrong stat is logged because another operation has taken place between lock release and audit logging.

          Show
          Daryn Sharp added a comment - Uses the Server.getRemoteUser call in place of UGI.getCurrentUser for logging edits. Streamlined the audit logging so the same redundant conditions and parameters aren't checked in every method. Notably every audit logging call doesn't have to pass the ugi. I also found a couple places where the file stat for the audit log was being done outside the namespace lock - which can lead to race conditions where the wrong stat is logged because another operation has taken place between lock release and audit logging.
          Daryn Sharp made changes -
          Attachment HDFS-4532.branch-23.patch [ 12571506 ]
          Attachment HDFS-4532.patch [ 12571507 ]
          Daryn Sharp made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12571507/HDFS-4532.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4018//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4018//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12571507/HDFS-4532.patch against trunk revision . +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4018//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4018//console This message is automatically generated.
          Hide
          Kihwal Lee added a comment -

          The patch itself looks good, but there is one thing I am not sure about in the existing code.
          Do isExternalInvocation() and FSNamesystem.getRemoteIp() work correctly when Hftp is used?

          Show
          Kihwal Lee added a comment - The patch itself looks good, but there is one thing I am not sure about in the existing code. Do isExternalInvocation() and FSNamesystem.getRemoteIp() work correctly when Hftp is used?
          Hide
          Suresh Srinivas added a comment -

          Other than the same concern as Kihwal:

          1. Minor - indentation in logAuditEvent()
          2. In a future jira, it would be good to standardize on "method" calls "methodInt". Only "methodInt" performs required locking. Permission checker is initialization, logSync, logging of edits is done in "method". I will follow up on this.
          Show
          Suresh Srinivas added a comment - Other than the same concern as Kihwal: Minor - indentation in logAuditEvent() In a future jira, it would be good to standardize on "method" calls "methodInt". Only "methodInt" performs required locking. Permission checker is initialization, logSync, logging of edits is done in "method". I will follow up on this.
          Hide
          Daryn Sharp added a comment -

          @Kihwal:
          Good questions, and a cursory scan of the code appears that hftp is not handled correctly. The servlets attempt to get the rpc proxy from the servlet context, and if it does it directly invokes methods on the server-side proxy. This patch doesn't aggravate that issue.

          @Suresh:
          Oops, will fix the indent. I fully agree on the standardizations, and had similar thoughts while creating the patch.

          Show
          Daryn Sharp added a comment - @Kihwal: Good questions, and a cursory scan of the code appears that hftp is not handled correctly. The servlets attempt to get the rpc proxy from the servlet context, and if it does it directly invokes methods on the server-side proxy. This patch doesn't aggravate that issue. @Suresh: Oops, will fix the indent. I fully agree on the standardizations, and had similar thoughts while creating the patch.
          Hide
          Daryn Sharp added a comment -

          Removed the tab gremlin.

          Show
          Daryn Sharp added a comment - Removed the tab gremlin.
          Daryn Sharp made changes -
          Attachment HDFS-4532.patch [ 12571643 ]
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12571643/HDFS-4532.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4026//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4026//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12571643/HDFS-4532.patch against trunk revision . +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4026//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4026//console This message is automatically generated.
          Hide
          Kihwal Lee added a comment -

          +1 for the latest patch.

          Good questions, and a cursory scan of the code appears that hftp is not handled correctly. The servlets attempt to get the rpc proxy from the servlet context, and if it does it directly invokes methods on the server-side proxy. This patch doesn't aggravate that issue.

          Please file a jira for this.

          Show
          Kihwal Lee added a comment - +1 for the latest patch. Good questions, and a cursory scan of the code appears that hftp is not handled correctly. The servlets attempt to get the rpc proxy from the servlet context, and if it does it directly invokes methods on the server-side proxy. This patch doesn't aggravate that issue. Please file a jira for this.
          Hide
          Daryn Sharp added a comment -

          Filed HDFS-4546. Thanks for the review.

          Show
          Daryn Sharp added a comment - Filed HDFS-4546 . Thanks for the review.
          Hide
          Suresh Srinivas added a comment -

          +1 from me as well.

          Show
          Suresh Srinivas added a comment - +1 from me as well.
          Hide
          Hudson added a comment -

          Integrated in Hadoop-trunk-Commit #3410 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3410/)
          HDFS-4532. RPC call queue may fill due to current user lookup (daryn) (Revision 1452435)

          Result = SUCCESS
          daryn : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1452435
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          Show
          Hudson added a comment - Integrated in Hadoop-trunk-Commit #3410 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3410/ ) HDFS-4532 . RPC call queue may fill due to current user lookup (daryn) (Revision 1452435) Result = SUCCESS daryn : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1452435 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          Hide
          Daryn Sharp added a comment -

          Thanks Kihwal & Suresh. I have committed to trunk/2/23.

          Show
          Daryn Sharp added a comment - Thanks Kihwal & Suresh. I have committed to trunk/2/23.
          Daryn Sharp made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags Reviewed [ 10343 ]
          Fix Version/s 3.0.0 [ 12320356 ]
          Fix Version/s 0.23.7 [ 12323955 ]
          Fix Version/s 2.0.4-beta [ 12324031 ]
          Resolution Fixed [ 1 ]
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Yarn-trunk #146 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/146/)
          HDFS-4532. RPC call queue may fill due to current user lookup (daryn) (Revision 1452435)

          Result = SUCCESS
          daryn : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1452435
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          Show
          Hudson added a comment - Integrated in Hadoop-Yarn-trunk #146 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/146/ ) HDFS-4532 . RPC call queue may fill due to current user lookup (daryn) (Revision 1452435) Result = SUCCESS daryn : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1452435 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-0.23-Build #544 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/544/)
          HDFS-4532. RPC call queue may fill due to current user lookup (daryn) (Revision 1452440)

          Result = FAILURE
          daryn : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1452440
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-0.23-Build #544 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/544/ ) HDFS-4532 . RPC call queue may fill due to current user lookup (daryn) (Revision 1452440) Result = FAILURE daryn : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1452440 Files : /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1335 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1335/)
          HDFS-4532. RPC call queue may fill due to current user lookup (daryn) (Revision 1452435)

          Result = SUCCESS
          daryn : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1452435
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-trunk #1335 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1335/ ) HDFS-4532 . RPC call queue may fill due to current user lookup (daryn) (Revision 1452435) Result = SUCCESS daryn : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1452435 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          Gavin made changes -
          Link This issue depends on HADOOP-9336 [ HADOOP-9336 ]
          Gavin made changes -
          Link This issue depends upon HADOOP-9336 [ HADOOP-9336 ]
          Arun C Murthy made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Allen Wittenauer made changes -
          Fix Version/s 3.0.0 [ 12320356 ]
          Transition Time In Source Status Execution Times Last Executer Last Execution Date
          Open Open Patch Available Patch Available
          2d 4h 6m 1 Daryn Sharp 01/Mar/13 00:25
          Patch Available Patch Available Resolved Resolved
          3d 18h 26m 1 Daryn Sharp 04/Mar/13 18:51
          Resolved Resolved Closed Closed
          176d 3h 16m 1 Arun C Murthy 27/Aug/13 23:07

            People

            • Assignee:
              Daryn Sharp
              Reporter:
              Daryn Sharp
            • Votes:
              0 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development