Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:

      Description

      We have run into various issues in the NameNode and HBase with respect to RPC handling in multi-tenant clusters. Examples:

      https://issues.apache.org/jira/i#browse/HADOOP-9640
      https://issues.apache.org/jira/i#browse/HBASE-8836

      There are different ideas on how to prioritize RPC requests. Prioritization could be based on user ID, on whether a request is a read or a write, or on a specific rule such as treating a DataNode's RPCs as more important than client RPCs.

      We want to enable people to implement and experiment with different RPC schedulers.
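
      As an illustration of the prioritization ideas above, here is a minimal Java sketch. The names RpcCallInfo, RpcPriorityRule, and ExamplePriorityRules are hypothetical and are not taken from the patch; the sketch also assumes that a lower number means the call is served earlier.

```java
import java.util.Set;

// Hypothetical description of an incoming call; not a Hadoop class.
final class RpcCallInfo {
    final String user;
    final String method;
    final boolean fromDatanode;

    RpcCallInfo(String user, String method, boolean fromDatanode) {
        this.user = user;
        this.method = method;
        this.fromDatanode = fromDatanode;
    }
}

// Hypothetical rule interface: maps a call to a priority level.
// Assumption: a lower value means the call is served earlier.
interface RpcPriorityRule {
    int priorityOf(RpcCallInfo call);
}

final class ExamplePriorityRules {
    // Rule: DataNode RPCs are more important than client RPCs.
    static final RpcPriorityRule DATANODE_FIRST =
        call -> call.fromDatanode ? 0 : 1;

    // Rule: read requests go ahead of write requests.
    static RpcPriorityRule readsBeforeWrites(Set<String> readMethods) {
        return call -> readMethods.contains(call.method) ? 0 : 1;
    }

    // Rule: calls from a configured set of heavy users are served later.
    static RpcPriorityRule deprioritizeUsers(Set<String> heavyUsers) {
        return call -> heavyUsers.contains(call.user) ? 1 : 0;
    }
}
```

      A pluggable scheduler could combine such rules, for example by taking the minimum or a weighted sum of the levels they return; which combination is appropriate depends on the cluster.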

      1. HDFS-5639.patch
        103 kB
        Ming Ma
      2. HDFS-5639-2.patch
        103 kB
        Ming Ma

        Activity

        mingma Ming Ma added a comment -

        The patch borrows a lot of work from https://issues.apache.org/jira/i#browse/HBASE-8884 and https://issues.apache.org/jira/i#browse/HBASE-9461. It also makes a few changes specific to Hadoop:

        1. The scheduler can be a global object shared among different RPC servers. This is useful in the NN case, where there can be two RPC servers: one for client requests and one for service requests. Currently there is no way to prioritize requests between client RPC and DN RPC. The patch includes both the new RPC scheduler API in hadoop-common-project and the NN's usage of this API. The NN's default RPC scheduler handles the scenario where the NN uses both a client RPC server and a service RPC server. A new scheduler can be plugged in via the config dfs.namenode.rpc.scheduler.factory.class.

        2. This can also be useful in the case of the YARN RM, where several RPC servers are used; for example, it can prioritize AM RPCs over some client RPCs. The default behavior for YARN is still one scheduler per RPC server unless YARN is changed to use a global RPC scheduler.

        3. There shouldn't be any change in terms of how RPC scheduling is done for any hadoop services.

        4. Fix the handling of queueSizePerHandler when a specific value is passed in. The fix is "maxQueueSize = handlerCount * queueSizePerHandler."

        5. Update RPCCallBenchmark to support the external rpc scheduler; include a test RpcScheduler implementation.

        6. Move CallQueueLength metric from RPCMetrics to DefaultRpcSchedulerMetrics.
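
        Items 1 and 4 above can be sketched roughly as follows. This is not the API from the patch: SketchScheduler, SketchSchedulerFactory, FifoScheduler, SketchRpcServer, and SchedulerLoader are hypothetical names, calls are modeled as plain strings, and the factory is loaded by class name with plain reflection as a stand-in for reading dfs.namenode.rpc.scheduler.factory.class from the configuration.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// All names here are hypothetical; they are not the API introduced by the patch.
interface SketchScheduler {
    boolean dispatch(String call); // returns false when the queue is full
    String next();                 // returns null when no call is queued
}

interface SketchSchedulerFactory {
    SketchScheduler create(int handlerCount, int queueSizePerHandler);
}

final class FifoScheduler implements SketchScheduler {
    private final BlockingQueue<String> queue;

    FifoScheduler(int handlerCount, int queueSizePerHandler) {
        // Item 4: maxQueueSize = handlerCount * queueSizePerHandler.
        int maxQueueSize = handlerCount * queueSizePerHandler;
        this.queue = new LinkedBlockingQueue<>(maxQueueSize);
    }

    @Override
    public boolean dispatch(String call) {
        return queue.offer(call);
    }

    @Override
    public String next() {
        return queue.poll();
    }
}

final class FifoSchedulerFactory implements SketchSchedulerFactory {
    @Override
    public SketchScheduler create(int handlerCount, int queueSizePerHandler) {
        return new FifoScheduler(handlerCount, queueSizePerHandler);
    }
}

// Stand-in for an RPC server that hands incoming calls to a scheduler.
final class SketchRpcServer {
    private final String name;
    private final SketchScheduler scheduler;

    SketchRpcServer(String name, SketchScheduler scheduler) {
        this.name = name;
        this.scheduler = scheduler;
    }

    boolean receive(String call) {
        return scheduler.dispatch(name + ":" + call);
    }
}

final class SchedulerLoader {
    // Loads a factory by class name, as a config-driven plug-in point might.
    static SketchSchedulerFactory load(String className) {
        try {
            return (SketchSchedulerFactory)
                Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(
                "Cannot load scheduler factory " + className, e);
        }
    }
}
```

        In this sketch, passing the same SketchScheduler instance to a "client" server and a "service" server puts both kinds of calls under one scheduler, which is the precondition for prioritizing between them; a non-FIFO implementation would then decide the order.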

        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12617443/HDFS-5639.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 9 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 javadoc. The javadoc tool appears to have generated 3 warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5663//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/5663//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-common.html
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5663//console

        This message is automatically generated.

        mingma Ming Ma added a comment -

        Fix the javadoc and findbugs issues.

        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12617519/HDFS-5639-2.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 9 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

        org.apache.hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5671//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5671//console

        This message is automatically generated.

        daryn Daryn Sharp added a comment -

        This patch seems excessively large to me - compounded by unnecessary changes such as using Thread.currentThread() when "this" is already a thread.

        Although conceptually different to some degree, it appears to overlap with HADOOP-9640. Hiding a scheduler behind a custom BlockingQueue implementation may be a bit less intrusive. Would you please work with Chris Li to see if there's enough similarity to combine these efforts (although still via separate JIRAs)?
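
        To illustrate what hiding a scheduler behind a BlockingQueue could look like, here is a minimal sketch. It is not the HADOOP-9640 implementation: PrioritizedCall and SchedulingQueues are hypothetical names, and the sketch simply reuses java.util.concurrent.PriorityBlockingQueue, which is unbounded, so a real call queue would also need to enforce a capacity limit.

```java
import java.util.Comparator;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical call wrapper; not a Hadoop class.
final class PrioritizedCall {
    private static final AtomicLong SEQUENCE = new AtomicLong();

    final String name;
    final int priority; // assumption: lower value is served first
    final long seq = SEQUENCE.getAndIncrement(); // keeps FIFO order within a priority

    PrioritizedCall(String name, int priority) {
        this.name = name;
        this.priority = priority;
    }
}

final class SchedulingQueues {
    // Handlers keep calling take() or poll() on a BlockingQueue as before;
    // the ordering policy lives entirely inside the queue.
    static BlockingQueue<PrioritizedCall> priorityCallQueue() {
        Comparator<PrioritizedCall> order =
            Comparator.comparingInt((PrioritizedCall c) -> c.priority)
                      .thenComparingLong(c -> c.seq);
        return new PriorityBlockingQueue<>(11, order);
    }
}
```

        The appeal of this shape is that the server code only depends on the BlockingQueue interface, so swapping the scheduling policy means swapping the queue implementation rather than touching the handler threads.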

        chrilisf Chris Li added a comment -

        Something like this will be needed down the road if HADOOP-9640 is adopted; I'll open separate jiras for these enhancements when we're ready.

        hadoopqa Hadoop QA added a comment -

        -1 overall

        Vote  Subsystem  Runtime  Comment
        -1    patch      0m 0s    The patch command could not apply the patch during dryrun.

        Subsystem       Report/Notes
        Patch URL       http://issues.apache.org/jira/secure/attachment/12617519/HDFS-5639-2.patch
        Optional Tests  javadoc javac unit findbugs checkstyle
        git revision    trunk / f1a152c
        Console output  https://builds.apache.org/job/PreCommit-HDFS-Build/10582/console

        This message was automatically generated.


          People

          • Assignee: Unassigned
          • Reporter: mingma Ming Ma
          • Votes: 0
          • Watchers: 9
