Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.2.0, 3.0.0-alpha1
    • Fix Version/s: None
    • Component/s: None
    • Labels:
    • Target Version/s:
    • Release Note:
      Enable optional RPC-level priority to combat congestion and make request latencies more consistent.

      Description

      For an easy-to-read summary see: http://www.ebaytechblog.com/2014/08/21/quality-of-service-in-hadoop/

      Several production Hadoop cluster incidents occurred where the Namenode was overloaded and failed to respond.

      We can improve quality of service for users during namenode peak loads by replacing the FIFO call queue with a Fair Call Queue. (this plan supersedes rpc-congestion-control-draft-plan).

      Excerpted from the communication of one incident, “The map task of a user was creating huge number of small files in the user directory. Due to the heavy load on NN, the JT also was unable to communicate with NN...The cluster became responsive only once the job was killed.”

      Excerpted from the communication of another incident, “Namenode was overloaded by GetBlockLocation requests (Correction: should be getFileInfo requests. the job had a bug that called getFileInfo for a nonexistent file in an endless loop). All other requests to namenode were also affected by this and hence all jobs slowed down. Cluster almost came to a grinding halt…Eventually killed jobtracker to kill all jobs that are running.”

      Excerpted from HDFS-945, “We've seen defective applications cause havoc on the NameNode, for e.g. by doing 100k+ 'listStatus' on very large directories (60k files) etc.”

      1. faircallqueue.patch
        41 kB
        Chris Li
      2. faircallqueue2.patch
        73 kB
        Chris Li
      3. faircallqueue3.patch
        73 kB
        Chris Li
      4. faircallqueue4.patch
        74 kB
        Chris Li
      5. faircallqueue5.patch
        73 kB
        Chris Li
      6. faircallqueue6.patch
        74 kB
        Chris Li
      7. faircallqueue7_with_runtime_swapping.patch
        134 kB
        Chris Li
      8. FairCallQueue-PerformanceOnCluster.pdf
        694 kB
        Chris Li
      9. MinorityMajorityPerformance.pdf
        72 kB
        Chris Li
      10. NN-denial-of-service-updated-plan.pdf
        2.76 MB
        Chris Li
      11. rpc-congestion-control-draft-plan.pdf
        488 kB
        Xiaobo Peng

        Issue Links

          Activity

          teledriver Xiaobo Peng added a comment -

          uploaded draft plan. Please comment. Thanks.

          chrilisf Chris Li added a comment -

          Effect

          In the current released version, latency increases linearly with CallQueueLength as congestion builds, until the namenode becomes unusable. With the FairCallQueue, latency stays constant, ensuring consistent response times under varying loads.

          Overview

          Replaces the LinkedBlockingQueue with a new FairCallQueue class that contains a configurable number of subqueues. Example:

          • Queue 0: Realtime Queue (Optionally reserved for heartbeats/block updates)
          • Queue 1: High priority
          • Queue 2: Normal priority
          • Queue 3: Low priority

          Scheduling

          RPC calls are placed into an appropriate queue via the HistoryBasedScheduler, which looks at the past 1000 requests to determine the priority of the call.

          Example:

          • A user who made 90% of the calls will be put into the Low priority queue.
          • A user who only made 1% of the calls will be put into High priority. Etc.
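
          As a rough illustration of the history-based idea, a sketch of such a scheduler (names and numbers are illustrative, loosely following the threshold example given later in this thread, not the classes in the attached patches):

              // Sketch only: picks a priority level from the caller's share of recent calls.
              // Thread-safety and eviction of old history entries are omitted.
              import java.util.HashMap;
              import java.util.Map;

              class SketchHistoryScheduler {
                private final int[] thresholds = {50, 400, 750}; // call counts separating queues 0..3
                private final Map<String, Integer> recentCallCounts = new HashMap<>();

                int priorityLevelFor(String user) {
                  int count = recentCallCounts.getOrDefault(user, 0);
                  // Heavier users land in higher-numbered (lower-priority) queues.
                  for (int level = thresholds.length; level > 0; level--) {
                    if (count > thresholds[level - 1]) {
                      return level;
                    }
                  }
                  return 0;
                }

                void recordCall(String user) {
                  recentCallCounts.merge(user, 1, Integer::sum);
                  // A real implementation would also forget the oldest call once the
                  // remembered history grows past its configured length (e.g. 1000).
                }
              }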

          Multiplexing

          RPC calls are withdrawn from an appropriate queue in a weighted round-robin fashion. Most of the time, handler threads will start reading from Queue 0, but occasionally they will start from Queues 1, 2, or 3. This prevents starvation.
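
          To make the multiplexing concrete, a minimal sketch of a weighted round-robin index generator (illustrative names only, not the code in the patch):

              // Sketch: cycles through queue indices according to fixed weights, e.g.
              // weights {8, 4, 2, 1} yield queue 0 eight times, then queue 1 four times, etc.
              // A handler starts scanning at the returned index and falls through to
              // later queues if that one happens to be empty.
              class SketchWeightedRoundRobinMux {
                private final int[] weights;
                private int currentQueue = 0;
                private int drawsLeft;

                SketchWeightedRoundRobinMux(int[] weights) {
                  this.weights = weights;
                  this.drawsLeft = weights[0];
                }

                synchronized int nextQueueIndex() {
                  if (drawsLeft == 0) {
                    currentQueue = (currentQueue + 1) % weights.length;
                    drawsLeft = weights[currentQueue];
                  }
                  drawsLeft--;
                  return currentQueue;
                }
              }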

          chrilisf Chris Li added a comment -

          Combats congestion via RPC-level priority

          chrilisf Chris Li added a comment -

          Performance on my dev machine.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12614505/Screen%20Shot%202013-11-18%20at%202.54.19%20PM.png
          against trunk revision .

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3293//console

          This message is automatically generated.

          chrilisf Chris Li added a comment -

          Updated design doc. New design could be described as Fair Share in the RPC layer.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12616864/NN-denial-of-service-updated-plan.pdf
          against trunk revision .

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3331//console

          This message is automatically generated.

          andrew.wang Andrew Wang added a comment -

          Hi Xiaobo and Ming and Chris, thanks for writing this up. It's very interesting stuff. I have a few comments/questions:

          • Parsing the MapReduce job name out of the DFSClient name is kind of an ugly hack. The client name also isn't that reliable since it's formed from the client's Configuration, and more generally anything in the RPC format that isn't a Kerberos token can be faked. Are these concerns in scope for your proposal?
          • Tracking by user is also not going to work so well in a HiveServer2 setup where all Hive queries are run as the hive user. This is a pretty common DB security model, since you need this for column/row-level security.
          • What's the purpose of separating read and write requests? Write requests take the write lock, and are thus more "expensive" in that sense, but your example of the listDir of a large directory is a read operation.
          • In the "Identify suspects" section, I see that you present three options here. Which one do you think is best? Seems like you're leaning toward option 3.
          • Does dropping an RPC result in exponential back-off from the client, a la TCP? Client backpressure is pretty important to reach a steady state.
          • I didn't see any mention of fair share here, are you planning to adjust suspect thresholds based on client share?
          • Any thoughts on how to automatically determine these thresholds? These seem like kind of annoying parameters to tune.
          • Maybe admin / superuser commands and service RPCs should be excluded from this feature
          • Do you have any preliminary benchmarks supporting the design? Performance is a pretty important aspect of this design.
          chrilisf Chris Li added a comment -

          Thanks for the look, Andrew.

          Parsing the MapReduce job name out of the DFSClient name is kind of an ugly hack. The client name also isn't that reliable since it's formed from the client's Configuration, and more generally anything in the RPC format that isn't a Kerberos token can be faked. Are these concerns in scope for your proposal?

          Tracking by user is also not going to work so well in a HiveServer2 setup where all Hive queries are run as the hive user. This is a pretty common DB security model, since you need this for column/row-level security.

          This is definitely up for discussion. One way would be to add a new field specifically for QoS that provides the identity (whether tied to job or user).

          I'm not too familiar with HiveServer2 and what could be done there. Maybe there's some information that's passed through about the original user?

          What's the purpose of separating read and write requests? Write requests take the write lock, and are thus more "expensive" in that sense, but your example of the listDir of a large directory is a read operation.

          In the "Identify suspects" section, I see that you present three options here. Which one do you think is best? Seems like you're leaning toward option 3.

          Does dropping an RPC result in exponential back-off from the client, a la TCP? Client backpressure is pretty important to reach a steady state.

          The NN-denial-of-service plan (using a multi-level queue) supersedes the rpc congestion control doc (identifying bad users).

          I didn't see any mention of fair share here, are you planning to adjust suspect thresholds based on client share?

          Clients over-using resources are throttled automatically by being placed into low-priority queues, which reins them back in. Given many users contending for 100% of the server's resources, they will all tend to use an equal amount.

          Adjusting thresholds at runtime would be a future enhancement.

          Any thoughts on how to automatically determine these thresholds? These seem like kind of annoying parameters to tune.

          There are two thresholds to tune:
          1. the scheduler thresholds (defaults to even split e.g. with 4 queues: 25% each)
          2. the multiplexer's round-robin weights (defaults to log split e.g. 2^3 from queue 0, 2^2 from queue 1, etc)

          The defaults work pretty well for us, but different clusters will have different loads. The scheduler will provide JMX metrics to make it easier to tune.
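
          As a rough illustration of the defaults: with four queues and log-split weights of 8,4,2,1, handlers start their scan at queue 0 about 8/15 (~53%) of the time, at queue 1 about 4/15, at queue 2 about 2/15, and at queue 3 about 1/15.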

          Maybe admin / superuser commands and service RPCs should be excluded from this feature

          Currently a config key (like ipc.8020.history-scheduler.service-users) specifies service users which are given absolute high priority, and will always be scheduled into the highest-priority queue. To completely exclude service RPC calls, one could use the service RPC server.

          Do you have any preliminary benchmarks supporting the design? Performance is a pretty important aspect of this design.

          I'll put some more numbers up shortly. Some preliminary results are on page 8 of the attachment

          I should have the code up soon as well.

          andrew.wang Andrew Wang added a comment -

          Sorry, it looks like I read the wrong document. I'm glad you still found some of my comments useful, but I'll read the updated one too and get back to you

          chrilisf Chris Li added a comment -

          Uploaded a preview of performance with two users, one normal and the other abusive. Should have code up soon.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12617098/MinorityMajorityPerformance.pdf
          against trunk revision .

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3335//console

          This message is automatically generated.

          daryn Daryn Sharp added a comment -

          I haven't read all the docs, and I've only skimmed the patch, but the entire feature must be configurable. As in a toggle to directly use the LinkedBlockingQueue as today. An activity surge often isn't indicative of abuse, nor do I necessarily want heavy users to have priority above all others because there are multiple equal heavy users, nor do I want to debug priority inversions at this time.

          I do think the patch might have potential performance benefits, as your graph mentions, from multiple queues lowering lock contention between the 100 hungry handlers. I've been working to lower lock contention, so while in the RPC layer I considered playing with the callQ but it wasn't even a blip in the profiler.

          However, you can't extrapolate performance improvements from 2 client threads, 2 server threads, and multiple queues. I think you've effectively eliminated any lock contention and given each client their own queue. 2 threads will produce negligible contention with even 1 queue. Things don't get ugly till you have many threads contending. Measurements with at least 16-32 clients & server threads become interesting!

          chrilisf Chris Li added a comment -

          Daryn Sharp Definitely, this new patch is pluggable so that it defaults to the LinkedBlockingQueue via FIFOCallQueue. We will also be testing performance on larger clusters in January.

          Please let me know your thoughts on this new patch.

          In this new patch (faircallqueue2.patch):
          Architecture
          The FairCallQueue is responsible for its Scheduler and Mux, which in the future will be pluggable as well. It is not made pluggable now since there is only one option today.

          Changes to NameNodeRPCServer (and others) are no longer necessary.

          Scheduling Token
          Using username right now, but will switch to jobID when a good way of including it is decided upon.

          Cross-server scheduling
          Scheduling across servers (for instance, the Namenode can have 2 RPC Servers for users and service calls) will be supported in a future patch.

          Configuration
          Configuration keys are keyed by port, so for a server running on 8020:

          ipc.8020.callqueue.impl
          Defaults to FIFOCallQueue.class, which uses a LinkedBlockingQueue. To enable priority, use "org.apache.hadoop.ipc.FairCallQueue"

          ipc.8020.faircallqueue.priority-levels
          Defaults to 4, controls the number of priority levels in the faircallqueue.

          ipc.8020.history-scheduler.service-users
          A comma separated list of users that will be exempt from scheduling and given top priority. Used for giving the service users (hadoop or hdfs) absolute high priority. e.g. "hadoop,hdfs"

          ipc.8020.history-scheduler.history-length
          The number of past calls to remember. HistoryRpcScheduler will schedule requests based on this pool. Defaults to 1000.

          ipc.8020.history-scheduler.thresholds
          A comma separated list of ints that specify the thresholds for scheduling in the history scheduler. For instance with 4 queues and a history-length of 1000: "50,400,750" will schedule requests greater than 750 into queue 3, > 400 into queue 2, > 50 into queue 1, else into queue 0. Defaults to an even split (for a history-length of 200 and 4 queues it would be 50 each: "50,100,150")

          ipc.8020.wrr-multiplexer.weights
          A comma separated list of ints that specify weights for each queue. For instance with 4 queues: "10,5,5,1", which sets the handlers to draw from the queues with the following pattern:

          • Read queue0 10 times
          • Read queue1 5 times
          • Read queue2 5 times
          • Read queue3 1 time
            And then repeat. Defaults to a log2 split: For 4 queues, it would be 8,4,2,1
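
          For illustration, the keys above could be wired up programmatically roughly like this (a sketch using the key names exactly as described in this comment; in practice they would normally be set in the server's configuration files):

              import org.apache.hadoop.conf.Configuration;

              public class FairCallQueueConfigSketch {
                // Sketch: FairCallQueue settings for an RPC server listening on port 8020.
                public static Configuration example() {
                  Configuration conf = new Configuration();
                  conf.set("ipc.8020.callqueue.impl", "org.apache.hadoop.ipc.FairCallQueue");
                  conf.setInt("ipc.8020.faircallqueue.priority-levels", 4);
                  conf.set("ipc.8020.history-scheduler.service-users", "hadoop,hdfs");
                  conf.setInt("ipc.8020.history-scheduler.history-length", 1000);
                  conf.set("ipc.8020.history-scheduler.thresholds", "50,400,750");
                  conf.set("ipc.8020.wrr-multiplexer.weights", "8,4,2,1");
                  return conf;
                }
              }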
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12617457/faircallqueue2.patch
          against trunk revision .

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3343//console

          This message is automatically generated.

          chrilisf Chris Li added a comment -

          Update patch to target latest trunk

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12617475/faircallqueue3.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified test files.

          -1 javac. The applied patch generated 1546 javac compiler warnings (more than the trunk's current 1545 warnings).

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/3345//testReport/
          Javac warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/3345//artifact/trunk/patchprocess/diffJavacWarnings.txt
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3345//console

          This message is automatically generated.

          chrilisf Chris Li added a comment -

          Added new version

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12617895/faircallqueue4.patch
          against trunk revision .

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3347//console

          This message is automatically generated.

          chrilisf Chris Li added a comment -

          Updated patch to target trunk

          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12617900/faircallqueue5.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/3348//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3348//console

          This message is automatically generated.

          sureshms Suresh Srinivas added a comment -

          I had in person meeting with Chris Li on this. This is excellent work!

          Parsing the MapReduce job name out of the DFSClient name is kind of an ugly hack. The client name also isn't that reliable since it's formed from the client's Configuration

          I had suggested this to Chris Li. I realize that the configuration passed from MapReduce is actually a task ID. So the client name based on that will not be useful, unless we parse it to get the job ID.

          I agree that this is not the way the final solution should work. I propose adding some kind of configuration that can be passed to establish context in which access to services is happening. Currently this is done by mapreduce framework. It sets the configuration "" which gets used in forming DFSClient name.

          We could do the following to satisfy the various user requirements:

          1. Add a new configuration in common called "hadoop.application.context" to HDFS. Other services that want to do the same thing can either use this same configuration or find another way to configure it. This information should be marshalled from the client to the server. The congestion control can be built based on that.
          2. Let's also make the identities used for accounting configurable. They can be based on "context", "user", "token", or "default". That way people who do not like the default configuration can make changes.
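
          As a sketch of item 2, the accounting identity could be chosen by a simple configurable switch (the enum, method, and fallbacks here are hypothetical, purely for illustration):

              // Hypothetical sketch: select the string used for accounting based on configuration.
              class SketchIdentityProvider {
                enum Kind { CONTEXT, USER, TOKEN, DEFAULT }

                static String identityFor(String user, String context, String tokenId, Kind kind) {
                  switch (kind) {
                    case CONTEXT: return context != null ? context : user;
                    case TOKEN:   return tokenId != null ? tokenId : user;
                    case USER:
                    default:      return user;
                  }
                }
              }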
          chrilisf Chris Li added a comment -

          Add a new configuration in common called "hadoop.application.context" to HDFS. Other services that want to do the same thing can either use this same configuration and find another way to configure it. This information should be marshalled from the client to the server. The congestion control can be built based on that.

          Just to be clear, would an example be,
          1. Cluster operator specifies ipc.8020.application.context = hadoop.yarn
          2. Namenode sees this, knows to load the class that generates job IDs from the Connection/Call?

          Or were you thinking of physically adding the id into the RPC call itself, which would make the rpc call size larger, but is a cleaner solution (albeit one that the client could spoof).

          Lets also make identities used for accounting configurable. They can be either based on "context", "user", "token", or "default". That way people who do not like the default configuration can make changes.

          Sounds like a good idea.

          chrilisf Chris Li added a comment -

          Uploaded new patch that adds configurable Call identity used for scheduling.

          Config:
          ipc.8020.call.identity = USER or GROUP

          In the future, this can be extended with more options

          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12618954/faircallqueue6.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/3361//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3361//console

          This message is automatically generated.

          chrilisf Chris Li added a comment -

          I'd like to get people's thoughts on a new feature: Dynamic reconfiguration

          Motivation

          1. The tuning of parameters will be important for optimal performance
          2. We can recover faster from bad parameters
          3. The cost of doing a NN failover to change parameters is too high, while this would be much faster (seconds)

          User Interface

          Much like `hadoop mradmin -refreshQueueAcls`, the user would run a command to reload the CallQueue based on the config.

          mayank_bansal Mayank Bansal added a comment -

          Hi Suresh Srinivas

          Can you please take a look at this jira?

          Thanks,
          Mayank

          chrilisf Chris Li added a comment -

          Attached preview of patch that enables swapping the namenode call queue at runtime.

          kihwal Kihwal Lee added a comment -

          Can low priority requests starve higher priority requests? If a low priority call queue is full and all reader threads are blocked on put() for adding calls belonging to that queue, newly arriving higher priority requests won't get processed even if their corresponding queue is not full. If the request rate stays greater than service rate for some time in this state, the listen queue will likely overflow and all types of requests will suffer regardless of priority.

          chrilisf Chris Li added a comment -

          Kihwal Lee In the first 6 versions of this patch, this does indeed happen. It's partially alleviated due to the round-robin withdrawal from the queues.

          In the latest iteration of the patch (7), the reader threads would lock on the queue's putLock like they do in trunk. I think this behavior is more intuitive.

          Today I will be breaking this JIRA up into subtasks to make it easier to review.

          daryn Daryn Sharp added a comment -

          Agreed, this needs subtasks. General comments/requests:

          1. Please make the default callq a BlockingQueue again, and have your custom implementations conform to the interface.
          2. The default callq should remain a LinkedBlockingQueue, not a FIFOCallQueue. You're doing some pretty tricky locking and I'd rather trust the JDK.
          3. Call.getRemoteUser() would be much cleaner to get the UGI than an interface + enum to get user and group.
          4. Using the literal string "unknown!" for a user or group is not a good idea.

          The more I think about it, multiple queues will exacerbate the congestion problem as Kihwal points out. For that reason, I'd like to see minimal invasiveness in the Server class - I'll feel safe and you are free to experiment with alternate implementations.

          chrilisf Chris Li added a comment -

          Daryn Sharp

          Thanks for your feedback.

          Some points of clarification:

          3. The identity is meant to be configurable, so you can schedule by user, by group, and in the future by job.
          4. Any suggestions?

          chrilisf Chris Li added a comment -

          I've uploaded the first of the patches to https://issues.apache.org/jira/browse/HADOOP-10278

          It allows the user to use a custom call queue specified via configuration, but falls back on a LinkedBlockingQueue otherwise.

          I'd like to take any further discussions about this aspect to the subtask, and get some feedback.

          Thanks

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12624909/faircallqueue7_with_runtime_swapping.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 7 new or modified test files.

          -1 javac. The applied patch generated 1547 javac compiler warnings (more than the trunk's current 1546 warnings).

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.ipc.TestQueueRuntimeReconfigure
          org.apache.hadoop.hdfs.server.namenode.TestNameNodeHttpServer

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/3467//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/3467//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-common.html
          Javac warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/3467//artifact/trunk/patchprocess/diffJavacWarnings.txt
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3467//console

          This message is automatically generated.

          chrilisf Chris Li added a comment -

          The latest version of the atomicref-adapter swaps by having handlers clear the calls, choreographed using two refs for put and take.

          We use a software version of double clocking to ensure the queue is likely empty. This should make dropping a call highly unlikely. And losing calls isn't the end of the world either, since the client handles IPC timeouts with retries.
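
          For readers following along, a very rough sketch of the two-reference idea (a hypothetical simplification; the real adapter also has to wake handlers blocked on the old queue, which is omitted here):

              import java.util.concurrent.BlockingQueue;
              import java.util.concurrent.atomic.AtomicReference;

              // Sketch: producers put into putRef's queue, consumers take from takeRef's queue.
              // During a swap, putRef moves to the new queue first; takeRef follows only after
              // the old queue has (very likely) drained, so calls are not silently lost.
              class SketchSwappableCallQueue<E> {
                private final AtomicReference<BlockingQueue<E>> putRef;
                private final AtomicReference<BlockingQueue<E>> takeRef;

                SketchSwappableCallQueue(BlockingQueue<E> initial) {
                  this.putRef = new AtomicReference<>(initial);
                  this.takeRef = new AtomicReference<>(initial);
                }

                void put(E e) throws InterruptedException { putRef.get().put(e); }
                E take() throws InterruptedException { return takeRef.get().take(); }

                void swap(BlockingQueue<E> newQueue) throws InterruptedException {
                  BlockingQueue<E> old = putRef.getAndSet(newQueue); // new calls go to newQueue
                  while (!old.isEmpty()) {
                    Thread.sleep(10);  // crude wait for handlers to drain the old queue
                  }
                  takeRef.set(newQueue); // consumers move over once old is (likely) empty
                }
              }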

          Tests updated too, since the queue can only be swapped when there are active readers.

          Here's what a live swap looks like:

          chrilisf Chris Li added a comment -

          Ignore above comment, it was meant for HADOOP-10278

          chrilisf Chris Li added a comment -

          Hey all,

          Can anyone check out https://issues.apache.org/jira/browse/HADOOP-10280 and give feedback on the next stage?

          mingma Ming Ma added a comment -

          Nice work. Some high-level comments,

          1. At some point, we might need to prioritize DN RPC over client RPC so that no matter what applications do to NN RPC and FSNamesystem's global lock, DN requests will be processed in a timely manner. We can do it in two ways: a) configure a global RPC server and have the pluggable CallQueue handle that; b) have one RPC server for client requests and one RPC server for service requests, for which we will need some abstraction like https://issues.apache.org/jira/browse/HDFS-5639.

          2. CallQueue priority policy. Perhaps this could be left to the plugin implementation. It can be a somewhat soft policy like FairCallQueue, or use some sort of allocation quota like other schedulers, e.g., if we know the cluster has allocated 50% to some group at the YARN layer, perhaps it is ok to assume that NN RPC requests for that group can be around 50%.

          chrilisf Chris Li added a comment -

          Ming Ma Thanks for the feedback, these sound like good next steps for the FCQ / Scheduler

          I have uploaded results of a benchmark on a real-world (87 node) cluster. It shows QoS successfully preventing denial of service situations, but also identifies limitations of the current HistoryRpcScheduler for scaling to greater history lengths.

          I have a couple ideas floating around on how to fix this, but I will upload the current version soon to get feedback.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12641612/FairCallQueue-PerformanceOnCluster.pdf
          against trunk revision .

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3841//console

          This message is automatically generated.

          chrilisf Chris Li added a comment -

          Uploaded patches to HADOOP-10279, HADOOP-10281, and HADOOP-10282 for feedback. The new scheduler fixes the performance issues identified in the earlier PDF too.

          mingma Ming Ma added a comment -

          Thanks, Chris.

          1. The current approach drops calls when the RPC queue is full and the client relies on the RPC timeout. It will be interesting to confirm whether it is useful to have the RPC server throw some exception back to the client and have the client do exponential back-off, or maybe just block the RPC reader thread instead.

          2. The RPC-based approach doesn't account for HTTP requests such as webHDFS. Based on some test results, it seems Jetty uses around 250 threads, small compared to the thousands of RPC handler threads. a) Bad application traffic from webHDFS still has an impact on RPC latency, though not as severe as in the RPC case. b) If there are SLA jobs based on webHDFS, then the RPC throttling won't help much.
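
          On point 1, a generic sketch of client-side exponential back-off on a rejection (illustrative only, not the actual Hadoop RPC client behavior; the exception type stands in for whatever "queue full" signal the server would send):

              import java.util.concurrent.Callable;
              import java.util.concurrent.ThreadLocalRandom;

              // Sketch: retry a call with exponentially growing, jittered sleeps after each rejection.
              class SketchBackoffClient {
                static <T> T callWithBackoff(Callable<T> rpc, int maxRetries) throws Exception {
                  long delayMs = 100;                        // initial back-off
                  for (int attempt = 0; ; attempt++) {
                    try {
                      return rpc.call();
                    } catch (Exception serverBusy) {         // stand-in for a server rejection
                      if (attempt >= maxRetries) {
                        throw serverBusy;
                      }
                      long jitter = ThreadLocalRandom.current().nextLong(delayMs / 2 + 1);
                      Thread.sleep(delayMs + jitter);
                      delayMs = Math.min(delayMs * 2, 30_000);  // cap the back-off
                    }
                  }
                }
              }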

          mingma Ming Ma added a comment -

          Sorry, there was a typo above; I meant "thousands of RPC requests" in the RPC queue.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12641612/FairCallQueue-PerformanceOnCluster.pdf
          against trunk revision e90718f.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/4932//console

          This message is automatically generated.

          aw Allen Wittenauer added a comment -

          Cancelling patch, as it no longer applies.

          ajithshetty Ajith S added a comment -

          Hi Chris Li

          Any progress on this issue? If you are not looking into this, I would like to continue the work.

          chrilisf Chris Li added a comment -

          Ajith S Sure, what did you have in mind?

          ajithshetty Ajith S added a comment -

          I missed checking the sub-tasks. I would like to update the documentation for FairCallQueue, as it introduces a lot of configuration. It would be better if we can explain it briefly, similar to http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html. Added a sub-task for it.


            People

            • Assignee:
              chrilisf Chris Li
              Reporter:
              teledriver Xiaobo Peng
            • Votes:
              3
              Watchers:
              82
