Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.4.0
    • Component/s: ipc
    • Labels: None
    • Target Version/s:
    • Hadoop Flags: Reviewed
    • Tags: callqueue

      Description

      • Refactor CallQueue into an interface, a base class, and a default implementation that matches today's behavior
      • Make the call queue implementation configurable, keyed on port so that coupling is minimized

      Attachments
      1. HADOOP-10278-atomicref-rwlock.patch
        9 kB
        Chris Li
      2. HADOOP-10278-atomicref-adapter.patch
        9 kB
        Chris Li
      3. HADOOP-10278-atomicref-adapter.patch
        18 kB
        Chris Li
      4. HADOOP-10278-atomicref-adapter.patch
        16 kB
        Chris Li
      5. HADOOP-10278-atomicref-adapter.patch
        16 kB
        Chris Li
      6. HADOOP-10278-atomicref.patch
        8 kB
        Chris Li
      7. HADOOP-10278-atomicref.patch
        8 kB
        Chris Li
      8. HADOOP-10278-atomicref.patch
        8 kB
        Chris Li
      9. HADOOP-10278-atomicref.patch
        10 kB
        Chris Li
      10. HADOOP-10278.patch
        21 kB
        Chris Li
      11. HADOOP-10278.patch
        17 kB
        Chris Li

        Activity

        chrilisf Chris Li added a comment -

        This version has the unit tests

        chrilisf Chris Li added a comment -

        This version has the unit tests and doesn't accidentally delete a different file.

        chrilisf Chris Li added a comment -

        This version doesn't include an irrelevant test

        chrilisf Chris Li added a comment -

        As per Daryn Sharp's suggestions, this patch retains the original interface of the callQueue in Server.

        This patch allows the Server to use a custom implementation of BlockingQueue if the user defines ipc.8020.callqueue.impl
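For illustration only, that per-port key might be set like this in core-site.xml (the fully-qualified class name below is assumed for the sketch, not taken from the patch):

```xml
<!-- Hypothetical core-site.xml entry: selects the call queue implementation
     for the RPC server listening on port 8020. -->
<property>
  <name>ipc.8020.callqueue.impl</name>
  <value>org.apache.hadoop.ipc.FIFOCallQueue</value>
</property>
```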

        It includes one such implementation, the FIFOCallQueue, which simply imitates the LinkedBlockingQueue (and uses the same 2-lock algorithm used in the JDK's implementation). Though it seems redundant, the FIFOCallQueue will have greater flexibility in that it can be swapped out at runtime (coming in a later patch).
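As a rough sketch of how such a pluggable queue could be instantiated (the factory name and signature here are illustrative, not the patch's actual code), the server could load the configured class reflectively and fall back to LinkedBlockingQueue when nothing is configured:

```java
import java.lang.reflect.Constructor;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class CallQueueFactory {
    // Instantiate a BlockingQueue implementation by class name, falling back
    // to LinkedBlockingQueue when no custom class is configured.
    @SuppressWarnings("unchecked")
    static <E> BlockingQueue<E> create(String className, int maxLen) {
        if (className == null) {
            return new LinkedBlockingQueue<E>(maxLen);
        }
        try {
            Class<?> clazz = Class.forName(className);
            // Assumes the custom queue exposes a (int capacity) constructor.
            Constructor<?> ctor = clazz.getDeclaredConstructor(int.class);
            return (BlockingQueue<E>) ctor.newInstance(maxLen);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException("Cannot construct call queue: " + className, e);
        }
    }

    public static void main(String[] args) throws Exception {
        BlockingQueue<String> q = create(null, 10);
        q.put("call-1");
        System.out.println(q.getClass().getSimpleName() + " " + q.take());
    }
}
```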

        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12625168/subtask1.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        -1 javac. The applied patch generated 1559 javac compiler warnings (more than the trunk's current 1546 warnings).

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/3469//testReport/
        Javac warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/3469//artifact/trunk/patchprocess/diffJavacWarnings.txt
        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3469//console

        This message is automatically generated.

        chrilisf Chris Li added a comment -

        Update tests to fix javac warnings

        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12625474/subtask1.2.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        -1 javac. The applied patch generated 1555 javac compiler warnings (more than the trunk's current 1546 warnings).

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/3482//testReport/
        Javac warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/3482//artifact/trunk/patchprocess/diffJavacWarnings.txt
        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3482//console

        This message is automatically generated.

        hadoopqa Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12625491/subtask1.3.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/3484//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3484//console

        This message is automatically generated.

        chrilisf Chris Li added a comment -

        Hi Daryn Sharp,

        Could you take a look please?

        Thanks,

        Chris

        daryn Daryn Sharp added a comment -

        Upon quick glance it looks much cleaner. Questions:

        1. What advantage will the custom FIFO call queue offer over a standard Java queue?
        2. Instead of the new base class providing a concrete queue implementation (locking and all), is it possible for the custom queues to use a containment relationship with a standard Java queue?
        chrilisf Chris Li added a comment -

        Thanks for checking it out.

        1. In a later patch I'll be introducing the ability to swap the call queue at runtime, which is essential for performance tuning without restarting the namenode. The FIFOCallQueue implements the methods needed to accomplish this transparently to the server.

        2. I would have preferred this myself (and earlier versions did this), but we will need control of the locks to do runtime queue swapping. In any case, the FairCallQueue (which should be coming in subtask5) will use this same locking code, so refactoring it into the base makes things cleaner.

        I will start uploading the other subtask patches

        chrilisf Chris Li added a comment -

        I've uploaded a patch for https://issues.apache.org/jira/browse/HADOOP-10302 which should make some of the design decisions in this patch more clear.

        benoyantony Benoy Antony added a comment -

        Chris Li, all the methods in CallQueue are already in BlockingQueue. Do we really need to define a new interface, CallQueue?

        chrilisf Chris Li added a comment -

        I've done so for flexibility: the later patch https://issues.apache.org/jira/browse/HADOOP-10302 adds methods to CallQueue that are necessary for swapping a queue at runtime.

        It's also convenient to be able to check whether a call queue is a custom call queue vs. a LinkedBlockingQueue, without requiring the call queue to extend BaseCallQueue.
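The type-check convenience mentioned above can be sketched with a marker interface (the names here are illustrative, not the patch's code):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class MarkerCheck {
    // Hypothetical marker interface: adds nothing beyond BlockingQueue for
    // now, but lets the server distinguish custom call queues by type.
    interface CallQueue<E> extends BlockingQueue<E> {}

    // A FIFO call queue that simply delegates to LinkedBlockingQueue.
    static class FifoCallQueue<E> extends LinkedBlockingQueue<E>
            implements CallQueue<E> {
        FifoCallQueue(int capacity) { super(capacity); }
    }

    // instanceof against the marker works without a BaseCallQueue superclass.
    static boolean isCustom(BlockingQueue<?> q) {
        return q instanceof CallQueue;
    }

    public static void main(String[] args) {
        System.out.println(isCustom(new FifoCallQueue<String>(8)));      // true
        System.out.println(isCustom(new LinkedBlockingQueue<String>())); // false
    }
}
```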

        benoyantony Benoy Antony added a comment -

        If it is a marker interface for now, then can we remove the redundant methods from CallQueue?

        chrilisf Chris Li added a comment -

        Sounds good, I'll update the patch

        chrilisf Chris Li added a comment -

        Removed redundant methods on CallQueue

        hadoopqa Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12626436/subtask1.4.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/3516//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3516//console

        This message is automatically generated.

        benoyantony Benoy Antony added a comment -

        Could you please generate the patch with --no-prefix?
        There are a few extra whitespace characters in the patch.

        chrilisf Chris Li added a comment -

        Patch generated with

        git format-patch trunk --stdout --no-prefix --ignore-space-change > subtask1.5.patch

        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12626501/subtask1.5.patch
        against trunk revision .

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3522//console

        This message is automatically generated.

        chrilisf Chris Li added a comment -

        Uploaded HADOOP-10278.patch

        Removes whitespace at end of lines

        daryn Daryn Sharp added a comment -

        (Just noticed I didn't resubmit this comment after jira was having issues last week)

        I see. I'm a bit uneasy about the maintenance and reduced flexibility from a custom concrete queue impl. I haven't thought this through, but would it be feasible to do a swapout by using an atomic ref for the callq and using offer and poll with timeouts? Ex. a handler would replace call = callQueue.take() with:

        Call call;
        do {
          call = callQueue.get().poll(100L, TimeUnit.MILLISECONDS);
        } while (call == null);
        

        Refreshing the queue would create a new queue, swap it into the atomic ref, then drain calls in the prior queue and add to the new queue. Handlers would consume at most 1 call during the swap, and if they block on an empty queue the above code will cause them to switch over. Just a thought.
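A minimal sketch of the scheme just described, using String in place of Call and illustrative names (this is not the actual patch):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class SwappableCallQueue {
    // An AtomicReference lets handlers and the swapper share the queue safely.
    private final AtomicReference<BlockingQueue<String>> callQueue =
            new AtomicReference<BlockingQueue<String>>(new LinkedBlockingQueue<String>());

    void put(String call) throws InterruptedException {
        callQueue.get().put(call);
    }

    // Handler side: poll with a timeout so a handler blocked on an empty
    // queue periodically re-reads the reference and picks up a swap.
    String take() throws InterruptedException {
        String call;
        do {
            call = callQueue.get().poll(100L, TimeUnit.MILLISECONDS);
        } while (call == null);
        return call;
    }

    // Swap side: publish the new queue first, then drain stragglers from
    // the old queue into it so no queued call is lost.
    void swapQueue(BlockingQueue<String> newQueue) {
        BlockingQueue<String> old = callQueue.getAndSet(newQueue);
        old.drainTo(newQueue);
    }

    public static void main(String[] args) throws InterruptedException {
        SwappableCallQueue q = new SwappableCallQueue();
        q.put("before-swap");
        q.swapQueue(new ArrayBlockingQueue<String>(16));
        System.out.println(q.take()); // the queued call survives the swap
    }
}
```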

        chrilisf Chris Li added a comment -

        That's an interesting thought; I do like that it would work with any BlockingQueue.

        I'll give it a try and see how it benchmarks, too.

        hadoopqa Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12626710/HADOOP-10278.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/3523//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3523//console

        This message is automatically generated.

        chrilisf Chris Li added a comment -

        Daryn Sharp

        Implemented and benchmarked it. All around, there is less clutter and no custom locking required. I do like this solution.

        Benchmarked performance:
        using RPCCallBenchmark -s 4 -c 10
        Average server time increased by 1.5% (p ≈ 0.05, one-tailed two-sample equal-variance t-test)
        Other differences were not statistically significant

        If the performance hit is acceptable, I think this solution is easier, and I can re-submit this patch.


        Raw results:

        Group  calls/s   Server CPU  Client CPU
        Poll   32356.0   48018       47837
        Poll   31497.0   46129       44052
        Poll   32456.0   46400       43643
        Poll   31253.0   46343       44152
        Poll   31572.0   46241       44123
        Take   30831.0   46521       45315
        Take   31764.0   45549       43827
        Take   32534.0   45949       44028
        Take   32153.0   45699       44215
        Take   31455.0   45823       43926

        T-test p-values:
        Throughput  0.419851632864817
        Server CPU  0.0509472764018302
        Client CPU  0.279864716054668

        daryn Daryn Sharp added a comment -

        Given my current focus on performance, it saddens me that the get+poll would be noticeable. Did you try tinkering with the poll time? Perhaps some number of seconds is a reasonable time for handlers to switch over to a new queue since I would expect that queue swapping would be an infrequent event.

        Please go ahead and post the patch. It's really the combination of performance and simplicity/maintainability that should be the determining factors.

        chrilisf Chris Li added a comment -

        Sure thing, patch updated with atomicref version

        Potential issue?
        From the BlockingQueue documentation: "drainTo: ...the behavior of this operation is undefined if the specified collection is modified while the operation is in progress."

        On Performance:

        • The time can be increased since queue swaps should be rare, maybe even pegged to ipc.client.connect.timeout or made configurable
        • The performance hit will probably increase with more handler threads too
        • Even though the server CPU time increased, the throughput wasn't really affected, so it's definitely not the bottleneck (at least not on my machine)
        benoyantony Benoy Antony added a comment -

        You can get a small performance gain using volatile instead of AtomicReference, since this case does not require AtomicReference's compareAndSet guarantee.
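A sketch of that suggestion (illustrative names; it assumes a single thread performs swaps, which is what makes compareAndSet unnecessary):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class VolatileQueueRef {
    // volatile gives the same publish/read visibility as AtomicReference's
    // get()/set(); it only lacks atomic read-modify-write operations.
    private volatile BlockingQueue<String> callQueue =
            new LinkedBlockingQueue<String>();

    BlockingQueue<String> queue() {
        return callQueue;              // volatile read sees the latest swap
    }

    // Assumes only one thread ever swaps; concurrent swappers would need
    // AtomicReference.getAndSet to avoid losing a queue.
    void swap(BlockingQueue<String> newQueue) {
        BlockingQueue<String> old = callQueue;
        callQueue = newQueue;          // volatile write publishes the new queue
        old.drainTo(newQueue);         // move remaining calls to the new queue
    }

    public static void main(String[] args) throws InterruptedException {
        VolatileQueueRef ref = new VolatileQueueRef();
        ref.queue().put("x");
        ref.swap(new LinkedBlockingQueue<String>());
        System.out.println(ref.queue().take());
    }
}
```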

        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12626790/HADOOP-10278-atomicref.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 javadoc. The javadoc tool appears to have generated 2 warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/3525//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3525//console

        This message is automatically generated.

        jingzhao Jing Zhao added a comment -

        One nit: in the latest patch, do we still need the String and Configuration parameters?

        +  private BlockingQueue<Call> createCallQueueInstance(Class theClass, int maxLen,
        +    String ns, Configuration conf) {
        
        daryn Daryn Sharp added a comment -

        I need to look at the latest patch, but comments on the comments:

        The performance hit will probably increase with more handler threads too

        Aside: With my background cycles (which are about nil), HADOOP-10300 should hopefully be able to reduce the number of handlers yet improve performance. Fewer threads also mean less contention on the callq and hopefully even better performance. I've got a POC but haven't stressed or benched it yet.

        Even though the server CPU time increased, the throughput wasn't really affected

        An impact to throughput would be a concern, but cpu utilization isn't (yet) much of a concern. I'm trying to get the NN to actually use more than a few cores under heavy load.

        You can get a small performance gain using volatile instead of AtomicRef

        I'm curious if that's true. I'd expect a warmed up JVM to have effectively inlined the call. Personally I prefer AtomicReference to volatile if for no other reason than it's explicit to a dev that something about the ref is "magical", but if the impact is measurable I would be swayed.

        chrilisf Chris Li added a comment -

        Good catch, here's a version without those extra parameters.

        I'll add them back in a later patch when I use them.

        chrilisf Chris Li added a comment -

        Daryn Sharp

        I'm working on a new set of benchmarks to test atomicref vs volatile, should have results later today

        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12626977/HADOOP-10278-atomicref.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        -1 javac. The patch appears to cause the build to fail.

        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3532//console

        This message is automatically generated.

        chrilisf Chris Li added a comment -

        Jing Zhao This patch should fix that issue

        Daryn Sharp Benoy Antony

        I completed tests using more powerful servers, doing RPCCallBenchmark -c 500 -s 300 -t 60

        First test between trunk, HADOOP-10278-atomic.patch, and a variation using volatile instead of atomic ref:

        • Both experimental groups showed a statistically significant decrease in throughput (-1.1%, p=.0007) and increase in server CPU time (1.5%, p=.0001) compared to trunk.
        • No significant difference between volatile and atomicref (or difference is too small to be noticed among other noise. p=.4)

        Second test between trunk, HADOOP-10278-atomic.patch, and variation using 1000ms wait instead of 100ms (on a different machine):

        • Significant throughput decrease between 100ms poll and trunk, as before (-0.6%, p=0.03)
        • No significant throughput decrease between 1000ms poll and trunk (-0.2%, p=.26)
        • Significant increase in server CPU compared to trunk for both 100ms and 1000ms poll (about 2.2%, p=0.0007)
        • No significant difference in server CPU between 100ms and 1000ms poll (p=.34)
        • Significant increase in client CPU for 1000ms poll vs trunk (1.2%, p=0.03)

        -----------

        Benchmarks show that a longer poll timeout is better for throughput, but still raises server CPU by about the same amount.

        Thoughts?

        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12627050/HADOOP-10278-atomicref.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 javadoc. The javadoc tool appears to have generated 2 warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/3537//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3537//console

        This message is automatically generated.

        chrilisf Chris Li added a comment -

        Hi Daryn Sharp

        What are your thoughts on the performance to code complexity tradeoff?

        Thanks,

        Chris

        ikeda Hiroshi Ikeda added a comment -
        @@ -1873,7 +1913,7 @@ private void processRpcRequest(RpcRequestHeaderProto header,
               Call call = new Call(header.getCallId(), header.getRetryCount(),
                   rpcRequest, this, ProtoUtil.convert(header.getRpcKind()), header
                       .getClientId().toByteArray());
        -      callQueue.put(call);              // queue the call; maybe blocked here
        +      callQueue.get().put(call);              // queue the call; maybe blocked here
               incRpcCount();  // Increment the rpc count
             }
        

        It may be rare, but it is logically possible that the queue obtained from callQueue has already been replaced and drained by the time put() is invoked.
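The race can be shown deterministically in a standalone sketch (plain strings stand in for Call objects; nothing here is Hadoop code, and the class name is illustrative):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicReference;

public class QueueSwapRace {
    /** Interleaves a reader's put() with a queue swap; returns
     *  {size of the active queue, calls stranded in the old queue}. */
    static int[] demonstrateRace() throws InterruptedException {
        AtomicReference<BlockingQueue<String>> callQueue =
                new AtomicReference<BlockingQueue<String>>(new LinkedBlockingQueue<String>());

        // Reader thread resolves the reference...
        BlockingQueue<String> seenByReader = callQueue.get();

        // ...but before its put() runs, a swap replaces and drains the old queue.
        BlockingQueue<String> newQueue = new LinkedBlockingQueue<String>();
        BlockingQueue<String> oldQueue = callQueue.getAndSet(newQueue);
        oldQueue.drainTo(newQueue);

        // The reader now puts into the already-drained old queue: the call
        // is stranded where no handler will ever take() it.
        seenByReader.put("call-1");

        return new int[] { callQueue.get().size(), oldQueue.size() };
    }

    public static void main(String[] args) throws InterruptedException {
        int[] r = demonstrateRace();
        System.out.println("active queue: " + r[0] + ", stranded: " + r[1]);
    }
}
```

Here the single-threaded interleaving makes the window explicit; in the server the same ordering can occur between a reader thread and the thread performing the swap.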

        daryn Daryn Sharp added a comment -

        Significant increase in client CPU for 1000ms poll vs trunk (1.2%, p=0.03)

        Hmm, why does client cpu increase? The client is just blocked reading its socket waiting for a call response. I could maybe understand a slight client call latency increase from the -0.2% throughput, but cpu?

        Hiroshi points out a valid race, much like handlers possibly consuming one more call, readers can insert one more. Might need to do drainTo until isEmpty, need to think about it.

        chrilisf Chris Li added a comment -

        Not sure why that is, I can re-run that test with different timeouts.

        Right now I'm exploring using a ReadWriteLock to protect against that race.

        As an aside, I went back to get benchmarking results for the FIFOCallQueue and found no significant difference from trunk in throughput or client CPU, but a 2% reduction in server CPU (p=.0055)

        chrilisf Chris Li added a comment -

        Interesting results from testing with different timeouts. They confirm the previous results, showing an increase in client time alongside server time. Since I didn't change any client-side code, it would seem that the client CPU time depends on the server at least a bit.

        One surprising result is that the 2500ms poll time fared a little bit worse than 1000ms in throughput.

        Test groups:
        polltimeout_evenlonger with 39 samples - 2500ms poll time
        polltimeout_longer with 38 samples - 1000ms poll time
        trunk with 39 samples

        =================
        Test: client
        =================
        	 polltimeout_evenlonger vs polltimeout_longer
        		Changed by 0.23%, p=0.68354
        	*** polltimeout_evenlonger vs trunk
        		Changed by -1.05%, p=0.01338
        	*** polltimeout_longer vs trunk
        		Changed by -1.28%, p=0.02356
        
        =================
        Test: throughput
        =================
        	*** polltimeout_evenlonger vs polltimeout_longer
        		Changed by -0.61%, p=0.02441
        	 polltimeout_evenlonger vs trunk
        		Changed by -0.53%, p=0.09965
        	 polltimeout_longer vs trunk
        		Changed by 0.07%, p=0.83207
        
        =================
        Test: server
        =================
        	 polltimeout_evenlonger vs polltimeout_longer
        		Changed by 0.30%, p=0.57614
        	*** polltimeout_evenlonger vs trunk
        		Changed by -1.99%, p=0.00016
        	*** polltimeout_longer vs trunk
        		Changed by -2.28%, p=0.00006
        
        
        
        ttest assumes 2 sample same variance, 2 tailed.
        *** marks where p value reaches significance (<0.05)
        
        chrilisf Chris Li added a comment -

        Updated tests for using a rwlock:

        Test groups:
        polltimeout_rwlock with 363 samples - uses a ReentrantReadWriteLock to protect against queue swaps (as a necessity, it also replaces queue.put() with queue.offer())
        polltimeout_longer with 362 samples - same as above
        trunk with 362 samples

        Interestingly, the rwlock version uses less of the server's CPU than the version with just a poll. However, it decreases throughput by half a percent.

        =================
        Test: client
        =================
        	*** polltimeout_rwlock vs polltimeout_longer
        		Changed by 0.43%, p=0.00540
        	 polltimeout_rwlock vs trunk
        		Changed by -0.13%, p=0.44964
        	*** polltimeout_longer vs trunk
        		Changed by -0.56%, p=0.00084
        
        =================
        Test: throughput
        =================
        	*** polltimeout_rwlock vs polltimeout_longer
        		Changed by 0.46%, p=0.00006
        	*** polltimeout_rwlock vs trunk
        		Changed by 0.60%, p=0.00000
        	 polltimeout_longer vs trunk
        		Changed by 0.13%, p=0.23419
        
        =================
        Test: server
        =================
        	*** polltimeout_rwlock vs polltimeout_longer
        		Changed by 0.37%, p=0.03796
        	*** polltimeout_rwlock vs trunk
        		Changed by -1.72%, p=0.00000
        	*** polltimeout_longer vs trunk
        		Changed by -2.09%, p=0.00000
        
        
        
        ttest assumes 2 sample same variance, 2 tailed.
        *** marks where p value reaches significance (<0.05)
        
        chrilisf Chris Li added a comment -

        For reference, here is the patch using a ReadWriteLock

        I'm not a big fan of this because it makes the swap take a long time (and causes a big queue backup in the process). In addition, it adds locking complexity to the Server class that makes it more complex, which was my motivation for refactoring this type of logic into CallQueueBase in the first place.
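The locking shape under discussion looks roughly like the sketch below (my own reconstruction of the approach, not the attached patch; the class name is illustrative). The drawback described above is visible in swap(): while it holds the write lock, every producer and handler is blocked.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class RwLockedQueue<E> {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private BlockingQueue<E> queue;  // guarded by lock

    RwLockedQueue(BlockingQueue<E> initial) {
        this.queue = initial;
    }

    // Producers and consumers share the read lock. offer()/poll() are used
    // instead of put()/take(): blocking while holding the lock would stall
    // a pending swap indefinitely.
    boolean offer(E e) {
        lock.readLock().lock();
        try { return queue.offer(e); }
        finally { lock.readLock().unlock(); }
    }

    E poll() {
        lock.readLock().lock();
        try { return queue.poll(); }
        finally { lock.readLock().unlock(); }
    }

    int size() {
        lock.readLock().lock();
        try { return queue.size(); }
        finally { lock.readLock().unlock(); }
    }

    // The swap excludes all readers, so a stale put() cannot strand a call,
    // at the cost of briefly blocking every producer and handler.
    void swap(BlockingQueue<E> newQueue) {
        lock.writeLock().lock();
        try {
            queue.drainTo(newQueue);
            queue = newQueue;
        } finally { lock.writeLock().unlock(); }
    }
}
```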

        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12628559/HADOOP-10278-atomicref-rwlock.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common:

        org.apache.hadoop.metrics2.impl.TestMetricsSystemImpl

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/3563//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3563//console

        This message is automatically generated.

        ikeda Hiroshi Ikeda added a comment -

        This may just be an unnecessary addition to the reference patch, but a read lock is also required to invoke callQueue.get().size(), for the same reason I commented on before.

        chrilisf Chris Li added a comment -

        Thanks for the catch Hiroshi Ikeda

        I'm curious if you see any issues in the locking of https://issues.apache.org/jira/secure/attachment/12626710/HADOOP-10278.patch

        chrilisf Chris Li added a comment -

        Hiroshi Ikeda uploaded a cleaner version of the original patch (HADOOP-10278.patch)

        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12628656/HADOOP-10278.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/3565//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/3565//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-common.html
        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3565//console

        This message is automatically generated.

        ikeda Hiroshi Ikeda added a comment -

        Chris Li CallQueue and its subclass break the contract of BlockingQueue, and CallQueue does not actually need to implement BlockingQueue. CallQueue is used by ipc.Server, which requires only three methods (put(), take(), and size()), so it is a natural idea to extract CallQueue as an interface and/or create an adapter class from BlockingQueue (or something similar) to CallQueue. I think the synchronization logic (including dynamically changing the underlying queues, if needed) should be encapsulated inside the adapter implementation.
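A minimal sketch of that adapter idea (my illustration, not any attached patch; the names and the 100 ms poll interval are assumptions):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** The three operations ipc.Server actually needs from its call queue. */
interface CallQueue<E> {
    void put(E e) throws InterruptedException;
    E take() throws InterruptedException;
    int size();
}

/** Adapter that hides the swappable BlockingQueue behind CallQueue,
 *  keeping the swap synchronization out of the Server class. */
final class SwappableCallQueue<E> implements CallQueue<E> {
    private final AtomicReference<BlockingQueue<E>> ref;

    SwappableCallQueue(BlockingQueue<E> initial) {
        ref = new AtomicReference<BlockingQueue<E>>(initial);
    }

    @Override
    public void put(E e) throws InterruptedException {
        ref.get().put(e);
    }

    // Bounded poll instead of take(), so a handler parked on the old
    // queue re-reads the reference shortly after a swap.
    @Override
    public E take() throws InterruptedException {
        E e;
        while ((e = ref.get().poll(100, TimeUnit.MILLISECONDS)) == null) {
            // loop: re-check the (possibly swapped) queue reference
        }
        return e;
    }

    @Override
    public int size() {
        return ref.get().size();
    }

    /** Swap in a new queue, migrating queued calls. A grace-period drain
     *  (as discussed elsewhere in this thread) would still be needed to
     *  catch producers holding a stale reference. */
    void swapQueue(BlockingQueue<E> newQueue) {
        BlockingQueue<E> old = ref.getAndSet(newQueue);
        old.drainTo(newQueue);
    }
}
```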

        daryn Daryn Sharp added a comment -

        I think some of the anomalies you are seeing may be due to other entropy. I'd be concerned if it was, say, a 5% difference, but a fraction of a percent isn't worth extra complexity and locking. I checked the source for take vs poll, and they are virtually identical, except that one calls condition.await() and the other calls condition.await(nanos). Thus I don't see how poll time impacts normal operation. During a queue swap, a longer poll just means idle handlers waiting on an empty queue will take a little longer to switch over to another queue - no big deal. I also can't see how poll time and client cpu can possibly be correlated.

        Anyway, another RW lock or other synchronization to protect against the exceedingly rare, if ever, use case of a queue swap is overkill. Although the impact may appear minimal at the scale you tested, I have grave concerns with 5-10 readers and 100-200 handlers. Highly contended locks exhibit very poor performance characteristics.

        I glanced at the former pre-lock/custom-queue patch and I think it's pretty good, i.e. bounded take via poll and an atomic ref swap. It's minimally invasive and shouldn't pose any real-world (non-micro-benchmark) performance impact. drainTo may need to be replaced to avoid undefined behavior.

        I generally agree with Hiroshi on abstraction but creating a wrapper class to passthrough all the methods just to convert take to poll may be overkill.

        chrilisf Chris Li added a comment -

        Here's atomicref without the RWLock, using a safer drain

        The swap itself will take ~1 second to complete due to the poll and offer timeouts used during the swap

        It's starting to feel heavy in the Server class, so I like Hiroshi Ikeda's idea of encapsulating this logic into a CallQueue adapter class, since it could take care of all the polling, swapping, safeDrainTo, custom string formatting for queue logging, etc.

        I'll see what that might look like
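
        For concreteness, here is a rough sketch of what such an adapter might look like (all names here are hypothetical, not taken from the actual patch): a generic wrapper that hides the AtomicReference, converts the handlers' blocking take into a timed poll, and folds the swap-and-drain logic into one method.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical adapter: wraps the active queue so that handlers loop on
 *  a timed poll and a swap is one atomic reference set plus a drain. */
class CallQueueAdapter<E> {
  private final AtomicReference<BlockingQueue<E>> queue;

  CallQueueAdapter(int capacity) {
    queue = new AtomicReference<BlockingQueue<E>>(
        new LinkedBlockingQueue<E>(capacity));
  }

  void put(E e) throws InterruptedException {
    queue.get().put(e);
  }

  /** Bounded take: the timeout lets a handler blocked on a swapped-out
   *  queue re-read the reference and move over to the new queue. */
  E take() throws InterruptedException {
    E e = null;
    while (e == null) {
      e = queue.get().poll(1000, TimeUnit.MILLISECONDS);
    }
    return e;
  }

  /** Swap in a new queue, then drain stragglers from the old one. */
  void swapQueue(BlockingQueue<E> newQueue) throws InterruptedException {
    BlockingQueue<E> old = queue.getAndSet(newQueue);
    E e;
    while ((e = old.poll(1, TimeUnit.SECONDS)) != null) {
      newQueue.put(e);  // block until the call fits; never drop calls
    }
  }

  int size() {
    return queue.get().size();
  }
}
```

        This is only a sketch of the shape of the class; the real adapter would also need the custom logging and metrics hooks mentioned above.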

        chrilisf Chris Li added a comment -

        Daryn Sharp Hiroshi Ikeda What do you guys think of HADOOP-10278-atomicref-adapter?

        Minimal changes to the Server class, and groups the related queue swapping logic.

        It also looks easier to write tests for.

        ikeda Hiroshi Ikeda added a comment -

        Adding my image here before I forget:

        class CallQueue {
            final AtomicInteger sizeRef = new AtomicInteger();
            final Semaphore putSemaphore;
            final Semaphore takeSemaphore = new Semaphore(0);
            final ReadWriteLock internalQueueLock = new ReentrantReadWriteLock();
            /** The reference is guarded by internalQueueLock. */
            InternalQueue internalQueue;

            CallQueue(int maxQueueSize, InternalQueue initInternalQueue) {
                putSemaphore = new Semaphore(maxQueueSize);
                internalQueue = initInternalQueue;
            }

            /** Implementation must be thread safe. */
            interface InternalQueue {
                /** Returns null if no element. */
                Call poll();
                void offer(Call call);
            }

            void replaceInternalQueue(InternalQueue newQueue) {
                internalQueueLock.writeLock().lock();
                try {
                    Call call;
                    while ((call = internalQueue.poll()) != null) {
                        newQueue.offer(call);
                    }
                    internalQueue = newQueue;
                } finally {
                    internalQueueLock.writeLock().unlock();
                }
            }

            void put(Call call) throws InterruptedException {
                putSemaphore.acquire();
                internalQueueLock.readLock().lock();
                try {
                    internalQueue.offer(call);
                } finally {
                    internalQueueLock.readLock().unlock();
                }
                sizeRef.incrementAndGet();
                takeSemaphore.release();
            }

            Call take() throws InterruptedException {
                Call result;
                takeSemaphore.acquire();
                internalQueueLock.readLock().lock();
                try {
                    result = internalQueue.poll();
                } finally {
                    internalQueueLock.readLock().unlock();
                }
                sizeRef.decrementAndGet();
                putSemaphore.release();
                return result;
            }

            int size() {
                return sizeRef.get();
            }
        }

        class InternalQueueSimpleImpl implements CallQueue.InternalQueue {
            final ConcurrentLinkedQueue<Call> queue = new ConcurrentLinkedQueue<Call>();
            @Override public Call poll() { return queue.poll(); }
            @Override public void offer(Call call) { queue.offer(call); }
        }
        
        ikeda Hiroshi Ikeda added a comment -

        AtomicReference to keep a reference of the queue seems inappropriate. Whenever you get a queue from the AtomicReference, there always exists the possibility that the queue is already obsolete before any action for the queue, unless there is some additional synchronization or something.

        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12628884/HADOOP-10278-atomicref-adapter.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/3577//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/3577//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-common.html
        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3577//console

        This message is automatically generated.

        daryn Daryn Sharp added a comment -

        I think the use of an atomic reference is more desirable than additional locking. Queue swapping will either occur rarely or perhaps never, so it's unreasonable to impact normal operation. I believe handling the race during swap is quite manageable. In case I've overlooked something, let's walk through the logic.

        I'll refer to tight race as what occurs when the atomic ref is swapped. A thread may get the pre-swap value just as it's being swapped, then operate on it.

        handlers/consumers
        During the swap, handlers might already be blocked on an already empty queue or block during the tight race. To solve that, using q.poll instead of q.take will cause the handlers to timeout and switch over to the new queue. Handlers that consume 1 more call from the old queue during the tight race are fine.

        readers/producers
        I'm not sure the readers need to use q.offer instead of q.put? If the reader is blocked on a put then the queue being swapped out is already full. When the old queue is "drained", these blocked readers' puts will immediately unblock and succeed into the old queue. At most 1 call per reader will be added to the queue post-swap. Likewise during the tight race, some readers may put at most 1 call into the old queue. I believe this is manageable:

        swapping queues
        The thread that swaps the queues already needs to drain the old queue into the new queue. This thread will race with readers that might insert 1 more call during the tight race. A drain using poll with a couple second timeout until null is returned should catch those readers that might insert 1 more call.

        The logic that attempts to fallback to the old queue probably isn't required. The thread swapping should just block until it adds all calls to the new queue. Losing or dropping calls under any condition is not desirable. A client may be left waiting indefinitely for the lost call's response.
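
        The timed-poll drain described above might look like this sketch (hypothetical helper, not from the patch): the timeout acts as a quiet period that catches the at-most-one straggler call each racing reader can add after the swap.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical drain helper: keep polling the old queue with a timeout
 *  so that straggler puts racing with the swap are still carried over. */
final class SafeDrain {
  static <E> void drainTo(BlockingQueue<E> oldQ, BlockingQueue<E> newQ,
                          long quietMillis) throws InterruptedException {
    E e;
    // a null only after waiting the full quiet period means no straggler arrived
    while ((e = oldQ.poll(quietMillis, TimeUnit.MILLISECONDS)) != null) {
      newQ.put(e);  // block until it fits; a dropped call strands its client
    }
  }
}
```

        Unlike BlockingQueue.drainTo(), which is undefined for concurrent modification by producers, the timed poll tolerates late arrivals at the cost of one extra timeout wait at the end.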

        chrilisf Chris Li added a comment -

        I'm going to post an updated patch with tests and Daryn Sharp's suggested changes to offer shortly, but I wanted to comment on a potential issue with queue swapping.

        The logic that attempts to fallback to the old queue probably isn't required. The thread swapping should just block until it adds all calls to the new queue. Losing or dropping calls under any condition is not desirable. A client may be left waiting indefinitely for the lost call's response.

        Currently queue swapping can fail if the new queue raises an exception or returns false in offer(). When this happens, the old queue is reverted as the active queue, and calls are drained back from new queue to old queue. This behavior was included to protect against the user accidentally misconfiguring the queue, such as going from a queue size of 10k to 10.

        An issue arises when production > consumption:
        1. because we are using poll(timeout) and requests come in faster than we can poll them out, poll(timeout) will always return a non-null result.
        2. handlers will switch to the new queue. They will either:
        3a. not be able to keep up, as before, so the new queue will fill up to capacity
        3b. be able to keep up, so the swapping will continue indefinitely, holding the Server's intrinsic lock forever

        In the case of 3a,
        1. if we revert the swap, we will probably lose calls
        2. if we block and just keep going, we will be swapping forever

        chrilisf Chris Li added a comment -

        Actually I've made a mistake in my above diagnosis: the production will move over to the new queue so the swap will not be infinite!

        chrilisf Chris Li added a comment -

        Uploaded new version of atomicref-adapter.patch

        This version makes the changes Daryn Sharp proposed.

        Other changes:

        • renames CallQueueAdapter to CallQueueManager (which still isn't the greatest name, but hints at its ability to swap its backing queue)
        • added tests
        • try instantiation using different constructors, preferring to use the custom one
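
        The constructor-preference logic might be sketched like this (hypothetical QueueFactory; the real patch's signatures may differ): try a capacity-taking constructor first, then fall back to the no-arg default.

```java
import java.lang.reflect.Constructor;
import java.util.concurrent.BlockingQueue;

/** Hypothetical factory: prefer a custom capacity-taking constructor,
 *  fall back to the default no-arg constructor. */
final class QueueFactory {
  @SuppressWarnings("unchecked")
  static <E> BlockingQueue<E> create(Class<?> theClass, int maxLen) {
    try {
      // custom/capacity constructor first, e.g. LinkedBlockingQueue(int)
      Constructor<?> ctor = theClass.getDeclaredConstructor(int.class);
      return (BlockingQueue<E>) ctor.newInstance(maxLen);
    } catch (Exception e) {
      // fall through to the default constructor
    }
    try {
      return (BlockingQueue<E>) theClass.getDeclaredConstructor().newInstance();
    } catch (Exception e) {
      throw new RuntimeException(theClass.getName()
          + " could not be constructed", e);
    }
  }
}
```

        Keying the configured class off the port, as the description proposes, would then reduce to a config lookup before calling a factory like this.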
        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12629198/HADOOP-10278-atomicref-adapter.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/3583//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/3583//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-common.html
        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3583//console

        This message is automatically generated.

        ikeda Hiroshi Ikeda added a comment -

        Although ReadWriteLock is much more complex than simple locks and is not preferred for enclosing trivial logic, I suspect it is not severe overhead compared to LinkedBlockingQueue.

        Nobody seems to worry about using LinkedBlockingQueue, but LinkedBlockingQueue separates its lock into only two locks, a put side and a take side, so that congestion on each side causes context switches. Moreover, the separation of the lock is ineffective when the queue is almost always empty, and collisions between put and take cause context switches.

        By the way, about the patch: nobody can eliminate the possibility that a thread blocked at LinkedBlockingQueue.put() fails to wake up within 1 second when another thread drains. At least you should check the reference after put(), such as

        void put(Call e) throws InterruptedException {
          BlockingQueue<Call> q = queue.get();
          q.put(e);
          // if the reference changed while we were blocked in put(),
          // drain the old queue into the current one and re-check
          BlockingQueue<Call> q2;
          while ((q2 = queue.get()) != q) {
            drain(q, q2);
            q = q2;
          }
        }
        

        (I also worry that the order of elements is not preserved, in spite of the name "queue".)

        Still, implementation of size() is invalid.

        The atomic reference is not needed for the queue, and volatile is enough if you only set and get. Volatile variables have a better chance of being optimized by the VM because volatile is part of the language specification. On the other hand, the increment operation (++) is not atomic for volatile variables, so some of the test classes would have to be changed.

        chrilisf Chris Li added a comment -

        Hiroshi Ikeda Thanks for taking a look.

        Although ReadWriteLock is much more complex than simple locks and is not prefered to enclose trival logic, I suspect it is not severe overhead compared to LinkedBlockingQueue.

        One of the issues with RWLock I found was that we'd have to use offer(timeout) instead of put(), or else the read lock may not be released. This makes the queue swap block for about a second, and causes a huge backup in the process. Otherwise the overhead of the lock wasn't much; it's mainly the offer(timeout).

        By the way, about the patch, nobody can eliminate the possibility that a blocked thread at LinkedBlockingQueue.put() cannot wake up in 1 seconds when another thread drains. At least you should check the reference after put(), such as

        We could increase the timeout period on the queue swap transfer too.

        (I also worry about the order of elements is not preserved in spite of the name "queue".)

        Order doesn't matter to the client, since even today a client that sends two consecutive commands without waiting for the first response cannot expect them to be executed in order. This is why we can even do QoS by re-ordering rpc calls in the first place.

        As far as the server goes, I guess this will have to be something we're aware of.

        Still, implementation of size() is invalid.

        Are you referring to CallQueueManager.size(), due to the possibility that the queue is being swapped? I'm assuming that size() isn't being used for anything besides metrics, so a small discrepancy during swapping would be okay.

        The atomic reference is not needed for the queue and volatile is enough if you only set and get. Volatile variables may have much chance to be optimized by VM because volatile is within the language specification. On the other hand, increment operation (++) is not atomic for volatile variables, so some of the test classes should be changed.

        Agreed, though I think Daryn Sharp says he prefers atomicref for clarity. I could go either way.

        Also, in the tests, the atomic members aren't being written by multiple threads; they're atomic so that I can read them from another thread without seeing a stale cached value. Even if they were incremented atomically, there's still the issue of the put happening before the increment, so the tests only check after the operations complete (using Thread.sleep)

        daryn Daryn Sharp added a comment -

        Up front, my requirement is I don't want a feature that will be rarely used to add additional lock contention during normal operation. I'm -1 to more locking.

        LinkedBlockingQueue separates its lock into only two locks, put side and get side, so that congesion in each side causes context switches. Moreover the separation of the lock is ineffective when there are almost always empty in the queue, and collision between put and get causes context switches.

        I'm not sure I understand. Context switches are inevitable if the queue is empty. Under normal conditions the put/take lock reduces contention.

        An issue arises when production > consumption:

        I thought about that case in the original analysis but I didn't want the post to be too big. If the NN is under heavy load, the callq is full, and the user swaps in a new queue of a much smaller size - I'd argue that's a case of user error. The draining thread will compete with new calls but should eventually complete.

        The atomic reference is not needed for the queue and volatile is enough if you only set and get. Volatile variables may have much chance to be optimized by VM because volatile is within the language specification.

        AtomicRef is just a wrapper around a volatile (by nature can't be optimized) with CAS methods. I like the fact AtomicRef provides an obvious hint that the value is going to be changed, plus we should be using CAS to swap the queue ref to prevent races.

        So how about something like this rough idea:

        1. CallQueueManager (or whatever it's called) maintains a putRef and takeRef, initially set to the same queue
        2. for a swap, use CAS to update the putRef with the new callq so readers begin filling it - note it's not yet being serviced by handlers
        3. now wait for handlers to drain all the calls in takeRef's old callq (1) - periodically check isEmpty()?
        4. finally update takeRef to point to the new callq

        (1)There's a race where readers may drop in 1 more call after the callq swapped. Using isEmpty() is imperfect due to the race. I dislike time-based solutions, but if handlers are polling for a few seconds, that should be plenty of time to get straggler calls before the handlers switch to the new/current queue.
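
        Under those assumptions, the two-reference idea might be sketched as follows (hypothetical names; the isEmpty() polling here stands in for whatever straggler handling is ultimately chosen):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Sketch of the two-reference swap: producers follow putRef immediately,
 *  while takeRef stays on the old queue until handlers have drained it. */
class TwoRefCallQueue<E> {
  private final AtomicReference<BlockingQueue<E>> putRef;
  private final AtomicReference<BlockingQueue<E>> takeRef;

  TwoRefCallQueue(BlockingQueue<E> q) {
    putRef = new AtomicReference<BlockingQueue<E>>(q);
    takeRef = new AtomicReference<BlockingQueue<E>>(q);
  }

  void put(E e) throws InterruptedException {
    putRef.get().put(e);
  }

  E take() throws InterruptedException {
    E e = null;
    while (e == null) {
      e = takeRef.get().poll(100, TimeUnit.MILLISECONDS);
    }
    return e;
  }

  void swapQueue(BlockingQueue<E> newQueue) throws InterruptedException {
    // steps 1-2: point producers at the new queue
    BlockingQueue<E> old = putRef.getAndSet(newQueue);
    // step 3: wait for handlers to drain the old queue; isEmpty() is
    // imperfect under the tight race, so handlers keep polling a while
    while (!old.isEmpty()) {
      Thread.sleep(10);
    }
    // step 4: point consumers at the new queue
    takeRef.getAndSet(newQueue);
  }
}
```

        The single-threaded happy path works as written; the interesting behavior is of course the concurrent case, where handlers polling takeRef drain the old queue while producers already fill the new one.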

        chrilisf Chris Li added a comment -

        plus we should be using CAS to swap the queue ref to prevent races

        Just for clarity: currently CallQueueManager.swapQueue isn't threadsafe, since the Server's calling method is synchronized. Are we trying to make swapQueue threadsafe, or are we trying to be even safer to make sure puts don't get left behind? If it's the latter only, we can continue to use set() instead of CAS. Making swapQueue concurrent sounds like more hassle than it's worth.

        chrilisf Chris Li added a comment -

        Latest version of atomicref-adapter swaps by using handlers to clear the calls, choreographed by using two refs for put and take.

        We use a software version of double-checking to ensure the queue is likely empty. This should make dropping calls highly unlikely. And losing a call isn't the end of the world either, since the client handles IPC timeouts with retries.

        Tests updated too, since the queue can only be swapped when there are active readers.

        Here's what a live swap looks like:

        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12629661/HADOOP-10278-atomicref-adapter.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common:

        org.apache.hadoop.ipc.TestCallQueueManager

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/3588//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3588//console

        This message is automatically generated.

        chrilisf Chris Li added a comment -

        This should fix the test (HADOOP-10278-atomicref-adapter.patch)

        hadoopqa Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12629682/HADOOP-10278-atomicref-adapter.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/3589//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3589//console

        This message is automatically generated.

        arpitagarwal Arpit Agarwal added a comment -

        Hi Chris, Who calls refreshCallQueue?

        chrilisf Chris Li added a comment -

        Hi Arpit Agarwal, check out HADOOP-10285

        It's called from the command line: hadoop dfsadmin -refreshCallQueue

        arpitagarwal Arpit Agarwal added a comment -

        Thanks Chris. Couple of comments and I apologize if these have been discussed already and I missed it.

        Overall the patch looks good. Would like to understand if there are any alternative use cases.

        1. Is this the only use case (admin changes the call queue at runtime for a specific port)? Would it be useful to have a default global setting to refresh all ports without specifying the port number?
        2. Will it be made available for other daemons apart from NN?

        Also, in refreshCallQueue perhaps you can skip the call to swapQueue if queueClassToUse has not changed.
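
        The suggested guard might be sketched like this (a toy stand-in; the field and method names are assumptions, and note Chris's later point that same-class swaps are intentionally allowed so the queue's parameters can be refreshed, which a class-only check would defeat):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class SwapGuardDemo {
    private Class<?> currentQueueClass = LinkedBlockingQueue.class;
    private int swaps = 0;

    /** Skip the (relatively expensive) swap when the configured class is unchanged. */
    void refreshCallQueue(Class<?> queueClassToUse) {
        if (queueClassToUse.equals(currentQueueClass)) {
            return; // same implementation already active; nothing to do
        }
        currentQueueClass = queueClassToUse;
        swaps++; // stand-in for the real swapQueue call
    }

    public static void main(String[] args) {
        SwapGuardDemo d = new SwapGuardDemo();
        d.refreshCallQueue(LinkedBlockingQueue.class); // unchanged: skipped
        d.refreshCallQueue(ArrayBlockingQueue.class);  // changed: swap happens
        System.out.println(d.swaps);                   // 1
    }
}
```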

        chrilisf Chris Li added a comment -

        Thanks for checking out the patch, Arpit

        Actually, the admin can only refresh the callQueue on the namenode today. This is the only case I've targeted, since the namenode can't be easily restarted.

        Thinking of what daemons might benefit:

        • Namenode is what swapping was made for, since the NN can't be restarted to refresh the queue so easily
        • Datanodes and nodemanagers can be restarted at will, so it's less useful there
        • Resourcemanager maybe, but I'm not sure whether QoS on the RM solves a problem yet

        Also, we allow swaps between the same queue, since the user may adjust features of the queue (today that might be queue size, later on it would be qos parameters) at runtime.

        arpitagarwal Arpit Agarwal added a comment -

        Thanks for the clarification.

        I am +1 for the patch. Daryn Sharp and Jing Zhao were looking at this too so I will hold off on committing in case they have comments.

        ikeda Hiroshi Ikeda added a comment -

        Chris Li What do you think about implementing the sketch I previously wrote and comparing the results by way of experiment, if you have time? I believe using semaphores + a read-write lock + a concurrent queue is as good as (or better than) using an atomic reference + a blocking queue, especially with multiple threads that are likely to collide.

        Also, in the tests, the atomic members aren't being written by multiple threads–they're atomic so that I can read them from another thread without thread caching.

        I understand. I think adding some comment would be helpful, because it is open to misunderstanding at first glance.

        Daryn Sharp

        I'm not sure I understand. Context switches are inevitable if the queue is empty. Under normal conditions the put/take lock reduces contention.

        I think LinkedBlockingQueue has more chances to block threads than you expect. If multiple threads call LinkedBlockingQueue.put()/offer() at once, only one thread wins the put-side lock and the others are blocked, causing context switches. The same goes for LinkedBlockingQueue.take()/poll(). Also, when a thread puts an element into an empty queue, it must finally acquire the take-side lock in order to send a signal to the take side, but it may be blocked because another thread is just taking an element (with non-trivial logic) while holding the take-side lock. The same goes for taking an element from a full queue.

        A context switch is quite heavy overhead; it is not entirely off the mark to say you could create 100 objects at the same cost. Reducing critical sections is one of the most important ways to improve performance in multi-threaded contexts, and classes like AtomicReference are widely used to build non-blocking logic. So I feel that using AtomicReference instead of ReadWriteLock while sticking with LinkedBlockingQueue is not consistent. It makes even less sense that it requires bothersome swapping logic.

        AtomicRef is just a wrapper around a volatile (by nature can't be optimized) with CAS methods.

        I agree AtomicReference is a wrapper in the Oracle (Sun) implementation, which internally uses sun.misc.Unsafe, but I think that depends on the vendor. Also, I think the VM can optimize volatile variables, similarly to native synchronization (for example, eliminating unnecessary synchronization with escape analysis), though I'm not sure this applies in our context.
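
        One plausible reading of the semaphore + concurrent-queue idea is sketched below (the read-write lock used for swapping is omitted for brevity; this is an interpretation, not Hiroshi's actual sketch). Puts and takes never share a lock: each side blocks only on its own semaphore, and the queue itself uses lock-free CAS operations:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Semaphore;

/** Bounded blocking queue built from semaphores and a lock-free queue. */
public class SemaphoreCallQueue<E> {
    private final Queue<E> queue = new ConcurrentLinkedQueue<>();
    private final Semaphore available = new Semaphore(0); // elements ready to take
    private final Semaphore capacity;                     // free slots for put

    public SemaphoreCallQueue(int maxSize) {
        capacity = new Semaphore(maxSize);
    }

    public void put(E e) throws InterruptedException {
        capacity.acquire();  // blocks when the queue is full
        queue.offer(e);      // lock-free enqueue
        available.release(); // wake one taker
    }

    public E take() throws InterruptedException {
        available.acquire(); // blocks when the queue is empty
        E e = queue.poll();  // lock-free dequeue
        capacity.release();  // free a slot for putters
        return e;
    }

    public static void main(String[] args) throws Exception {
        SemaphoreCallQueue<String> q = new SemaphoreCallQueue<>(2);
        q.put("a");
        q.put("b");
        System.out.println(q.take());
        System.out.println(q.take());
    }
}
```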

        chrilisf Chris Li added a comment -

        Hey Hiroshi Ikeda, the semaphore solution looks similar to my initial implementation, which used two re-entrant locks to handle blocking for a concurrent linked queue safely through their condition vars. Performance was not impacted either (in fact, performance was slightly better than java's LBQ in the RPC call benchmark).

        At the end of the day it added too much complexity in the way of code, so we dropped it for Daryn's suggestion. I suspect the semaphore solution would be the same way: it would make the code too complex for comfort.

        I can implement and test it, but I'd like to get Daryn Sharp's input before I go off to build another big feature, since it would be a waste if it still doesn't solve the problem of code complexity.

        ikeda Hiroshi Ikeda added a comment -

        Chris Li Quite different. A reentrant lock blocks other threads, and executing non-trivial tasks, such as accessing a queue, within the lock increases the chance of blocking. That seems logically similar to LinkedBlockingQueue.

        I think my sketch is simple and concrete enough, but it is possible that performance would not be affected because of other factors.

        arpitagarwal Arpit Agarwal added a comment -

        The latest patch looks good for a first cut implementation. If there are no objections by tonight PST I will commit it.

        chrilisf Chris Li added a comment -

        Hiroshi Ikeda I think that could be explored at a later time; for right now I think it's good enough that we don't degrade performance rather than seeking to improve it.

        Arpit Agarwal let me know if there's anything else you need before committing.

        arpitagarwal Arpit Agarwal added a comment -

        I committed to trunk and branch-2.

        Thanks for the contribution Chris Li and thanks Hiroshi, Benoy, Jing and Daryn for reviewing.

        chrili Chris Li added a comment -

        Awesome, thanks all!

        The next patch in the series is https://issues.apache.org/jira/browse/HADOOP-10285

        It adds the admin interface to trigger a queue swap at runtime. Further discussion can take place there. Thanks

        hudson Hudson added a comment -

        SUCCESS: Integrated in Hadoop-trunk-Commit #5208 (See https://builds.apache.org/job/Hadoop-trunk-Commit/5208/)
        HADOOP-10278. Refactor to make CallQueue pluggable. (Contributed by Chris Li) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1570703)

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeys.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/CallQueueManager.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ipc/TestCallQueueManager.java
        hudson Hudson added a comment -

        SUCCESS: Integrated in Hadoop-Yarn-trunk #489 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/489/)
        HADOOP-10278. Refactor to make CallQueue pluggable. (Contributed by Chris Li) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1570703)

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeys.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/CallQueueManager.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ipc/TestCallQueueManager.java
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Hdfs-trunk #1681 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1681/)
        HADOOP-10278. Refactor to make CallQueue pluggable. (Contributed by Chris Li) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1570703)

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeys.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/CallQueueManager.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ipc/TestCallQueueManager.java
        hudson Hudson added a comment -

        SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1706 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1706/)
        HADOOP-10278. Refactor to make CallQueue pluggable. (Contributed by Chris Li) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1570703)

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeys.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/CallQueueManager.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ipc/TestCallQueueManager.java
        arpitagarwal Arpit Agarwal added a comment -

        I merged this to branch-2.4.

        hudson Hudson added a comment -

        SUCCESS: Integrated in Hadoop-trunk-Commit #5248 (See https://builds.apache.org/job/Hadoop-trunk-Commit/5248/)
        Update target version to 2.4.0 for HADOOP-10278 and HADOOP-10285 in CHANGES.txt (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1573070)

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
        hudson Hudson added a comment -

        SUCCESS: Integrated in Hadoop-Yarn-trunk #496 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/496/)
        Update target version to 2.4.0 for HADOOP-10278 and HADOOP-10285 in CHANGES.txt (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1573070)

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Hdfs-trunk #1688 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1688/)
        Update target version to 2.4.0 for HADOOP-10278 and HADOOP-10285 in CHANGES.txt (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1573070)

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
        hudson Hudson added a comment -

        SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1713 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1713/)
        Update target version to 2.4.0 for HADOOP-10278 and HADOOP-10285 in CHANGES.txt (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1573070)

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt

          People

          • Assignee: Chris Li
          • Reporter: Chris Li
          • Votes: 0
          • Watchers: 11