HBASE-10015

Replace intrinsic locking with explicit locks in StoreScanner

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.98.0, 0.96.1, 0.94.15
    • Component/s: None
    • Labels:
    • Hadoop Flags: Reviewed

      Description

      Did some more profiling (this time with a sampling profiler) and StoreScanner.peek() showed up a lot in the samples. At first that was surprising, but peek() is synchronized, so it seems a lot of the synchronization cost is eaten there.
      It seems the only reason we have to synchronize all these methods is that a concurrent flush or compaction can change the scanner stack; other than that, only a single thread should access a StoreScanner at any given time.
      So I replaced updateReaders() with code that just indicates to the scanner that the readers should be updated, and then made it the using thread's responsibility to do the work.
      The perf improvement from this is staggering. I am seeing somewhere around 3x scan performance improvement across all scenarios.

      Now, the hard part is to reason about whether this is 100% correct. I ran TestAtomicOperation and TestAcidGuarantees a few times in a loop, all still pass.

      Will attach a sample patch.
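
      A minimal sketch of the idea the description outlines (not the attached patch; the class and method names here are illustrative):

          // Sketch: instead of synchronizing every StoreScanner method,
          // updateReaders() only raises a flag; the single thread using the
          // scanner rebuilds the scanner stack on its next call.
          import java.util.concurrent.atomic.AtomicBoolean;

          class StoreScannerSketch {
            // Set by flush/compaction threads; read by the scanning thread.
            private final AtomicBoolean readersChanged = new AtomicBoolean(false);

            // Called by a concurrent flush/compaction instead of swapping
            // the readers itself.
            void updateReaders() {
              readersChanged.set(true);
            }

            // Called at the top of next()/seek()/reseek() by the using thread.
            private void checkUpdatedReaders() {
              if (readersChanged.compareAndSet(true, false)) {
                resetScannerStack(); // rebuild the heap over the new store files
              }
            }

            private void resetScannerStack() {
              // reopen scanners over the new readers and reseek to the last key
            }
          }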

      1. 10015-0.94.txt
        5 kB
        Lars Hofhansl
      2. TestLoad.java
        4 kB
        Lars Hofhansl
      3. 10015-0.94-withtest.txt
        8 kB
        Lars Hofhansl
      4. 10015-trunk.txt
        5 kB
        Lars Hofhansl
      5. 10015-0.94-v2.txt
        6 kB
        Lars Hofhansl
      6. 10015-trunk-v2.txt
        5 kB
        Lars Hofhansl
      7. 10015-0.94-v3.txt
        4 kB
        Lars Hofhansl
      8. 10015-trunk-v3.txt
        4 kB
        Lars Hofhansl
      9. 10015-0.94-v4.txt
        4 kB
        Lars Hofhansl
      10. 10015-trunk-v4.txt
        4 kB
        Lars Hofhansl
      11. 10015-trunk-v4.txt
        4 kB
        stack
      12. 10015-trunk-v4.txt
        4 kB
        stack
      13. 10015-0.94-new-sample.txt
        9 kB
        Lars Hofhansl
      14. 10015-0.94-lock.txt
        4 kB
        Lars Hofhansl
      15. 10015-trunk-lock.txt
        4 kB
        Lars Hofhansl

        Activity

        stack added a comment -

        Released in 0.96.1. Issue closed.

        Hudson added a comment -

        SUCCESS: Integrated in HBase-TRUNK #4700 (See https://builds.apache.org/job/HBase-TRUNK/4700/)
        HBASE-10015 Replace intrinsic locking with explicit locks in StoreScanner (larsh: rev 1545838)

        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
        Hudson added a comment -

        SUCCESS: Integrated in hbase-0.96 #205 (See https://builds.apache.org/job/hbase-0.96/205/)
        HBASE-10015 Replace intrinsic locking with explicit locks in StoreScanner (larsh: rev 1545837)

        • /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
        Hudson added a comment -

        SUCCESS: Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #853 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/853/)
        HBASE-10015 Replace intrinsic locking with explicit locks in StoreScanner (larsh: rev 1545838)

        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
        Hudson added a comment -

        FAILURE: Integrated in hbase-0.96-hadoop2 #133 (See https://builds.apache.org/job/hbase-0.96-hadoop2/133/)
        HBASE-10015 Replace intrinsic locking with explicit locks in StoreScanner (larsh: rev 1545837)

        • /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
        Hudson added a comment -

        FAILURE: Integrated in HBase-0.94 #1212 (See https://builds.apache.org/job/HBase-0.94/1212/)
        HBASE-10015 Replace intrinsic locking with explicit locks in StoreScanner (larsh: rev 1545840)

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
        Hudson added a comment -

        FAILURE: Integrated in HBase-0.94-security #346 (See https://builds.apache.org/job/HBase-0.94-security/346/)
        HBASE-10015 Replace intrinsic locking with explicit locks in StoreScanner (larsh: rev 1545840)

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
        Lars Hofhansl added a comment -

        Committed to all branches.

        Interestingly, I now find that for tall tables region.getCoprocessorHost().postScannerFilterRow takes the new top spot (with a sampling profiler).

        Lars Hofhansl added a comment -

        The javac warnings are these:

        [INFO] Compiling 150 source files to /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/hbase-common/target/classes
        [WARNING] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/util/Bytes.java:[51,15] sun.misc.Unsafe is Sun proprietary API and may be removed in a future release
        [WARNING] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/util/Bytes.java:[1110,19] sun.misc.Unsafe is Sun proprietary API and may be removed in a future release
        [WARNING] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/util/Bytes.java:[1116,21] sun.misc.Unsafe is Sun proprietary API and may be removed in a future release
        [WARNING] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/util/Bytes.java:[1121,28] sun.misc.Unsafe is Sun proprietary API and may be removed in a future release
        

        These are not new.

        Same with the Javadoc warnings (no new ones from this patch).
        Same for findbugs.

        Going to commit.

        Lars Hofhansl added a comment -

        Will check the various new warnings.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12615867/10015-trunk-lock.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

        +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

        -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

        -1 javac. The applied patch generated 4 javac compiler warnings (more than the trunk's current 0 warnings).

        -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

        -1 release audit. The applied patch generated 1 release audit warnings (more than the trunk's current 0 warnings).

        +1 lineLengths. The patch does not introduce lines longer than 100

        -1 site. The patch appears to cause mvn site goal to fail.

        +1 core tests. The patch passed unit tests in .

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8000//testReport/
        Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8000//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8000//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8000//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8000//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8000//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8000//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8000//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8000//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8000//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8000//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8000//console

        This message is automatically generated.

        stack added a comment -

        +1 to 0.96 and trunk.

        Lars Hofhansl added a comment -

        Will commit if it comes through clean.
        Checked again in the sampling profiler, StoreScanner.peek() moved from 1st place to 15th or so.

        Lars Hofhansl added a comment -

        Trunk patch.

        Lars Hofhansl added a comment -

        The trick will be to do all this without the need to synchronize anything in StoreScanner.
        Maybe there is a way to take my idea of lock coarsening further: above I suggested just having updateReaders lock on the RegionScannerImpl, because that is locked anyway. The problem was that - at least in trunk - we also want to be notified during flushes and compactions, and in that case we do not have a RegionScannerImpl. So the idea is: lock the StoreScanner itself from the compaction/flush code and pass itself as the lock object. For flushes we do it in a single loop; for compactions we could do it in chunks of 10000 or so (roughly as sketched below).
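
        A rough illustration of that chunked-locking idea (a sketch only; ChunkWriter and the method name are invented here, not part of any patch):

            // Sketch: a compaction holds the StoreScanner's own monitor only for
            // bounded chunks of KVs, so a concurrent updateReaders(), which would
            // synchronize on the same scanner, can interleave between chunks.
            void compactInChunks(StoreScanner scanner, ChunkWriter writer) throws IOException {
              List<KeyValue> kvs = new ArrayList<KeyValue>();
              boolean more = true;
              while (more) {
                synchronized (scanner) { // the scanner passed itself as the lock object
                  for (int i = 0; more && i < 10000; i++) {
                    more = scanner.next(kvs); // InternalScanner-style next(List)
                  }
                }
                writer.write(kvs); // ChunkWriter is a hypothetical sink; write outside the lock
                kvs.clear();
              }
            }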

        Now in this issue I already attached a bunch of different patches. Lemme commit the -lock patch here and close this. We can discuss further on a new jira.

        Thanks Andrew Purtell, stack, and Ted Yu for running the perf unittests.

        Enis Soztutar added a comment -

        oh, ok. I missed that somehow from the above discussion. Is there a jira yet? I can help with this.

        Lars Hofhansl added a comment -

        Yeah, that's the idea. We'd delay archiving any HFile until no scanners are referring to it any more.

        Enis Soztutar added a comment -

        If we do ref counting, can't we still finish the compaction but not archive the files as long as there are scanners against them? We could do this by decoupling the store file archiving from compaction. In the worst case, on an RS crash, we might end up with already-compacted files, which won't affect semantics.
        Regardless, +1 on the -lock patch just for the gains.

        Lars Hofhansl added a comment -

        I force-enabled biased locking. No improvement with intrinsic locking; according to the docs it is enabled in JDK 6+ anyway, I was just making sure.

        Ted Yu added a comment -

        +1

        Lars Hofhansl added a comment -

        10 columns, 100 byte values, 1st and 4th selected.
        5 runs, Mean: 13.756 Sigma: 0.04210938137755054, with lock patch
        5 runs, Mean: 14.2028 Sigma: 0.08452313292821084, without

        10 columns, all selected
        5 runs, Mean: 8.577 Sigma: 0.09060463564299566, with lock patch
        5 runs, Mean: 9.9846 Sigma: 0.1023749969474969, without

        Per KV cost now is dominant. In no scenario have I observed this to be slower.

        Lars Hofhansl added a comment -

        So it seems this is universal.
        Our performance dude has actually confirmed that this is an expected outcome. I'll gather some more CPU metrics as I find time today.

        stack, these locks are taken in StoreScanner, which is "row" based (unlike StoreFileScanner and MemstoreScanner), so the frequent lock/unlock cycles there happen mostly per row. Wider rows won't be slower, but the speedup effect would be proportionally smaller. To be sure I'll validate with wider tables (maybe 20 CQs). I will also double-check that biased locking is in effect.

        Ted Yu added a comment -

        On Linux with jdk 1.7 :

        with patch:
        Failed tests: testScanFilterPerformance(org.apache.hadoop.hbase.regionserver.TestScanFilterPerformance): 10 runs mean:2297.5 sigma:23.29484921608208

        without:
        Failed tests: testScanFilterPerformance(org.apache.hadoop.hbase.regionserver.TestScanFilterPerformance): 10 runs mean:3242.4 sigma:732.0478399667605

        Andrew Purtell added a comment -

        This is the same system with TestScanFilterPerformance:

        Without: 10 runs mean:2248.7 sigma:36.59795076230362

        With the 0.94 -lock patch: 10 runs mean:1860.3 sigma:29.397448868906977

        Andrew Purtell added a comment - edited

        JDK 7, 16 vcores, SSD, xfs

        Without: 5 runs, Mean: 14.4136 Sigma: 0.17013712116995514

        With the 0.94 -lock patch: 5 runs, Mean: 10.614 Sigma: 0.18904179432072687

        Edit: This is with the 'TestLoad' utility

        stack added a comment -

        On mac I see this w/ jvm1.8:

        Failed tests: testScanFilterPerformance(org.apache.hadoop.hbase.regionserver.TestScanFilterPerformance): 10 runs mean:3032.9 sigma:60.501983438561744
        Failed tests: testScanFilterPerformance(org.apache.hadoop.hbase.regionserver.TestScanFilterPerformance): 10 runs mean:2322.4 sigma:119.92430946226041

        Let me try another box. How can I be sure there is no regression for wide tables?

        Lars Hofhansl added a comment -

        Verified on a different machine architecture with JDK 6 (using the attached unit test).
        4400ms vs 3200ms (JDK 7, 2-core machine)
        2600ms vs 2000ms (JDK 6, 12-core machine)

        Again, if somebody is still reading and could verify the latest patch with the earlier unittest on their hardware that would be greatly appreciated.

        Lars Hofhansl added a comment -

        Parking the ReentrantLock patch for 0.94 here.

        Lars Hofhansl added a comment -

        Continuing my monologue here...

        To my surprise I get a 25-30% perf improvement when I just replace intrinsic locking (synchronized) with a ReentrantLock - again for very tall tables only.
        synchronized with biased locking should outperform ReentrantLock in uncontended cases, but it does not (all my tests are against a real RegionServer, so the initial delay for biased locking has no effect here).

        Something is going on.

        This is on JDK 7 on an old-ish 2-core machine. Will try on some other machines as well. If this bears out on other machines it would be a safe and quick win.
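
        For illustration, the shape of that change: a minimal sketch with invented names, not the actual patch.

            import java.util.concurrent.locks.ReentrantLock;

            // Method-level synchronized replaced by an explicit ReentrantLock.
            class ScannerSketch {
              private final ReentrantLock lock = new ReentrantLock();
              private Object currentKv; // stands in for the KV at the top of the heap

              // Before: synchronized Object peek() { return currentKv; }
              Object peek() {
                lock.lock();
                try {
                  return currentKv;
                } finally {
                  lock.unlock();
                }
              }
            }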

        Lars Hofhansl added a comment -

        Was just looking at making a trunk patch.
        In trunk StoreScanners for compaction do register a change observer? But not in 0.94?!

        So - sigh - the patch I just made for 0.94 will not work in trunk. And maybe more importantly: Is there a lingering bug in 0.94?

        Lars Hofhansl added a comment -

        Here's yet another sample patch (for 0.94). It adds a setter for a syncObject to KeyValueScanner (if I had changed the StoreScanner constructor instead, we would also need to change the coprocessor region observer to pass this in, so I opted for a setter). When a StoreScanner is created by a RegionScanner, the RegionScanner passes a reference to itself as the syncObject.
        So now all locking is - when needed - done through the RegionScannerImpl, and I see the same performance improvement.
        As said above, StoreScanners for flushes and compactions are never invalidated anyway.

        We have coarsened the lock from StoreScanner.next/peek/seek/etc to RegionScannerImpl.next/peek/seek/etc (see the sketch below).

        Now, this is not really pretty. Is it worth the performance gain for tall table scans? Up to 2x for really tall tables with small KVs, proportionally less for wider tables and larger KVs.
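
        A sketch of the shape of this change (hypothetical names, for illustration only, not the sample patch itself):

            // The RegionScanner hands itself to the StoreScanner, and the scan
            // path and updateReaders() then synchronize on that single,
            // coarser lock.
            class StoreScannerSketch {
              private Object syncObject = this; // default: lock on ourselves

              void setSyncObject(Object sync) { // RegionScannerImpl passes "this"
                this.syncObject = sync;
              }

              Object peek() {
                synchronized (syncObject) {
                  return currentKv();
                }
              }

              void updateReaders() {
                synchronized (syncObject) { // excluded by the RegionScanner-level lock
                  // swap in the new scanner stack
                }
              }

              private Object currentKv() { return null; /* placeholder */ }
            }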

        Lars Hofhansl added a comment -

        I think so (to your first point).

        This is (almost) never contended, but StoreScanner.peek() is called very frequently (including for the compares in KeyValueHeap), and the memory fences enforced by synchronized cause a slowdown.

        I did notice that during flushes and compactions we do not register any listeners for changed readers, so my earlier idea of just synchronizing on the RegionScannerImpl should work after all.

        stack added a comment -

        So you are thinking this cannot be completed until after we add delayed cleanup of no-longer-used files? Only then can we safely remove the synchronization?

        How many threads are we talking about anyway? It should be uncontended. The only thread is the current handler asking to return scan results – this changes as different handlers come in on each bulk next invocation – and then an incidental update-readers request... and that is it?

        Lars Hofhansl added a comment -

        No longer sure that patch is good. The problem was that StoreScanner.peek() is synchronized.

        With the patch, peek() will use the old scanner stack. Looking at MemstoreScanner and StoreFileScanner this is correct (both scanners just return the current KV, and StoreFileScanner does not touch its reader during peek), but it is fragile: it requires StoreScanner to hang on to all StoreFileScanners and the MemstoreScanner until checkReseek is called or the scanner is closed, which in turn means that we keep references to the readers open, etc.

        I keep doing that... Filing jiras with patches and then finding that the fix is a bad idea.
        Well, at least I learned about modern CPUs and that synchronized in tight loops is very expensive.

        I will keep thinking about this. Unscheduling for now.

        Lars Hofhansl added a comment -

        Actually this is not quite right in 0.94, because 0.94 does not have HBASE-6499 (seek does not call checkReseek). I'll make that change as well; also, checkUpdatedReaders can be folded into checkReseek for better readability.

        Lars Hofhansl added a comment -

        Epoch would be like reference counting per distinct set of readers. With just a reference count of scanners I'd be worried that compaction would never make any progress.

        stack added a comment -

        +1 on committing what is done already.

        epoch sounds like refcounting? Yeah, no hurry deleting the old stuff as long as it is done eventually. Accounting would be easier if we could move/rename files under the scanner as long as it does not disrupt (maybe I can try this).

        I like your idea of lockless scanning. Would be good to put it up as a goal even if hard to attain, if only to orientate which way progress lies.

        Andrew Purtell added a comment -

        In order for the compaction to make progress we could assign scanners to epochs, where every time we change the readers for a store we go to a new epoch. If all scanners for an epoch are either done or have switched to the new epoch, we can retire the readers of that epoch.

        I haven't thought about this nearly as much as you have recently but that sounds promising.
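
        A toy sketch of that epoch scheme, with invented names, just to make the bookkeeping concrete (it simplifies away the race between registering and advancing):

            import java.util.concurrent.ConcurrentHashMap;
            import java.util.concurrent.atomic.AtomicLong;

            // Every reader change starts a new epoch; an old epoch's readers can
            // be retired once its scanner count drops to zero.
            class EpochTracker {
              private final AtomicLong currentEpoch = new AtomicLong();
              private final ConcurrentHashMap<Long, AtomicLong> scanners =
                  new ConcurrentHashMap<Long, AtomicLong>();

              long register() { // a new scanner joins the current epoch
                long epoch = currentEpoch.get();
                scanners.computeIfAbsent(epoch, e -> new AtomicLong()).incrementAndGet();
                return epoch;
              }

              void unregister(long epoch) { // scanner finished or switched epochs
                if (scanners.get(epoch).decrementAndGet() == 0
                    && epoch < currentEpoch.get()) {
                  retireReaders(epoch); // no scanner still reads the old files
                }
              }

              long advance() { // called when the readers for the store change
                return currentEpoch.incrementAndGet();
              }

              private void retireReaders(long epoch) { /* archive that epoch's files */ }
            }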

        Lars Hofhansl added a comment -

        Something like this.

        In order for the compaction to make progress we could assign scanners to epochs, where every time we change the readers for a store we go to a new epoch. If all scanners for an epoch are either done or have switched to the new epoch, we can retire the readers of that epoch.

        In any case, just commit this change and keep working on it? As Vladimir points out there are other issues with more serious performance implications.

        stack added a comment -

        + Scans run free for N seconds or nanoseconds since checking this should be cheap and then they go to a checkpoint where they look to see if they should reset. ChangedReaders blocks until checkpoint has been cleared.
        + Scans check for closing being set on each op (I suppose this check of a volatile would be just as bad as a synchronization).
        + We refcount outstanding scanners and only delete compacted files when the refcount goes to zero. Not sure how we'd switch in flushes unless we prevent the flush from happening while a scan is ongoing (eek).

        Lars Hofhansl added a comment -

        I can come up with various schemes for that, but nothing that is cheaper than just synchronizing all the methods.

        Lars Hofhansl added a comment -

        Yeah, I'm coming around to that.
        We have to figure out how to do that without excessive synchronization. At a minimum we have to guarantee that nobody is currently using the readers in question before we retire them.
        Alternatively we wait for all running scanners to finish. How do we do that?

        stack added a comment -

        And it would be silly creating a regionscannerimpl to do a storescan. We need to do update readers differently. Compactions and flushes are rare and background tasks. They should defer to the ongoing scans.

        Lars Hofhansl added a comment -

        I thought that, since we lock all operations at the RegionScannerImpl level anyway, we could exclude a Store's updateReaders at that level too. That works fine if we have a RegionScanner, but in the case of compactions and flushes we don't have one, so I have no way to protect a StoreScanner reading on behalf of a flush or a compaction.

        stack added a comment -

        but not for compactions/flushes ....

        Tell us more? Am interested.

        Lars Hofhansl added a comment -

        pass the RegionScannerImpl instance as a lock object to StoreScanner

        Alas, that works fine for scanning, but not for compactions/flushes (which also use StoreScanner, and can actually override the StoreScanner via a coprocessor hook).

        Lars Hofhansl added a comment -

        Vladimir Rodionov, yeah, need to work on that too. Is that with block encoding? When using block encoding we need to copy the underlying byte[]; when no block encoding is used, making a new KV is just a few dozen bytes.

        Lars Hofhansl added a comment -

        Here's another idea:

        • we already lock the RegionScannerImpl.next/seek/reseek, etc.
        • what if we pass the RegionScannerImpl instance as a lock object to StoreScanner
        • in updateReaders() we'd then lock on that RegionScannerImpl instance, or we'd call a special synchronized method in RegionScannerImpl and have it update the readers.

        The lock scope would be broader, but it should be correct, and it would allow us to remove all locking from StoreScanner.
        I'll experiment with that.

        Vladimir Rodionov added a comment -

        so there is potential for a lot more improvements

        KV creation (new) is one of the serious bottlenecks, but I have no idea how to avoid creating new instances on next. I have done some HBase internal hacks to get the maximum possible performance from scans: a multi-threaded application (in my case, 8 HT threads) that scans StoreFileScanner directly (data is 100% cached). The table was tall and narrow.

        Stock HBase was able to reach 50M KV per sec
        Stock with KV reuse (hack) - 90M KV per sec.
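
        The hack itself isn't posted, but the general shape of such reuse (purely illustrative, with an invented class name) is a mutable cell view that gets repositioned instead of allocated per KV:

            // Instead of materializing a new KeyValue per cell, a reusable view
            // is repointed at the block buffer on each next(). The view is only
            // valid until the following next() call.
            final class ReusableCellView {
              private byte[] buf;
              private int offset;
              private int length;

              void reset(byte[] buf, int offset, int length) { // repoint, no allocation
                this.buf = buf;
                this.offset = offset;
                this.length = length;
              }

              byte[] buffer() { return buf; }
              int offset() { return offset; }
              int length() { return length; }
            }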

        Lars Hofhansl added a comment -

        The findbugs warning is a dud; it complains about StoreScanner.heap being locked in only 79% of the cases.

        stack added a comment -

        It would require some refactoring.

        Let's make a plan.

        Lars Hofhansl added a comment -

        I'll look at the findbugs issue, do some more tests, and then commit.

        An interesting metric we have to start paying more attention to is the scan cost per KV.
        For example, scanning through 1m rows with one CQ is much slower than scanning through 100k rows with 10 CQs, even though it touches the same number of KVs. This patch helps a bit to even that out.

        As stack and I said in the comments here, it should be possible to remove all synchronization from StoreScanner and RegionScannerImpl. It would require some refactoring.

        Lars Hofhansl added a comment -

        There's more stuff to do. I got a crash course on CPU design from Hussam Mousa, one of our performance experts.
        We saw a lot of CPU frontend and backend stalls during scanning, so there is potential for a lot more improvements.

        I'm still looking for ways to remove all synchronization from StoreScanner and RegionScannerImpl.

        stack added a comment -

        +1 on commit.

        Lars Hofhansl added a comment -

        Need to look at the findbugs thing.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12615371/10015-trunk-v4.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

        +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 lineLengths. The patch does not introduce lines longer than 100

        -1 site. The patch appears to cause mvn site goal to fail.

        +1 core tests. The patch passed unit tests in .

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/7972//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7972//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7972//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7972//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7972//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7972//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7972//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7972//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7972//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7972//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7972//console

        This message is automatically generated.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12615356/10015-trunk-v4.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

        +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 lineLengths. The patch does not introduce lines longer than 100

        -1 site. The patch appears to cause mvn site goal to fail.

        -1 core tests. The patch failed these unit tests:

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/7970//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7970//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7970//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7970//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7970//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7970//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7970//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7970//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7970//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7970//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7970//console

        This message is automatically generated.

        Lars Hofhansl added a comment -

        Thanks Stack.

        stack added a comment -

        Get another run in.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12615283/10015-trunk-v4.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

        +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 lineLengths. The patch does not introduce lines longer than 100

        -1 site. The patch appears to cause mvn site goal to fail.

        +1 core tests. The patch passed unit tests in .

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/7968//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7968//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7968//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7968//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7968//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7968//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7968//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7968//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7968//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7968//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7968//console

        This message is automatically generated.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12615278/10015-trunk-v3.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

        +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 lineLengths. The patch does not introduce lines longer than 100

        -1 site. The patch appears to cause mvn site goal to fail.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hbase.regionserver.wal.TestLogRolling

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/7967//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7967//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7967//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7967//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7967//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7967//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7967//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7967//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7967//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7967//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7967//console

        This message is automatically generated.

        Lars Hofhansl added a comment -

        Yep. Found issues with the first two versions of this.

        stack added a comment -

        OK. Would suggest running hadoopqa a few times. Tests around changing readers are usually pretty good at finding issues if any.

        Lars Hofhansl added a comment -

        checkUpdateReaders is called from checkReseek, which is only called from synchronized methods.

        You are helping. Looking at this from different angles is good.

        stack added a comment -

        The closing and reopening is what needs the synchronization

        I'm not helping.

        This looks like an issue:

        + if (shouldUpdateReaders) {
        + shouldUpdateReaders = false;

        It is happening outside a sync block and shouldUpdateReaders is not volatile.
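        For illustration, a minimal sketch (hypothetical names, not the actual patch) of why an unsynchronized check-then-clear on a plain boolean is unsafe, and the AtomicBoolean/getAndSet variant mentioned elsewhere in this thread that closes the gap:

            import java.util.concurrent.atomic.AtomicBoolean;

            class ReaderFlagSketch {
              // Racy: without volatile or a lock the scanning thread may never see
              // this write from the flush/compaction thread, and the
              // check-then-clear below is not atomic.
              private boolean shouldUpdateReaders = false;

              void racyCheck() {
                if (shouldUpdateReaders) {
                  shouldUpdateReaders = false;
                  // ... reset the scanner stack ...
                }
              }

              // Safe: getAndSet(false) atomically consumes the signal, so the
              // reset happens exactly once and the write is always visible.
              private final AtomicBoolean updateReaders = new AtomicBoolean(false);

              void signalReaderChange() { // called by the flush/compaction thread
                updateReaders.set(true);
              }

              void safeCheck() {          // called by the scanning thread
                if (updateReaders.getAndSet(false)) {
                  // ... reset the scanner stack ...
                }
              }
            }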

        Lars Hofhansl added a comment -

        Same for trunk.
        (This gives a few % better performance)

        Lars Hofhansl added a comment -

        Removed AtomicBoolean

        Lars Hofhansl added a comment -

        The closing and reopening is what needs the synchronization
        A thread might still be using the scanner. If we can make it so that we do not have to synchronize all methods in StoreScanner, and still do it safely, I'd be happy... But frankly ATM I do not see how without other expensive synchronization.

        Lars Hofhansl added a comment -

        Actually now it does not even need the AtomicBoolean anymore.

        stack added a comment -

        Then we have to wait for the lease to expire before the compaction can finish.

        I was thinking this might not be the end of the world.

        What if we closed and reopened the current scanner when readers are changed? It is a rare event.

        Lars Hofhansl added a comment -

        Phoenix with v3 (different machines than above, so do not compare absolute numbers).
        Without patch:

        0: jdbc:phoenix:localhost> select count(*) from "myNew1";
        +----------+
        | COUNT(1) |
        +----------+
        | 30000000 |
        +----------+
        1 row selected (18.417 seconds)
        

        With patch:

        0: jdbc:phoenix:localhost> select count(*) from "myNew1";
        +----------+
        | COUNT(1) |
        +----------+
        | 30000000 |
        +----------+
        1 row selected (13.319 seconds)
        
        Lars Hofhansl added a comment -

        I think it would be hard to

        1. push reader switching to the scanning thread
        2. guarantee that the readers will eventually be switched so the compaction can proceed and finish

        What if a scanner isn't closed? Or nobody ever calls next() on it? Then we have to wait for the lease to expire before the compaction can finish.

        next()/seek()/reseek() are not as critical, as they are not called nearly as often as peek(). peek() is also a very small method, and (according to our perf expert here) small synchronized methods have the greatest potential to throw branch prediction off.

        Lars Hofhansl added a comment -

        Oops. Attached the wrong trunk patch. This is the right one. All good.

        stack added a comment -

        Can we have it so no changing of readers during a scan?

        Lars Hofhansl added a comment -

        And trunk.
        This patch should be correct for all scenarios.

        Lars Hofhansl added a comment -

        New v3 patch for 0.94

        Lars Hofhansl added a comment -

        That seems to be correct. updateReaders now waits until all operations that are affected by the reader change have finished. peek() does not need to be synchronized as long as we do not change the scanner stack from under its feet, which we avoid by deferring that work to the scanning thread.

        The unittest now yields this:
        10 runs mean:4449.2 sigma:24.395901295094635, without patch
        10 runs mean:2609.2 sigma:67.94085663280968, with patch

        So, still a significant win, albeit not quite what it was before.
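
        To make the mechanism concrete, a minimal sketch of the scheme described here (field and method names are hypothetical, not the committed patch): updateReaders() and the mutating operations share one explicit lock, so a flush or compaction cannot swap readers while a next() is in flight, while peek() stays lock-free because only the scanning thread ever rebuilds the stack.

            import java.util.concurrent.locks.ReentrantLock;

            class StoreScannerSketch {
              private final ReentrantLock flushLock = new ReentrantLock();
              private volatile boolean flushed = false;
              private Object current;  // stand-in for the heap's top KeyValue

              // Called by flush/compaction: blocks until no next()/seek() is in
              // flight, then signals the scanning thread to rebuild its stack.
              void updateReaders() {
                flushLock.lock();
                try {
                  flushed = true;
                } finally {
                  flushLock.unlock();
                }
              }

              // Lock-free: safe because the scanner stack is only ever swapped
              // by the scanning thread itself, inside checkFlushed().
              Object peek() {
                return current;
              }

              boolean next() {
                flushLock.lock();
                try {
                  checkFlushed();
                  // ... advance the scanner stack, update 'current' ...
                  return false;
                } finally {
                  flushLock.unlock();
                }
              }

              private void checkFlushed() {
                if (flushed) {
                  flushed = false;
                  // ... close old readers, reopen the scanner stack, reseek ...
                }
              }
            }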

        Lars Hofhansl added a comment -

        A possible solution is keeping updateReaders() and all methods that actually call checkReseek() synchronized.
        So peek() could still go without synchronization. Will test the performance with that.

        Lars Hofhansl added a comment - edited

        Unfortunately I am no longer sure it actually works. I identified the problem:
        In HStore.completeCompaction we call notifyChangedReadersObservers, which calls updateReaders on all StoreScanners. Before this patch, this method would block if there was a StoreScanner in the middle of a next/seek/reseek/etc. So before the patch it was guaranteed that after notifyChangedReadersObservers returned it was safe to remove the compacted files. That is no longer true.

        I'll see if I can come up with something. It is a shame that we have to synchronize and take tens of millions of branch misses per second, just so we can compact a few times a day.
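
        For contrast, a sketch of the pre-patch guarantee described above (simplified shapes, not the real class): because updateReaders() and next() were synchronized on the same StoreScanner, updateReaders() could not return while a next() was in flight, so once notifyChangedReadersObservers() had visited every scanner it was safe to remove the compacted files.

            class OldStoreScannerSketch {
              // Reads from the current (possibly pre-compaction) store files.
              synchronized boolean next() {
                // ... advance over the old readers ...
                return false;
              }

              // Blocks until any in-flight next() completes; once this has
              // returned for every observer, no scanner still holds the old
              // readers, so the compacted files can be removed.
              synchronized void updateReaders() {
                // ... re-point the scanner stack at the post-compaction files ...
              }
            }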

        stack added a comment -

        The numbers are great. Patch looks good given the premise. Was going to suggest volatile instead of AtomicBoolean, but it looks like the getAndSet is fundamental, so I withdraw this nit. Is the test failure because of this patch?

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12615249/10015-trunk-v2.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

        +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 lineLengths. The patch does not introduce lines longer than 100

        -1 site. The patch appears to cause mvn site goal to fail.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hbase.regionserver.TestHRegionBusyWait

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/7966//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7966//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7966//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7966//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7966//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7966//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7966//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7966//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7966//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7966//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7966//console

        This message is automatically generated.

        Lars Hofhansl added a comment -

        Still getting the NPE in trunk in TestHRegion with v2, though (0.94 is fine with v2)... Looking.

        Lars Hofhansl added a comment -

        10 runs mean:2770.5 sigma:85.37827592543668, without
        10 runs mean:1791.2 sigma:50.81495842761264, with

        Lars Hofhansl added a comment -

        And a trunk version. Note with v2 I see the same improvement.

        Lars Hofhansl added a comment -

        Arrghh... No idea how the first patch could actually work at all. If you look closely you'll see that the updateReader condition was never unset. So on each and every call to next() it would force a reset of the scanner stack.

        Here's an update, which resets the condition atomically.

        Lars Hofhansl added a comment - edited

        Got together with some of our hardware experts. We measured some perf counters and found that with the patch we see 50% fewer branch misses, which is significant.

        Lars Hofhansl added a comment -

        The failures are all NPEs. Two of them could be related, the last is during DFS cluster shutdown.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12615205/10015-trunk.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

        +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 lineLengths. The patch does not introduce lines longer than 100

        -1 site. The patch appears to cause mvn site goal to fail.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hbase.regionserver.TestHRegion
        org.apache.hadoop.hbase.util.TestMergeTable
        org.apache.hadoop.hbase.regionserver.TestHRegionBusyWait

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/7965//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7965//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7965//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7965//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7965//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7965//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7965//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7965//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7965//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7965//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7965//console

        This message is automatically generated.

        Lars Hofhansl added a comment -
        0: jdbc:phoenix:localhost> select count(*) from "tableTall";
        +----------+
        | COUNT(1) |
        +----------+
        | 20000000 |
        +----------+
        1 row selected (3.946 seconds)
        

        With patch:

        0: jdbc:phoenix:localhost> select count(*) from "tableTall";
        +----------+
        | COUNT(1) |
        +----------+
        | 20000000 |
        +----------+
        1 row selected (2.915 seconds)
        

        Phoenix is using FAST_DIFF encoding and the ExplicitColumnTracker, so the relative gain is only 25% there (one CQ per row).

        Lars Hofhansl added a comment -

        Some more results:
        10 runs mean:1851.0 sigma:127.71374240855992
        vs (without StoreScanner patch)
        10 runs mean:2795.8 sigma:57.52182194611015

        Lars Hofhansl added a comment -

        Trunk patch.

        Lars Hofhansl added a comment - edited

        Thanks Vladimir Rodionov, we'll look at RegionScanner next. My evil plan is to eventually get rid of all synchronization during scanning and provide exclusion by containment instead.

        The overall improvement is real. Validated on various different machines. The effect of memory stalls is probably more pronounced in the running server.

        In any case, I think we agree that this patch can't make things worse (provided it is correct, of course).
        I also tried with count(*) queries in Phoenix on tall tables (to rule out some anomalies with Filters). I see a 90% improvement there - again only on tall tables.

        Vladimir Rodionov added a comment -

        I ran RegionScanner and see ~10% improvement on very narrow rows only (8 store files in a region). This is 0.94.6. By the way, the performance difference between StoreScanner and RegionScanner is huge (almost 3x).

        Lars Hofhansl added a comment -

        I'll make a trunk patch later today.

        Ted Yu added a comment -

        200 iterations of TestStoreScanner, TestAtomicOperation and TestAcidGuarantees passed based on 0.94 patch.

        Ted Yu added a comment -

        @Andy:
        Thanks for the hint.
        I ran 'umask 022' first but got the same result on Mac.

        Andrew Purtell added a comment -

        Ted Yu that's probably HBASE-5711, do 'umask 022' first.

        Andrew Purtell added a comment -

        Without: testScanFilterPerformance(org.apache.hadoop.hbase.regionserver.TestScanFilterPerformance): 10 runs mean:2224.3 sigma:24.178709642989634

        With: testScanFilterPerformance(org.apache.hadoop.hbase.regionserver.TestScanFilterPerformance): 10 runs mean:1373.3 sigma:14.920120642943875

        This is on an EC2 c3.2xlarge, which uses 16 of 20 available hardware threads of a two socket board with Xeon E5-2680s, backed by SSDs.

        Lars Hofhansl added a comment -

        That would be related to your setup I think. Seems like it can't start the mini cluster.

        Ted Yu added a comment -

        I tried to run the unit test but got:

        testScanFilterPerformance(org.apache.hadoop.hbase.regionserver.TestScanFilterPerformance)  Time elapsed: 0.007 sec  <<< ERROR!
        org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/tyu/hbase/hbase.version could only be replicated to 0 nodes, instead of 1
          at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1558)
          at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:696)
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
          at java.lang.reflect.Method.invoke(Method.java:597)
        
        Lars Hofhansl added a comment -

        If somebody could run the attached unit test that would be greatly appreciated.

        Lars Hofhansl added a comment -

        One last note before I call it quits today.
        The win appears to reduce the per-row overhead by 50-60%. With 5 CQs this shrinks proportionally to 10-15% or so.

        Lars Hofhansl added a comment -

        Added a quick and dirty perf test. The test will fail at the end and the failure message is the runtime (I find this the most convenient).

        I get:
        10 runs mean:2315.9 sigma:121.37417352962696
        and
        10 runs mean:4721.9 sigma:153.24911092727422
        with the changes to StoreScanner reverted.

        Vladimir Rodionov added a comment -

        Also, 25ns is more than 100 cycles on modern HW, in line with what I would expect from memory stalls caused by the fences.

        With biased locking on, it's ~2ns. I will try RegionScanner.

        Lars Hofhansl added a comment -

        Will add those options. All the tests run at least 15s, though.

        Oh, also rereading your comment above... I saw the most significant improvement per row (i.e. with 5 CQs you'd see less of an improvement). This very much looks like an interaction between KeyValueHeap and StoreScanner, where the cost seems to be mostly per row, rather than per KeyValue.

        Lars Hofhansl added a comment -

        Can you try at the RegionScanner level, which maintains a heap of StoreScanners and calls peek() quite frequently? In my sampling profiler I saw StoreScanner.peek() come up as one of the top methods where time is spent (and it does nothing but compare a local member and then call peek() on its heap).

        Also, 25ns is more than 100 cycles on modern HW, in line with what I would expect from memory stalls caused by the fences.

        Vladimir Rodionov added a comment -

        For all microbenchmarks, please add the following command-line args:

        -XX:+UseBiasedLocking -XX:BiasedLockingStartupDelay=0
        

        The Oracle JVM does not enable biased locking (a single-thread lock optimization) for the first 4s of program execution.
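
        For example, a microbenchmark run (MyBench being whatever benchmark class is under test; the name is hypothetical) would then be launched as:

            java -XX:+UseBiasedLocking -XX:BiasedLockingStartupDelay=0 MyBench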

        Vladimir Rodionov added a comment -

        Maybe I am wrong (an empty synchronized method call costs 25 ns on my laptop), but my own tests on StoreScanner show zero improvement.

        Code is simple:

        create a region, populate it with data (make sure the data is in the cache), then:

             LOG.info("Test store scanner");
             // Scan the whole region directly through a StoreScanner.
             Scan scan = new Scan();
             scan.setStartRow(region.getStartKey());
             scan.setStopRow(region.getEndKey());
             Store store = region.getStore(CF);
             StoreScanner scanner = new StoreScanner(store, store.getScanInfo(), scan, null);
             long start = System.currentTimeMillis();
             int total = 0;
             List<KeyValue> result = new ArrayList<KeyValue>();
             while (scanner.next(result)) {
               total++;        // count rows
               result.clear(); // reuse the row buffer for the next row
             }
             LOG.info("Test store scanner finished. Found " + total + " in "
                 + (System.currentTimeMillis() - start) + "ms");
        

        This test shows exactly the same time for both the default StoreScanner and the unsynchronized StoreScanner. The scan is not very fast: 1-1.5M rows per sec (rows are relatively small: 1 CF + 5 CQ, ~120 bytes).

        Lars Hofhansl added a comment -

        If somebody else could run the test I attached that'd be great. I ran against a local single node HBase on top of a single node HDFS cluster, with all data in the blockcache. Maybe a unittest against a mini cluster would be better.

        Lars Hofhansl added a comment - edited

        I concur with myself one more time: the cost of synchronized is very low when there is no thread contention.

        Well, you're wrong twice then.

        Just try it... Call a synchronized method in a loop a few 100 million times. Then remove the synchronized. Make sure the method returns something, such as a reference to a member, so it is not optimized away immediately. (See the sketch after the list below.)

        On my test machines (JDK6 and JDK7) the latter is at least 40x faster; on some machines it's 63x faster. All with just a single thread.

        As I said before, synchronized does more than exclusion.

        1. it bars the JVM from reordering instructions
        2. it places memory fences (both read and write), which (depending on the exact hardware) disallow instruction reordering by the CPU
        3. it may flush cache lines
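
        A minimal sketch of such a microbenchmark (class and method names are hypothetical; as suggested elsewhere in this thread, run with -XX:+UseBiasedLocking -XX:BiasedLockingStartupDelay=0 for a fair single-thread comparison):

            public class SyncBench {
              private Object member = new Object();

              synchronized Object syncGet() { return member; } // intrinsic lock
              Object plainGet() { return member; }             // no lock

              public static void main(String[] args) {
                SyncBench b = new SyncBench();
                Object sink = null;
                // A few rounds so the JIT warms up before the timed comparison.
                for (int round = 0; round < 3; round++) {
                  long t0 = System.nanoTime();
                  for (int i = 0; i < 300000000; i++) sink = b.syncGet();
                  long t1 = System.nanoTime();
                  for (int i = 0; i < 300000000; i++) sink = b.plainGet();
                  long t2 = System.nanoTime();
                  System.out.println("sync: " + (t1 - t0) / 1000000 + "ms, plain: "
                      + (t2 - t1) / 1000000 + "ms");
                }
                // Use 'sink' so the loops are not eliminated as dead code.
                System.out.println(sink != null);
              }
            }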
        Lars Hofhansl added a comment -

        Test code. This is some crap extracted from various tests, don't judge me on that code... I will deny that I've ever written it.
        Use it with TestLoad <tableName>. It will create and seed the table if needed.
        Then it runs the scan tests 5 times (after a prime round) and calculates mean and standard deviation.

        Did I mention that this is a hack?

        For the record, I see a 2-3x scan speed improvement both with OpenJDK7 on an old'ish 2-core machine and with Oracle JDK6 on a 12-core machine. In both cases the scan is from a single client only.

        Lars Hofhansl added a comment -

        Wait... No, I do see the same improvement on the JDK6, 12-core box (I forgot to switch the server over).
        I'll attach a quick and dirty benchmark.

        Lars Hofhansl added a comment -

        Confirmed on my laptop with OpenJDK: 2.5x improvement over 10 runs with very low standard deviation.
        Going to do some microbenchmarks.

        Show
        Lars Hofhansl added a comment - Confirmed on my laptop with OpenJDK, 2.5x improvement over 10 runs very low standard deviation. Going to do some microbenchmarks.
        Hide
        Lars Hofhansl added a comment -

        Strange... On my desktop machine (12 cores, JDK6) I also measure no difference.
        My laptop has 2 cores and OpenJDK7. I wonder whether I am seeing an OpenJDK bug.

        Vladimir Rodionov added a comment -

        Different JVM versions?

        1.6_56 Mac OSX 10.7.5. I concur with myself one more time: the cost of synchronized is very low when there is no thread contention.

        Nick Dimiduk added a comment -

        Could the test be isolated in a variant of HFilePerfEval? I've had luck isolating the BlockCache in a similar variant (HBASE-9806).

        Lars Hofhansl added a comment -

        It'll be hard to disentangle the test. Basically I am testing with a table with a single column family, and filtering all data at the server with a ValueFilter. Lemme have an extra look at the code.

        Andrew Purtell added a comment -

        The patch doesn't look wrong, but I'm surprised at the improvement. If not too onerous, perhaps you could attach a test case that reproduces it?

        Lars Hofhansl added a comment -

        "I just ran my tests and found no difference at all, but they were single-threaded StoreScanner tests."

        Hmm... All my tests are single-threaded. Different JVM versions?
        Are you testing with tall tables?

        Lars Hofhansl added a comment -

        Close is fine, since it is only triggered via RegionScannerImpl, which is synchronized.
        The overhead I measured could explain the numbers I've seen here:
        https://issues.apache.org/jira/browse/HBASE-9440?focusedCommentId=13767047&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13767047

        (Compared to the raw scan speed in an HFile I found medium overhead per column and a lot of overhead per row, and StoreScanner.peek()/next(), etc., are mostly called per row.)

        Lars Hofhansl added a comment -

        Nope. I see the same effect with just one core.
        Will try this on some other machines and with different versions of the JDK.

        Vladimir Rodionov added a comment -

        I just ran my tests and found no difference at all, but they were single-threaded StoreScanner tests.

        Lars Hofhansl added a comment -

        That is true as far as actual thread synchronization goes.

        Every synchronized block still places a read and a write memory fence, though, and I think that is the effect we're seeing. I was a bit surprised about the magnitude of this myself.

        I can try switching one of my cores off and measuring this again; if the issue is memory fencing we should not see any improvement with only one core.

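        To make the fence argument concrete, here is an illustration (not code from the patch): monitor enter and exit carry acquire/release semantics even when the lock is never contended.

            class Fences {
                private long value;

                // Monitor enter/exit imply acquire/release semantics: the JIT
                // must re-read state and publish writes around every call,
                // contended or not, which hurts in a per-row hot loop such as
                // peek()/next().
                synchronized long syncGet() { return value; }

                // No monitor: the JIT is free to keep `value` in a register
                // across iterations of the scan loop.
                long plainGet() { return value; }
            }
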
        Vladimir Rodionov added a comment -

        It is interesting. The cost of Java synchronization without thread contention is close to zero. You would only see a difference when multiple threads access the same StoreScanner.

        Lars Hofhansl added a comment -

        Actually the main difference is between ScanWildcardColumnTracker and ExplicitColumnTracker. With ExplicitColumnTracker there are still many reseeks that outweigh the performance improvement seen here.

        Lars Hofhansl added a comment -

        The 3x improvement is what I see for tall tables. For wider tables the improvement is less pronounced (for 5 columns I see an improvement of 20% or so).

        Also a note as to why I think this should be correct: even with the current synchronized code, a next/peek/reseek that has already started will finish with the old reader. That has not changed.

        Need to look closer at races between close() and next/peek/reseek, as close() could be called from another thread (I think) when the lease expires.

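        The "explicit locks" of the issue title suggest a pattern along these lines; this is a sketch under the assumption that close() can race in from the lease-expiry thread, not the actual StoreScanner change:

            import java.util.concurrent.locks.ReentrantLock;

            class ScannerSketch {
                private final ReentrantLock lock = new ReentrantLock();
                private boolean closed;

                void next() {
                    lock.lock();
                    try {
                        if (closed) return;   // close() may have won the race
                        // ... advance the scanner stack with the current readers ...
                    } finally {
                        lock.unlock();
                    }
                }

                // May be invoked from another thread when the lease expires.
                void close() {
                    lock.lock();
                    try {
                        closed = true;
                        // ... release the underlying scanners ...
                    } finally {
                        lock.unlock();
                    }
                }
            }
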
        Lars Hofhansl added a comment -

        Sample 0.94 patch.


          People

          • Assignee: Lars Hofhansl
          • Reporter: Lars Hofhansl
          • Votes: 1
          • Watchers: 18
