HBASE-9087: Handlers being blocked during reads

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.94.7, 0.95.1
    • Fix Version/s: 0.98.0, 0.95.2, 0.94.11
    • Component/s: Performance
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      I'm seeing a lot of handlers (roughly 90 to 300) blocked when reading rows. They are blocked during changedReaderObserver registration.

      Lars Hofhansl suggests changing the changedReaderObserver collection from a copy-on-write list to a ConcurrentHashMap.
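The shape of that change can be sketched as follows. This is a sketch only, not the actual HBASE-9087 patch: ChangedReadersObserver is reduced to a one-method interface and the class name is hypothetical. The point is that CopyOnWriteArraySet.add() copies the whole backing array under a single ReentrantLock (the lock the handlers in the stack trace below are parked on), while a ConcurrentHashMap-backed set turns registration into a striped hash insert with no global lock.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ObserverRegistrySketch {
    /** Simplified stand-in for HBase's ChangedReadersObserver interface. */
    public interface ChangedReadersObserver {
        void updateReaders();
    }

    // After the change: a concurrent Set view backed by a ConcurrentHashMap.
    private final Set<ChangedReadersObserver> observers =
        Collections.newSetFromMap(new ConcurrentHashMap<ChangedReadersObserver, Boolean>());

    public void addChangedReaderObserver(ChangedReadersObserver o) {
        observers.add(o);           // no global lock, unlike CopyOnWriteArraySet
    }

    public void deleteChangedReaderObserver(ChangedReadersObserver o) {
        observers.remove(o);
    }

    public void notifyChangedReadersObservers() {
        for (ChangedReadersObserver o : observers) {
            o.updateReaders();      // weakly consistent iteration, never throws CME
        }
    }

    public static void main(String[] args) {
        ObserverRegistrySketch store = new ObserverRegistrySketch();
        final int[] calls = {0};
        ChangedReadersObserver scanner = () -> calls[0]++;
        store.addChangedReaderObserver(scanner);
        store.addChangedReaderObserver(scanner);   // set semantics: added once
        store.notifyChangedReadersObservers();
        System.out.println(calls[0]);              // 1
    }
}
```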

      Here is a stack trace:

      "IPC Server handler 99 on 60020" daemon prio=10 tid=0x0000000041c84000 nid=0x2244 waiting on condition [0x00007ff51fefd000]
      java.lang.Thread.State: WAITING (parking)
      at sun.misc.Unsafe.park(Native Method)

      - parking to wait for <0x00000000c5c13ae8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178)
        at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
        at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
        at java.util.concurrent.CopyOnWriteArrayList.addIfAbsent(CopyOnWriteArrayList.java:553)
        at java.util.concurrent.CopyOnWriteArraySet.add(CopyOnWriteArraySet.java:221)
        at org.apache.hadoop.hbase.regionserver.Store.addChangedReaderObserver(Store.java:1085)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:138)
        at org.apache.hadoop.hbase.regionserver.Store.getScanner(Store.java:2077)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:3755)
        at org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:1804)
        at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1796)
        at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1771)
        at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4776)
        at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4750)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2152)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3700)
        at sun.reflect.GeneratedMethodAccessor26.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
      1. HBASE-9087-1.patch
        2 kB
        Elliott Clark
      2. HBASE-9087-0.patch
        2 kB
        Elliott Clark

        Activity

        Pablo Medina created issue -
        demian berjman added a comment -

        From CopyOnWriteArraySet javadoc: "Mutative operations (add, set, remove, etc.) are expensive since they usually entail copying the entire underlying array."

        Elliott Clark added a comment -

        Pretty easy trunk patch. Let's run it by Jenkins to see how it does.

        Elliott Clark made changes -
        Field Original Value New Value
        Attachment HBASE-9087-0.patch [ 12595006 ]
        Elliott Clark made changes -
        Assignee Elliott Clark [ eclark ]
        Elliott Clark made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Affects Version/s 0.95.1 [ 12324288 ]
        Lars Hofhansl added a comment -

        Why ConcurrentSkipListMap vs ConcurrentHashMap?

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12595006/HBASE-9087-0.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

        +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 lineLengths. The patch does not introduce lines longer than 100

        +1 site. The mvn site goal succeeds with this patch.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hbase.io.hfile.TestScannerSelectionUsingKeyRange
        org.apache.hadoop.hbase.regionserver.TestBlocksScanned
        org.apache.hadoop.hbase.regionserver.TestResettingCounters
        org.apache.hadoop.hbase.regionserver.TestScanWithBloomError
        org.apache.hadoop.hbase.regionserver.TestColumnSeeking
        org.apache.hadoop.hbase.regionserver.TestSplitTransaction
        org.apache.hadoop.hbase.filter.TestColumnPrefixFilter
        org.apache.hadoop.hbase.client.TestIntraRowPagination
        org.apache.hadoop.hbase.filter.TestDependentColumnFilter
        org.apache.hadoop.hbase.filter.TestMultipleColumnPrefixFilter
        org.apache.hadoop.hbase.regionserver.TestKeepDeletes
        org.apache.hadoop.hbase.regionserver.TestMinVersions
        org.apache.hadoop.hbase.filter.TestFilter
        org.apache.hadoop.hbase.regionserver.TestScanner
        org.apache.hadoop.hbase.regionserver.TestWideScanner
        org.apache.hadoop.hbase.coprocessor.TestRegionObserverScannerOpenHook
        org.apache.hadoop.hbase.regionserver.TestRegionMergeTransaction
        org.apache.hadoop.hbase.coprocessor.TestCoprocessorInterface

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/6523//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6523//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6523//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6523//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6523//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6523//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6523//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6523//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6523//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6523//console

        This message is automatically generated.

        Elliott Clark added a comment -

        Well, my initial thought was to avoid a hashtable, since that would be resized under a lock, causing latency spikes. But ConcurrentSkipListMap requires that everything be Comparable (and the interface isn't), so it looks like a hashtable it is.

        Elliott Clark made changes -
        Attachment HBASE-9087-1.patch [ 12595049 ]
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12595049/HBASE-9087-1.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

        +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 lineLengths. The patch does not introduce lines longer than 100

        +1 site. The mvn site goal succeeds with this patch.

        +1 core tests. The patch passed unit tests in .

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/6527//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6527//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6527//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6527//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6527//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6527//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6527//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6527//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6527//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6527//console

        This message is automatically generated.

        stack added a comment -

        I wonder if this fixes the performance probs the lads saw?

        Elliott Clark added a comment -

        I would imagine so. CopyOnWrite has an open Java bug stating that it scales non-linearly, because the copy uses insertIfAbsent (or something similar). But if the integration cluster has some free time I can try a YCSB run.

        Elliott Clark added a comment -

        Running a YCSB workload E benchmark on this right now.

        Lars Hofhansl made changes -
        Fix Version/s 0.98.0 [ 12323143 ]
        Fix Version/s 0.95.2 [ 12320040 ]
        Fix Version/s 0.94.11 [ 12324741 ]
        Pablo Medina added a comment -

        Elliott, did you run that benchmark? Did it improve the performance under concurrency?

        Lars Hofhansl added a comment -

        I'm very curious as well

        Pablo Medina added a comment -

        btw. In the meanwhile... do you guys know what a 'proper' number of handlers is? I know that 'proper' means different things in different use cases, but have you seen any region server serving requests using 1k handlers or more? Is that a common scenario?

        Elliott Clark added a comment -

        Running the benchmarks but they are tied in with integration tests so they will take 5 hours or so. I hope to have results by the end of the day.

        Lars Hofhansl added a comment -

        You want to be able to keep both CPU and disks busy, so one should have at least as many handlers as CPU threads and disk spindles. Beyond that it is trial and error.
        We have a 12-core CPU (24 HW threads) and 6 disk drives and have set the handler count to 50.

        Pablo Medina added a comment -

        So you can not handle more than 50 requests at a time?

        Elliott Clark added a comment -

        Requests are queued as they are decoded off the wire. So you can have lots of requests coming in, but only 50 will be actively worked on at a time.
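That handler model can be sketched with a plain thread pool (the class and method names here are illustrative, not HBase classes): a fixed pool of "handlers" with an unbounded work queue accepts many requests, but no more than the pool size are ever in flight at once.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class HandlerPoolDemo {
    /** Submits `requests` tasks to `handlerCount` workers; returns the peak concurrency observed. */
    public static int run(int handlerCount, int requests) {
        ExecutorService handlers = Executors.newFixedThreadPool(handlerCount);
        AtomicInteger active = new AtomicInteger();
        AtomicInteger maxActive = new AtomicInteger();
        for (int i = 0; i < requests; i++) {
            handlers.submit(() -> {
                int a = active.incrementAndGet();
                maxActive.accumulateAndGet(a, Math::max);   // record peak concurrency
                try {
                    Thread.sleep(5);                        // simulated request work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                active.decrementAndGet();
            });
        }
        handlers.shutdown();
        try {
            handlers.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return maxActive.get();
    }

    public static void main(String[] args) {
        // 200 queued requests, but at most 50 worked on at any moment.
        System.out.println(run(50, 200) <= 50);   // true
    }
}
```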

        Pablo Medina added a comment -

        Right. But if you have free CPU cycles and, let's say, a high block cache hit ratio (almost all the data is in memory), you should consider increasing the handlers so you can use those free cycles to increase your performance, right? I guess that 50 handlers in Lars's scenario consumes almost all of its CPU / disk bandwidth.

        Lars Hofhansl added a comment -

        That's why you make sure to have at least as many handlers as CPU threads, and a few more to handle IO waits.
        Having way more threads than CPU threads and spindles is counterproductive, and you're better off queuing requests.
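This rule of thumb reduces to simple arithmetic; the sketch below is my own framing of the heuristic, not anything HBase computes itself.

```java
public class HandlerSizing {
    /** Lower bound from the rule above: one handler per CPU thread and per spindle. */
    static int minimumHandlers(int cpuThreads, int spindles) {
        return cpuThreads + spindles;
    }

    public static void main(String[] args) {
        // Lars's box: 12 cores (24 HW threads) and 6 drives.
        int floor = minimumHandlers(24, 6);
        System.out.println(floor);   // 30; their chosen 50 adds headroom for IO waits
    }
}
```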

        Elliott Clark added a comment -

        So YCSB didn't show any change in time.

        Lars Hofhansl added a comment -

        That's expected, no? YCSB is not running many long-running scans, and I presume you didn't jack up the handler count.

        +1 from me.

        Elliott Clark added a comment -

        I ran workload e which has some scanners, but you're probably correct about the length.

        Pablo Medina added a comment -

        I ran into this issue when concurrently requesting the same keys. Looking at the stack trace, it turns out that the bottleneck is at the Store level. So I guess you should run some kind of test that retrieves, under concurrency, a set of keys belonging to the same Store. Does that make sense?

        Lars Hofhansl added a comment -

        Yeah, it's per store, so you'd see this contention if you read a lot of KVs from the same Region and ColumnFamily.
        Now thinking about how this is used a bit more... We're using this to notify the scanners that they have to reset their KVHeap stack. In that case we absolutely have to make sure that all currently open scanners do this. ConcurrentHashMap does not actually guarantee this upon iterating, but CopyOnWriteArraySet does. So maybe we're opening ourselves up to concurrency issues. An alternative would be to use a HashSet and synchronize on it.

        Lars Hofhansl added a comment -

        Thinking more. What exactly do we have to guarantee here? When we call notifyChangedReadersObservers(), all we have to ensure is that we see all observers that were added prior to the call. So the guarantees provided by ConcurrentHashMap should be good enough after all.
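The property being relied on can be shown in a tiny sketch (class and method names hypothetical): iteration over a ConcurrentHashMap-backed set is weakly consistent. It never throws ConcurrentModificationException, and it is guaranteed to reflect every element present when the iteration began; observers registered mid-notification may be missed, but a scanner opened that late will read the new store files on its own anyway.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class WeakConsistencyNotify {
    /** Walks the observer set; safe even under concurrent add/remove. */
    static int notifyObservers(Set<String> observers) {
        int notified = 0;
        for (String o : observers) {
            notified++;
        }
        return notified;
    }

    public static void main(String[] args) {
        Set<String> observers =
            Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());
        observers.add("scanner-1");
        observers.add("scanner-2");
        // All observers registered before the iteration starts are seen.
        System.out.println(notifyObservers(observers));   // 2
    }
}
```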

        Lars Hofhansl added a comment -

        I also tried a bunch of scenarios and could not find one where this improves performance.
        Pablo Medina, any chance to run your scenario with this patch applied?

        Lars Hofhansl made changes -
        Fix Version/s 0.94.12 [ 12324790 ]
        Fix Version/s 0.94.11 [ 12324741 ]
        Priority Critical [ 2 ] Major [ 3 ]
        Pablo Medina added a comment -

        I'll try to run it tomorrow. Where can I find the version with the patch applied?

        Lars Hofhansl added a comment -

        You'll have to build it yourself.
        If that is an issue, I can build one for you.

        Lars Hofhansl added a comment -

        I know Elliott asked that on the mailing list, just to be sure, are you closing your scanners on the client?

        Pablo Medina added a comment -

        I'm not opening scanners on the client side. My use case involves a multiGet with approx. 500 keys. I noticed that each of those keys is handled as a get and a subsequent Scanner on the server side. May I be overloading the server with too many scanners by using that multiGet case?

        Lars Hofhansl added a comment -

        Hmm... I tried that; interesting. That server should definitely be able to handle this.

        Might be easiest if you tried with the patch, Pablo.

        You can build HBase like this:

        1. svn checkout http://svn.apache.org/repos/asf/hbase/branches/0.94 hbase-0.94
        2. cd hbase-0.94
        3. download the patch here. Then patch -p0 < HBASE-9087-1.patch
        4. mvn clean install -DskipTests
        5. once this is done you'll find the tarball in the target directory.

        I'm happy to put up a tarball on my private apache area.

        Elliott Clark added a comment -

        I'll check this into trunk/95 then, so that we can run integration tests on it for a while. Backporting should be pretty easy if needed, Lars Hofhansl.

        stack added a comment -

        +1 on commit to trunk and 0.95. Chatting w/ Elliott we could not see how CHM would give a different view than COW.

        Elliott Clark added a comment -

        In trunk and 95. Thanks for the reviews and discussions.

        Elliott Clark made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Hadoop Flags Reviewed [ 10343 ]
        Resolution Fixed [ 1 ]
        Lars Hofhansl added a comment -

        Meh... I'm just gonna commit this to 0.94 as well.

        Lars Hofhansl added a comment -

        Please don't mark an issue fixed if it has not been committed to all branches. We can either leave it open or remove (in this case) the 0.94.11 tag.

        Lars Hofhansl added a comment -

        Committed to 0.94 as well.

        Lars Hofhansl made changes -
        Fix Version/s 0.94.11 [ 12324741 ]
        Fix Version/s 0.94.12 [ 12324790 ]
        Hudson added a comment -

        SUCCESS: Integrated in hbase-0.95-on-hadoop2 #215 (See https://builds.apache.org/job/hbase-0.95-on-hadoop2/215/)
        HBASE-9087 Handlers being blocked during reads (eclark: rev 1509887)

        • /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
        Hudson added a comment -

        SUCCESS: Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #650 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/650/)
        HBASE-9087 Handlers being blocked during reads (eclark: rev 1509886)

        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
        Hudson added a comment -

        SUCCESS: Integrated in HBase-0.94-security #243 (See https://builds.apache.org/job/HBase-0.94-security/243/)
        HBASE-9087 Handlers being blocked during reads (Elliott) (larsh: rev 1509922)

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
        Hudson added a comment -

        SUCCESS: Integrated in HBase-0.94 #1092 (See https://builds.apache.org/job/HBase-0.94/1092/)
        HBASE-9087 Handlers being blocked during reads (Elliott) (larsh: rev 1509922)

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
        Hudson added a comment -

        SUCCESS: Integrated in hbase-0.95 #398 (See https://builds.apache.org/job/hbase-0.95/398/)
        HBASE-9087 Handlers being blocked during reads (eclark: rev 1509887)

        • /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
        Hudson added a comment -

        SUCCESS: Integrated in HBase-TRUNK #4336 (See https://builds.apache.org/job/HBase-TRUNK/4336/)
        HBASE-9087 Handlers being blocked during reads (eclark: rev 1509886)

        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
        Pablo Medina added a comment -

        I tested the patch with my workload and it improved my response times by 10%. I'm not seeing the same rate of blocked handlers during my test as in 0.94.7. I'm wondering why you guys didn't see any improvement in your test cases. My test case consists of reading 1.5 million keys per minute over 3 tables (1 cf per table), so I think this scenario puts a lot of pressure on the Stores opening scanners at the server side, with the consequence of generating contention on that CopyOnWriteSet in 0.94.7. What do you guys think about it?
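        One way to reproduce that contention in isolation (a micro-benchmark sketch, not part of the patch; the thread and op counts are arbitrary) is to have many threads add and remove entries in a shared set, mimicking handlers registering and unregistering an observer on every scanner open/close, and compare CopyOnWriteArraySet against a ConcurrentHashMap-backed set:

        ```java
        import java.util.Collections;
        import java.util.Set;
        import java.util.concurrent.ConcurrentHashMap;
        import java.util.concurrent.CopyOnWriteArraySet;
        import java.util.concurrent.CountDownLatch;
        import java.util.concurrent.ExecutorService;
        import java.util.concurrent.Executors;

        public class ObserverContentionDemo {
            // Each thread adds and then removes its own keys, the way each handler
            // registers/unregisters an observer when it opens/closes a StoreScanner.
            static long timeAddRemove(Set<Integer> set, int threads, int opsPerThread)
                    throws InterruptedException {
                ExecutorService pool = Executors.newFixedThreadPool(threads);
                CountDownLatch done = new CountDownLatch(threads);
                long start = System.nanoTime();
                for (int t = 0; t < threads; t++) {
                    final int base = t * opsPerThread;
                    pool.execute(() -> {
                        for (int i = 0; i < opsPerThread; i++) {
                            set.add(base + i);
                            set.remove(base + i);
                        }
                        done.countDown();
                    });
                }
                done.await();
                pool.shutdown();
                return System.nanoTime() - start;
            }

            public static void main(String[] args) throws InterruptedException {
                Set<Integer> cow = new CopyOnWriteArraySet<>();
                Set<Integer> chm =
                    Collections.newSetFromMap(new ConcurrentHashMap<Integer, Boolean>());
                System.out.printf("COW set: %d ms%n", timeAddRemove(cow, 32, 2000) / 1_000_000);
                System.out.printf("CHM set: %d ms%n", timeAddRemove(chm, 32, 2000) / 1_000_000);
            }
        }
        ```

        With tens of writer threads the COW variant typically takes far longer, because every add/remove copies the whole backing array under one lock, which is consistent with the blocked-handler rate reported above.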

        Pablo Medina added a comment -

        When are you planning to release HBase 0.94.11?

        Lars Hofhansl added a comment -

        This week. 0.94.10RC0 was 7/19 and I am shooting for a monthly release.

        Lars Hofhansl made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee: Elliott Clark
          • Reporter: Pablo Medina
          • Votes: 0
          • Watchers: 15