HBase / HBASE-9869

Optimize HConnectionManager#getCachedLocation

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.98.0, 0.96.0
    • Fix Version/s: 0.98.0, 0.96.1
    • Component/s: Client
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Its javadoc says: "TODO: This method during writing consumes 15% of CPU doing lookup". This is still true, says YourKit. With 0.96, we also spend more time in these methods: we retry more, and the AsyncProcess calls them in parallel.

      I don't have the patch for this yet, but I will spend some time on it.

      1. 6869.v4.patch
        15 kB
        Nicolas Liochon
      2. 9869.v2.patch
        4 kB
        Nicolas Liochon
      3. 9869.v1.patch
        3 kB
        Nicolas Liochon
      4. 9869.v1.patch
        3 kB
        Nicolas Liochon

        Issue Links

          Activity

          stack added a comment -

          Yeah, I have been seeing around 10-12%.

          stack added a comment -

          (/me ....channelling Todd Lipcon) ... caveat the fact that we are profiling using java profilers so actual cpu could be wildly different

          Nicolas Liochon added a comment -

          Yeah, I agree. There is always a risk. Still, the logic seems strange to me. I'm not sure the soft reference is still useful in 2013. And removing the synchronized seems better. I'm looking for a quick hack to see if there is a difference.

          Nick Dimiduk added a comment -

          > caveat the fact that we are profiling using java profilers

          I plan to try out an OpenJDK install + the lightweight-profiler this weekend, see if I can get it working with perfeval or similar.

          Nicolas Liochon added a comment -

          On this one, from what I saw the measurement was real, but it was also because we were creating too many objects. Now that we're in better shape, it should be less visible.

          This said, I feel that using a simple implementation without any weak/soft reference would be more efficient. That's what AsyncHBase is doing for example...
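          A simple implementation along these lines (an illustrative sketch, not HBase's actual client code; the class and method names are mine) could key a ConcurrentSkipListMap per table and find the hosting region with a floor lookup on the region start key:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical two-level location cache with no soft references.
// Outer map: table name -> sorted map of region start key -> server name.
class RegionLocationCache {
  private final ConcurrentMap<String, ConcurrentSkipListMap<byte[], String>> cache =
      new ConcurrentHashMap<>();

  // byte[] does not implement Comparable, so supply an unsigned lexicographic comparator.
  private static int compare(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int d = (a[i] & 0xff) - (b[i] & 0xff);
      if (d != 0) return d;
    }
    return a.length - b.length;
  }

  void cacheLocation(String table, byte[] startKey, String server) {
    cache.computeIfAbsent(table,
        t -> new ConcurrentSkipListMap<>(RegionLocationCache::compare))
        .put(startKey, server);
  }

  // Lock-free lookup: the region holding 'row' is the one with the greatest
  // start key <= row, i.e. a floorEntry on the sorted map.
  String getCachedLocation(String table, byte[] row) {
    ConcurrentSkipListMap<byte[], String> regions = cache.get(table);
    if (regions == null) return null;
    Map.Entry<byte[], String> e = regions.floorEntry(row);
    return e == null ? null : e.getValue();
  }
}
```

          Both maps are lock-free for readers, which is the point of dropping the synchronized soft-value wrapper.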

          stack added a comment -

          See any difference? What's the math like? How large is the Map with 1M regions in it? We remove a region when it is bad, but ones we don't access could stick around forever. I suppose that serves us right if we don't access them. Folks like Lars are sensitive about clients being well-behaved, especially when embedded in an app server. I looked around for a simple LRU (the guava one) but we need a SortedMap.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12613359/9869.v1.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 lineLengths. The patch does not introduce lines longer than 100

          -1 site. The patch appears to cause mvn site goal to fail.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.client.TestMultiParallel
          org.apache.hadoop.hbase.io.encoding.TestChangingEncoding
          org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort
          org.apache.hadoop.hbase.regionserver.TestEndToEndSplitTransaction
          org.apache.hadoop.hbase.client.TestAdmin
          org.apache.hadoop.hbase.client.TestHCM
          org.apache.hadoop.hbase.master.TestDistributedLogSplitting
          org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/7842//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7842//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7842//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7842//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7842//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7842//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7842//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7842//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7842//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7842//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7842//console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12613615/9869.v1.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 lineLengths. The patch does not introduce lines longer than 100

          -1 site. The patch appears to cause mvn site goal to fail.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.client.TestMultiParallel
          org.apache.hadoop.hbase.io.encoding.TestChangingEncoding
          org.apache.hadoop.hbase.TestFullLogReconstruction
          org.apache.hadoop.hbase.replication.TestReplicationKillMasterRS
          org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort
          org.apache.hadoop.hbase.replication.TestReplicationKillSlaveRS
          org.apache.hadoop.hbase.regionserver.TestEndToEndSplitTransaction
          org.apache.hadoop.hbase.client.TestAdmin
          org.apache.hadoop.hbase.client.TestHCM
          org.apache.hadoop.hbase.mapreduce.TestSecureLoadIncrementalHFilesSplitRecovery
          org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/7845//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7845//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7845//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7845//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7845//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7845//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7845//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7845//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7845//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7845//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7845//console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12613812/9869.v2.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          -1 javadoc. The javadoc tool appears to have generated 2 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 lineLengths. The patch does not introduce lines longer than 100

          -1 site. The patch appears to cause mvn site goal to fail.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/7861//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7861//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7861//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7861//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7861//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7861//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7861//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7861//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7861//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7861//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7861//console

          This message is automatically generated.

          Nicolas Liochon added a comment -

          > See any difference? What's the math like? How large is the Map with 1M regions in it? We remove a region when it is bad, but ones we don't access could stick around forever. I suppose that serves us right if we don't access them. Folks like Lars are sensitive about clients being well-behaved, especially when embedded in an app server. I looked around for a simple LRU (the guava one) but we need a SortedMap.

          My feeling is that if you have a table with 1 million regions, you need to pay the price for this, i.e. have enough memory on the client.
          Let me do the math & test it, however.
          We could also do something in the middle: an LRU on the first map, the one keyed on the tableName.
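          The "LRU on the first map" idea could be sketched with LinkedHashMap's access-order mode (an illustrative sketch; the class name and capacity are mine, and a real client would need to synchronize it externally, since LinkedHashMap is not thread-safe):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical LRU over the table-level map only: evict whole tables,
// least recently used first, while each table keeps its SortedMap of regions.
class TableLruCache<V> extends LinkedHashMap<String, ConcurrentSkipListMap<byte[], V>> {
  private final int maxTables;

  TableLruCache(int maxTables) {
    super(16, 0.75f, true); // accessOrder = true gives LRU iteration order
    this.maxTables = maxTables;
  }

  @Override
  protected boolean removeEldestEntry(
      Map.Entry<String, ConcurrentSkipListMap<byte[], V>> eldest) {
    return size() > maxTables; // drop the least recently used table's regions
  }
}
```

          This keeps the per-table maps sorted (so floor lookups still work) while bounding how many unused tables a long-lived client can accumulate, which was the concern for clients embedded in an app server.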

          Nicolas Liochon added a comment -

          I've done some tests with TestClientNoCluster with 10000 regions

           #clients    #puts              time without the patch   time with the patch
           1 client    100 million        94 seconds               65 seconds
           2 clients   50 million each    82 seconds               56 seconds
           5 clients   20 million each    105 seconds              66 seconds

           With 5 clients, we have 10 threads trying to insert as much as possible, so more clients means more context switches and more memory pressure (it's different if they have to wait for an answer from a server, of course).
           I need to do more tests with more regions, but so far so good, I would say.

          stack added a comment -

          The patch should remove SoftValueSortedMap too while you are at it.

          Numbers look good.

          On the middle approach, we'd drop unused tables. That could help.

          Any way to get the deep size of the Map? If it were > some percentage of the heap, we could just run a clean or purge of it as a precaution against leaks or OOME'ing on a big cluster. Would that be hard to do?
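          One cheap version of that precaution might look like this (a hypothetical sketch; the per-entry size estimate and heap fraction are invented numbers, not anything HBase ships):

```java
import java.util.Map;

// Hypothetical guard: instead of soft references, estimate the cache
// footprint from the entry count and purge when it crosses a heap threshold.
class CachePurgeGuard {
  static final long BYTES_PER_ENTRY = 1024;   // rough guess per cached region
  static final double MAX_HEAP_FRACTION = 0.05;

  static boolean shouldPurge(int cachedRegions) {
    long maxHeap = Runtime.getRuntime().maxMemory();
    return cachedRegions * BYTES_PER_ENTRY > maxHeap * MAX_HEAP_FRACTION;
  }

  static void maybePurge(Map<?, ?> cache, int cachedRegions) {
    if (shouldPurge(cachedRegions)) {
      cache.clear(); // crude: drop everything and let lookups repopulate
    }
  }
}
```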

          Nicolas Liochon added a comment -

          YourKit calculates the retained size, i.e. the retained size of an object is its shallow size plus the shallow sizes of the objects that are accessible, directly or indirectly, only from this object. In other words, the retained size represents the amount of memory that will be freed by the garbage collector when this object is collected.
          With 10 thousand regions, the retained size of the two ConcurrentSkipListMaps is 7 megabytes.
          With 100 thousand regions, the retained size is 75 megabytes. 19 MB of that is TableName objects, which leads to an obvious optimization (I had it in mind already, to save on 'equals', but the final size is crazy). In the same range, we have 3.3 MB of ServerName.

          Lastly, I don't think that a Map is the best data structure; a Trie would be much better. I will have a look at this as well.

          With 100k regions, time is:

           #clients    #puts              time without the patch   time with the patch
           2 clients   50 million each    83 seconds               58 seconds

          With these results my opinion is that we should commit this patch as it is, because:

          • 60 MB is acceptable for a client connected to a cluster with 100K regions.
          • In most cases, the weak reference just makes performance unpredictable. The remaining cases (regions not used often, so we can remove them under memory pressure) do not justify the noise for the other cases.
          • We can lower the memory footprint further if necessary, and that's likely a better solution than playing with the GC.
          Nicolas Liochon added a comment -

          I've tested v4 with 100K regions, on a scenario with random writes (and a different computer than the previous tests: the results are not directly comparable).

           #clients    #puts              time without the patch   time with the patch
           2 clients   50 million each    114 seconds              72 seconds

          So there is a 65% gap between the two versions.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12614243/6869.v4.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          -1 javadoc. The javadoc tool appears to have generated 10 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 lineLengths. The patch does not introduce lines longer than 100

          -1 site. The patch appears to cause mvn site goal to fail.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/7898//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7898//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7898//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7898//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7898//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7898//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7898//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7898//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7898//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7898//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7898//console

          This message is automatically generated.

          Ted Yu added a comment -

          +1

          stack added a comment -

          Ok. Sounds like it would take a good while for the client to fill up if there was a leak. Yeah, trie would be better but this is good for now. You opened an issue on interning TableNames. That patch looks good. We should open another for ServerName? Makes sense.

          Nicolas Liochon added a comment -

          You opened an issue on interning TableNames. That patch looks good.

          I'm still on this one, to understand why the performance varied, and looking for the best solution. For example, the table name is already in the regionName of the HRegionInfo. Maybe HRegionInfo should not have a reference to the TableName object; we don't need it that often.

          We should open another for ServerName? Makes sense.

          I'm still on this one as well. It's more complex, because the number of ServerName instances is theoretically unbounded, as each includes a startCode.

          I've just committed the v4. I'm quite happy to have this part done...

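          The interning idea discussed above can be sketched like this. This is a hypothetical helper, not HBase's actual implementation: a ConcurrentHashMap that maps each value to one canonical instance, so repeated equal TableName-like objects share a single copy and equality checks can fall back to cheap reference comparison.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical interner sketch: intern(v) returns the canonical instance
// for each distinct value, so N equal values cost one object instead of N.
public class InternerSketch<T> {
    private final ConcurrentMap<T, T> pool = new ConcurrentHashMap<>();

    public T intern(T value) {
        // putIfAbsent returns the previously stored instance, if any.
        T prior = pool.putIfAbsent(value, value);
        return prior == null ? value : prior;
    }

    public static void main(String[] args) {
        InternerSketch<String> interner = new InternerSketch<>();
        String a = interner.intern(new String("usertable"));
        String b = interner.intern(new String("usertable"));
        System.out.println(a == b);  // true: both share one canonical instance
    }
}
```

          As the comment notes, this is easy for table names (a small, bounded set) but harder for ServerName, where the startCode makes the value space unbounded, so a plain pool like this one would grow without limit.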
          Hudson added a comment -

          FAILURE: Integrated in hbase-0.96-hadoop2 #121 (See https://builds.apache.org/job/hbase-0.96-hadoop2/121/)
          HBASE-9869 Optimize HConnectionManager#getCachedLocation - remove SoftValueSortedMap (nkeywal: rev 1542698)

          • /hbase/branches/0.96/hbase-common/src/main/java/org/apache/hadoop/hbase/util/SoftValueSortedMap.java
            HBASE-9869 Optimize HConnectionManager#getCachedLocation (nkeywal: rev 1542690)
          • /hbase/branches/0.96/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
          Hudson added a comment -

          SUCCESS: Integrated in HBase-TRUNK #4683 (See https://builds.apache.org/job/HBase-TRUNK/4683/)
          HBASE-9869 Optimize HConnectionManager#getCachedLocation - remove SoftValueSortedMap (nkeywal: rev 1542699)

          • /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/util/SoftValueSortedMap.java
            HBASE-9869 Optimize HConnectionManager#getCachedLocation (nkeywal: rev 1542688)
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
          Hudson added a comment -

          SUCCESS: Integrated in hbase-0.96 #192 (See https://builds.apache.org/job/hbase-0.96/192/)
          HBASE-9869 Optimize HConnectionManager#getCachedLocation - remove SoftValueSortedMap (nkeywal: rev 1542698)

          • /hbase/branches/0.96/hbase-common/src/main/java/org/apache/hadoop/hbase/util/SoftValueSortedMap.java
            HBASE-9869 Optimize HConnectionManager#getCachedLocation (nkeywal: rev 1542690)
          • /hbase/branches/0.96/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
          Hudson added a comment -

          SUCCESS: Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #840 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/840/)
          HBASE-9869 Optimize HConnectionManager#getCachedLocation - remove SoftValueSortedMap (nkeywal: rev 1542699)

          • /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/util/SoftValueSortedMap.java
            HBASE-9869 Optimize HConnectionManager#getCachedLocation (nkeywal: rev 1542688)
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
          stack added a comment -

          For example, the tableName is already in the regionName of the HRegionInfo. May be HRegionInfo should not have a reference to the TableName object, we don't need it that often.

          I thought HRI only had a table name, not a TableName? We did a load of work to make it so HRI didn't have a HTD; would be a shame to go back there. Good on you N.

          Lars Hofhansl added a comment -

          I have suggested changing this in the past, but the consensus used to be that we do not want to remove the soft value reference map (which would allow the GC to clean out entries from the map when memory is tight).

          I think soft references for caching are an anti-pattern (we're short of memory so we clear the cache in order to do more work if we need the cached data again)... But I wonder whether something has changed in the past year that makes that OK now.

          Anyway, +1 from my side.

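          The anti-pattern Lars describes is visible in the SoftReference contract itself: the JVM is free to clear soft references whenever memory is tight, so a soft-value cache silently drops entries exactly when the application is busiest and must refetch them. An illustrative sketch only, not the removed SoftValueSortedMap:

```java
import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative soft-value cache sketch: get() may return null even for a
// key that was put(), because the GC may clear SoftReferences under memory
// pressure -- forcing the caller to re-look-up at the worst possible time.
public class SoftCacheSketch<K, V> {
    private final Map<K, SoftReference<V>> map = new ConcurrentHashMap<>();

    public void put(K key, V value) {
        map.put(key, new SoftReference<>(value));
    }

    public V get(K key) {
        SoftReference<V> ref = map.get(key);
        return ref == null ? null : ref.get();  // null if the GC cleared it
    }

    public static void main(String[] args) {
        SoftCacheSketch<String, String> cache = new SoftCacheSketch<>();
        cache.put("region1", "server-a");
        // Usually still present, but the contract allows null at any time.
        System.out.println(cache.get("region1"));
    }
}
```

          This unpredictability is what the benchmark above exposes: under GC pressure the cache evicts itself, and each miss costs a meta lookup.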
          stack added a comment -

          But I wonder whether something has changed in the past year that makes that OK now.

          We are wiser and see the wisdom of your ways now Lars Hofhansl! (smile). There is reluctance in evidence above still. I think the @nkeywal numbers that show how little memory is needed and then the work to make it smaller still helped the argument.

          Nicolas Liochon added a comment -

          we're short of memory so we clear the cache in order to do more work if we need the cached data again

          That's exactly what was happening, and that's why we went from 114s to 72s with the patch. I guess that protobuf added a lot of pressure on the GC, hence this behavior (but I've done the test with the latest trunk, which already includes a lot of cleanup).

          I think that the right pattern is to have the structure as small as possible, to make it fit into the processor cache. With all these references all over the place it's difficult to know exactly how it's going to behave.

          I thought HRI only had a table name, not a TableName? We did a load of work to make it so HRI didn't have a HTD; would be a shame to go back there

          There is an attribute "TableName" in HRI, and createRegionName copies the table name into another attribute, "regionName": so we store it twice... I will see if I can remove it.

          stack added a comment -

          Released in 0.96.1. Issue closed.


            People

            • Assignee: Nicolas Liochon
            • Reporter: Nicolas Liochon
            • Votes: 0
            • Watchers: 10