HBase
HBASE-5349

Automagically tweak global memstore and block cache sizes based on workload

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.92.0
    • Fix Version/s: 0.99.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note:
      Both the global memstore heap size and the on-heap block cache size can now be tuned automatically during the RegionServer's lifetime. The algorithm used for this tuning is also pluggable.
      This feature adds the following new configs:
      "hbase.regionserver.global.memstore.size.max.range" and "hbase.regionserver.global.memstore.size.min.range", which specify the total heap % within which the memstore size can vary.
      "hfile.block.cache.size.max.range" and "hfile.block.cache.size.min.range", which specify the total heap % within which the block cache size can vary.
      Using "hbase.regionserver.heapmemory.tuner.class" one can plug in an implementation of the tuner algorithm: pass the FQCN of a tuner class that implements org.apache.hadoop.hbase.regionserver.HeapMemoryTuner.
      The period at which the tuner checks for a possible tune can be adjusted with "hbase.regionserver.heapmemory.tuner.period". This defaults to 300000 ms (5 minutes).
      The tuner algorithm receives a TunerContext carrying the number of block cache evictions that happened within that period and the number of memstore flushes (flushes forced by global heap pressure and normal flushes are reported as separate items).
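
      For illustration, a minimal custom tuner might look like the sketch below. The TunerContext/TunerResult accessor names (getBlockedFlushCount, getEvictCount, getCurMemStoreSize, getCurBlockCacheSize, setMemstoreSize, setBlockCacheSize) and the TunerResult(boolean) constructor are assumptions drawn from the review discussion further down, not a confirmed API.

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.hbase.regionserver.HeapMemoryManager.TunerContext;
      import org.apache.hadoop.hbase.regionserver.HeapMemoryManager.TunerResult;
      import org.apache.hadoop.hbase.regionserver.HeapMemoryTuner;

      public class StepTuner implements HeapMemoryTuner {
        private static final float STEP = 0.02f; // shift 2% of heap per tuner period
        private Configuration conf;

        @Override
        public TunerResult tune(TunerContext context) {
          boolean writePressure = context.getBlockedFlushCount() > 0; // assumed accessor
          boolean readPressure = context.getEvictCount() > 0;         // assumed accessor
          TunerResult result = new TunerResult(writePressure ^ readPressure);
          if (writePressure && !readPressure) {
            // Writes are suffering: grow the memstore, shrink the block cache.
            result.setMemstoreSize(context.getCurMemStoreSize() + STEP);
            result.setBlockCacheSize(context.getCurBlockCacheSize() - STEP);
          } else if (readPressure && !writePressure) {
            // Reads are suffering: grow the block cache, shrink the memstore.
            result.setMemstoreSize(context.getCurMemStoreSize() - STEP);
            result.setBlockCacheSize(context.getCurBlockCacheSize() + STEP);
          }
          return result; // no change when both (or neither) sides show pressure
        }

        @Override
        public void setConf(Configuration conf) { this.conf = conf; }

        @Override
        public Configuration getConf() { return conf; }
      }

      Such a class would then be named in "hbase.regionserver.heapmemory.tuner.class".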

      Config changes
      ------------------
      Changed the config name "hbase.regionserver.global.memstore.upperLimit" to "hbase.regionserver.global.memstore.size".
      This is the initial memstore size allocated (default 40% of heap).
      Also changed "hbase.regionserver.global.memstore.lowerLimit" to "hbase.regionserver.global.memstore.size.lower.limit".
      There is a semantic change here as well. "hbase.regionserver.global.memstore.lowerLimit" represented the total heap % at which we start force flushes for memstores, defaulting to 38% of the total heap.
      The new config is a % of the memstore max heap size (represented by hbase.regionserver.global.memstore.size). It defaults to 95% of the hbase.regionserver.global.memstore.size value.
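
      As a worked example (illustrative values only; in practice these keys go in hbase-site.xml, shown here programmatically to keep the snippet self-contained):

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.hbase.HBaseConfiguration;

      public class HeapTunerConfigExample {
        public static Configuration tunedConf() {
          Configuration conf = HBaseConfiguration.create();
          // Initial allocations: 40% of heap for memstores, 30% for the block cache.
          conf.setFloat("hbase.regionserver.global.memstore.size", 0.40f);
          conf.setFloat("hfile.block.cache.size", 0.30f);
          // With lower.limit at 0.95, force flushes start at 0.95 * 0.40 = 38%
          // of the total heap, matching the old lowerLimit default.
          conf.setFloat("hbase.regionserver.global.memstore.size.lower.limit", 0.95f);
          // Ranges within which the tuner may move each allocation.
          conf.setFloat("hbase.regionserver.global.memstore.size.min.range", 0.30f);
          conf.setFloat("hbase.regionserver.global.memstore.size.max.range", 0.50f);
          conf.setFloat("hfile.block.cache.size.min.range", 0.20f);
          conf.setFloat("hfile.block.cache.size.max.range", 0.40f);
          // Check for a possible tune every 5 minutes (the default period).
          conf.setLong("hbase.regionserver.heapmemory.tuner.period", 300000L);
          return conf;
        }
      }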

      Description

      Hypertable does a neat thing where it changes the sizes given to the CellCache (our MemStores) and the Block Cache based on the workload. If you need an image, scroll down to the bottom of this link: http://www.hypertable.com/documentation/architecture/

      That'd be one less thing to configure.

      Attachments

      1. HBASE-5349_V2.patch
        64 kB
        Anoop Sam John
      2. HBASE-5349_V3.patch
        64 kB
        Anoop Sam John
      3. HBASE-5349_V4.patch
        64 kB
        Anoop Sam John
      4. HBASE-5349_V5.patch
        64 kB
        Anoop Sam John
      5. WIP_HBASE-5349.patch
        37 kB
        Anoop Sam John

          Activity

          Enis Soztutar added a comment -

          Closing this issue after 0.99.0 release.

          Lars Hofhansl added a comment -

          +1 to always enable. This is important for HBase, and that way we'll see issues if any.

          stack added a comment -

          Anoop Sam John I think we should enable it by default, at least in the master branch. What do you think? A policy that sizes the blockcache so there are no evictions OR we hit maxsize would be sweet. I could add that.

          stack added a comment -

          I think this feature should be on by default. If it is off, no one will enable it because they'll be afraid of what it might do. It should be on because it takes away a need to twiddle knobs. Regarding what happens to the GC profile when there is a seismic shift in the size of blockcache/memstore, the point Liang Xie raises, it's a valid concern; can we flag it as a concern in the doc ("If GC'ing sensitive...." or "If low-latency serving....") and release note it, and then spend some time seeing how bad it is in action? Any way we could have a shift in config balance out? i.e. make block cache allocations some subset of memstore allocations so no more fragmentation....

          Liang Xie added a comment -

          OK, my concern is gone now. Thanks all for the replies. I haven't studied the code yet, but the idea/feature is absolutely pretty cool!

          Anoop Sam John added a comment -

          By default, auto tuning is turned off. One needs to set the below 4 configs to define the heap % ranges:
          "hbase.regionserver.global.memstore.size.max.range" and "hbase.regionserver.global.memstore.size.min.range", which specify the total heap % within which the memstore size can vary.
          "hfile.block.cache.size.max.range" and "hfile.block.cache.size.min.range", which specify the total heap % within which the block cache size can vary.
          There are no default values for these, so by default no automatic tuning happens.

          Is that good enough, Liang Xie?

          Andrew Purtell added a comment -

          if a production cluster has a strict enough 99th percentile latency requirement, considering the GC factor, should a null tuner be provided?

          You bet Liang Xie, see HBASE-10151
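
          For reference, such a null tuner can be as simple as one that never requests a change; a minimal sketch, assuming the same TunerContext/TunerResult shapes discussed elsewhere in this thread (the real one is tracked in HBASE-10151):

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.hbase.regionserver.HeapMemoryManager.TunerContext;
          import org.apache.hadoop.hbase.regionserver.HeapMemoryManager.TunerResult;
          import org.apache.hadoop.hbase.regionserver.HeapMemoryTuner;

          public class NoOpTuner implements HeapMemoryTuner {
            private static final TunerResult NO_CHANGE = new TunerResult(false);
            private Configuration conf;

            @Override
            public TunerResult tune(TunerContext context) {
              return NO_CHANGE; // never resize the memstore or the block cache
            }

            @Override
            public void setConf(Configuration conf) { this.conf = conf; }

            @Override
            public Configuration getConf() { return conf; }
          }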

          Liang Xie added a comment -

          We can provide a null tuner to remove this as a factor when getting to the bottom of excessive GCs though.

          Thanks Andrew, I definitely like this feature. Still, if a production cluster has a strict enough 99th percentile latency requirement, considering the GC factor, should a null tuner be provided?

          Andrew Purtell added a comment - - edited

          Seems if we tweak the block cache dynamically, it's more prone to trigger a GC at that moment, right?

          This is largely out of our control, until and unless the JVM exposes some knobs for GC and early warning signals. Between memstore and blockcache we can't just look at JVM heap occupancy; we will be keeping it pretty full.

          We can provide a null tuner to remove this as a factor when getting to the bottom of excessive GCs though.

          Nick Dimiduk added a comment -

          Feedback Control seems like good reading.

          stack added a comment -

          Seems if we tweak the block cache dynamically, it's more prone to trigger a GC at that moment, right?

          Makes sense. We will be disturbing whatever equilibrium may have been in place. But if the workload has changed such that it has us changing configs, would that alone bring on a GC? I suppose we should have high friction on the tweaks.

          Hudson added a comment -

          SUCCESS: Integrated in HBase-TRUNK #4720 (See https://builds.apache.org/job/HBase-TRUNK/4720/)
          HBASE-5349 Automagically tweak global memstore and block cache sizes based on workload (anoopsamjohn: rev 1550059)

          • /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java
          • /hbase/trunk/hbase-common/src/main/resources/hbase-default.xml
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/DoubleBlockCache.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/LruBlockCache.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/ResizableBlockCache.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/DefaultHeapMemoryTuner.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/FlushRequestListener.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/FlushRequester.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HeapMemoryManager.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HeapMemoryTuner.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreChunkPool.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreFlusher.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHeapMemoryManager.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java
          Liang Xie added a comment -

          Seems if we tweak the block cache dynamically, it's more prone to trigger a GC at that moment, right?

          Hudson added a comment -

          SUCCESS: Integrated in HBase-TRUNK-on-Hadoop-1.1 #4 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/4/)
          HBASE-5349 Automagically tweak global memstore and block cache sizes based on workload (anoopsamjohn: rev 1550059)

          • /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java
          • /hbase/trunk/hbase-common/src/main/resources/hbase-default.xml
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/DoubleBlockCache.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/LruBlockCache.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/ResizableBlockCache.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/DefaultHeapMemoryTuner.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/FlushRequestListener.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/FlushRequester.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HeapMemoryManager.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HeapMemoryTuner.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreChunkPool.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreFlusher.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHeapMemoryManager.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java
          Andrew Purtell added a comment -

          Nice work

          Anoop Sam John added a comment -

          Committed to Trunk. Thanks all for the reviews.

          ramkrishna.s.vasudevan added a comment -

          +1 on commit.

          Lars Hofhansl added a comment -

          Nice milestone towards 1.0.
          +1

          stack added a comment -

          Anoop Sam John I am good on commit to trunk. Go for it.

          Anoop Sam John added a comment -

          The pending comment from Ram is a minor one. I will fix it on commit, Ram.

          Anoop Sam John added a comment -

          Thanks for the reviews. I have to make a next version addressing a few comments from Ram. Sure Ted, I will add the release notes. Any more comments, stack?

          Ted Yu added a comment -

          Anoop:
          Can you update the release notes?

          Thanks

          Sergey Shelukhin added a comment -

          +1

          Andrew Purtell added a comment -

          +1 for trunk

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12618009/HBASE-5349_V5.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

          +1 hadoop1.1. The patch compiles against the hadoop 1.1 profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 lineLengths. The patch does not introduce lines longer than 100

          -1 site. The patch appears to cause mvn site goal to fail.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8119//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8119//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8119//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8119//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8119//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8119//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8119//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8119//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8119//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8119//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8119//console

          This message is automatically generated.

          Anoop Sam John added a comment -

          Fixing comments from Sergey

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12617789/HBASE-5349_V4.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

          +1 hadoop1.1. The patch compiles against the hadoop 1.1 profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 lineLengths. The patch does not introduce lines longer than 100

          -1 site. The patch appears to cause mvn site goal to fail.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.regionserver.TestSplitLogWorker

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8097//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8097//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8097//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8097//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8097//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8097//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8097//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8097//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8097//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8097//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8097//console

          This message is automatically generated.

          Anoop Sam John added a comment -

          Correcting one javadoc warning in the patch.
          The test failure seems unrelated; it passes locally!
          I cannot see any new findbugs warnings in this patched code (in the report); the increase in the count may be due to some other recent commits?

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12617779/HBASE-5349_V3.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

          +1 hadoop1.1. The patch compiles against the hadoop 1.1 profile.

          -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 lineLengths. The patch does not introduce lines longer than 100

          -1 site. The patch appears to cause mvn site goal to fail.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8095//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8095//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8095//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8095//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8095//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8095//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8095//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8095//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8095//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8095//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8095//console

          This message is automatically generated.

          Anoop Sam John added a comment -

          https://reviews.apache.org/r/14533/

          Anoop Sam John added a comment -

          Will put in RB soon.

          Anoop Sam John added a comment -

          I can see making the Interface public but do the implementations have to be public too?
          -public class LruBlockCache implements BlockCache, HeapSize {
          +public class LruBlockCache implements ResizableBlockCache, HeapSize {
          They are not instantiated outside of the hfile package?

          You mean whether LruBlockCache should be public or not, right? It was already public. I can see it being referred to in some test cases outside the hfile package.

          Call it DefaultHeapMemoryBalancer instead of DefaultHeapMemoryBalancerImpl I would say.

          Done

          Nit: does the context need both blocked and unblocked? + boolean memstoreSufficient = blockedFlushCount == 0 && unblockedFlushCount == 0;

          The current impl of the Tuner does not distinguish these. The reason I added 2 counts is to support further Tuner impl work (later?). Right now, when both memstore and block cache are under pressure we won't do any tuning. Maybe there too some tuning can be done if blockedFlushCount is large; a blocked flush blocks writes to the memstore. I made it this way just to keep the door open.

          You have this comment twice '// Increase the block cache size and corresponding decrease in memstore size' One must be wrong (smile)

          Copy paste. Corrected.

          Reading DefaultHeapMemoryBalancerImpl, we keep stepping w/o regard for a max. Is max enforced elsewhere? If so, will DefaultHeapMemoryBalancerImpl keep asking to step though we are against the max? Maybe this is fine. It makes the implementation 'easier' which is good. Should we log when we want to 'step'? Or is that done outside of DefaultHeapMemoryBalancerImpl (which would probably be better... again keep it simple)

          Yes, I intended to add the check in HeapMemoryTuner#chore() (the logging also). Adding it now. Also added a check in DefaultHeapMemoryBalancer itself. When the HeapMemoryBalancer asks for a step but the max limit is reached, I make a warn log now. Maybe I can change this to INFO level; IMO logging is enough, no other action is needed. Any suggestion? (This is in Ted's comment also.)

          Does the HeapMemoryManager have to have these two members?
          + private final MemStoreFlusher memStoreFlusher;
          + private final HRegionServer hrs;
          Can it not take Interface that has just what it needs? Else makes it hard to test this new code in isolation. At a minimum the HRS can be replaced by Service or RegionServerService Interface? Is that possible? And FlushRequester instead of MemStoreFlusher?

          Good point. Thanks. I am making hrs be of type Server, which is enough.
          private final Server server;
          private final FlushRequester memStoreFlusher;

          Perhaps drop the 'Auto' prefix from AutoTunerContext and AutoTunerResult. The users of these Interfaces don't care if it Auto or not. Ditto here: HeapMemoryAutoTuner... call it HeapMemoryTuner.

          Done

          These should be private since server side?
          +@InterfaceAudience.Public
          +@InterfaceStability.Evolving
          +public interface HeapMemoryBalancer

          I wanted the impl of this balancer to be pluggable so that users can impl their own way, like LoadBalancer. LoadBalancer is marked as @InterfaceAudience.Public; I just followed that. As it is server side, do we need to make it private? I am not sure of the guideline for this. If this is to be private I can change, NP.

          Should there be a kill switch for the tuner? You pass in absolute values and once set, it stops balancing (for the case where the tuner turns pathological and an operator wants to turn it off while the server is online). We could do that in another issue np.

          Yes, very much. This is needed. I have identified it as a subtask in my earlier comment (6. Support turn ON/OFF the memory tuner at run time.) This is not done yet. Maybe once that is done, this can follow as another issue or subtask.

          These do not need to be public classes + public static final class AutoTunerContext { and AutoTunerResult?

          This Context is passed to the HeapMemoryBalancer impl (which can be pluggable) and it needs to return the result. Making the class non-public will force the impl to be in the same package. Is that fine? That is why I kept it public.

          This seems like informational rather than a warn?
          + LOG.warn("Current heap configuration from HeapMemoryBalancer exceeds "
          + + "the threshold required for successful cluster operation. "
          + + "The combined value cannot exceed 0.8. " + MemStoreFlusher.MEMSTORE_SIZE_KEY
          + + " is " + memstoreSize + " and " + HConstants.HFILE_BLOCK_CACHE_SIZE_KEY + " is "
          + + blockCacheSize);

          Fine. INFO

          Do you need to shut it down? + startHeapMemoryManager(); Or it doesn't matter?

          It doesn't matter as of now, as the balancer thread is a daemon. But I will add shutdown also and call it from HRS. Maybe later, when this is necessary, it will be a placeholder to add that shutdown logic. You fine with such a placeholder? (Only some log will be there as of now.)

          Does the interface need to be public
          +public interface MemstoreFlushListener {

          Done

          Needs tests. Hopefully you can change the above references to MemstoreFlusher and RegionServer to be Interfaces so you do not need to spin up a cluster to test (you are almost there).

          Added tests

          I am a big fan of this patch.

          I too wanted this feature. At least some people asked me regarding this. What they wanted: during some periods the cluster will be write heavy but with no reads, and later only reads but no writes.

          Make one log rather than two since they change in lockstep:
          + LOG.info("Setting block cache heap size to " + newBlockCacheSize);
          ...
          + LOG.info("Setting memstore heap size to " + newMemstoreSize);

          Done

          Should there be a 'damping' facility? That is, should we run the check more often and only make changes if we have been called ten times and on 8 of the 10 times, we judged we should 'step'? That could be a different implementation I suppose? Or conversely, do you think there should be an 'emergency' chain that can be pulled when we need to change the configs now? (This latter is probably not a good idea – at least not yet).

          Thought about that, Stack. Something like rule-based decision making. I wanted to keep the 1st impl simple. That is why the decision maker is pluggable, so that we can provide better impls, like what we have done in LoadBalancer.

          There're ^M's at the end of each line. Please generate patch on Linux.

          Hope it is fine now

          For HeapMemoryAutoTuner:
          + AutoTunerContext context = createTunerContext();^M
          Do we need to create a new context for each iteration of the chore? Can one instance be reused?
          We need the context as a state var then. Ya, it should be okay. Done.

          + if (1 - (memstoreSize + blockCacheSize) < HConstants.HBASE_CLUSTER_MINIMUM_MEMORY_THRESHOLD) {^M
          + LOG.warn("Current heap configuration from HeapMemoryBalancer exceeds "^M
          + + "the threshold required for successful cluster operation. "^M
          Should some action be taken for the above case? Otherwise tuning is effectively disabled.

          We cannot tune beyond this level, as it will affect the normal functioning of the RS. What action do you think? Actions like abort seem like overkill.

          + LOG.debug("From HeapMemoryBalancer new memstoreSize: " + memstoreSize + "%. new blockCacheSize: " + blockCacheSize + "%");^M
          I think the percent sign is not needed.

          Converting to % for correct comparison now. So adding % is fine?

          For DefaultHeapMemoryBalancerImpl, please add javadoc and audience annotation.

          Done

          + result.setMemstoreSize(context.getCurMemStoreSize() - step);

          Should we check that the decrement would not produce a negative result?
          Done.

          Anoop Sam John added a comment -

          Patch addressing comments

          Anoop Sam John added a comment -

          Stack & Ted, thanks for the reviews. Working on the comments and also adding tests. Will update the patch soon.

          stack added a comment -

          I can see making the Interface public but do the implementations have to be public too?

          -public class LruBlockCache implements BlockCache, HeapSize {
          +public class LruBlockCache implements ResizableBlockCache, HeapSize {

          They are not instantiated outside of the hfile package?

          Call it DefaultHeapMemoryBalancer instead of DefaultHeapMemoryBalancerImpl I would say.

          It looks like you change configuration names but you handle the case where users still have the old names; that is good.

          Nit: does the context need both blocked and unblocked? + boolean memstoreSufficient = blockedFlushCount == 0 && unblockedFlushCount == 0;

          You have this comment twice '// Increase the block cache size and corresponding decrease in memstore size' One must be wrong (smile)

          Reading DefaultHeapMemoryBalancerImpl, we keep stepping w/o regard for a max. Is the max enforced elsewhere? If so, will DefaultHeapMemoryBalancerImpl keep asking to step even though we are against the max? Maybe this is fine; it makes the implementation 'easier', which is good. Should we log when we want to 'step'? Or is that done outside of DefaultHeapMemoryBalancerImpl (which would probably be better... again, keep it simple)?

          Does the HeapMemoryManager have to have these two members?

          + private final MemStoreFlusher memStoreFlusher;
          + private final HRegionServer hrs;

          Can it not take an Interface that has just what it needs? Else it makes it hard to test this new code in isolation. At a minimum, can the HRS be replaced by the Service or RegionServerService Interface? Is that possible? And FlushRequester instead of MemStoreFlusher?

          Perhaps drop the 'Auto' prefix from AutoTunerContext and AutoTunerResult. The users of these Interfaces don't care whether it is Auto or not. Ditto here: HeapMemoryAutoTuner... call it HeapMemoryTuner.

          These should be private since server side?

          +@InterfaceAudience.Public
          +@InterfaceStability.Evolving
          +public interface HeapMemoryBalancer

          Should there be a kill switch for the tuner? You pass in absolute values and, once set, it stops balancing (for the case where the tuner turns pathological and an operator wants to turn it off while the server is online). We could do that in another issue, np.

          Should there be a 'damping' facility? That is, should we run the check more often and only make changes if we have been called ten times and on 8 of the 10 times we judged we should 'step'? That could be a different implementation, I suppose. Or conversely, do you think there should be an 'emergency' chain that can be pulled when we need to change the configs now? (This latter is probably not a good idea – at least not yet.)

          We need to get rid of Chore and have one thread only that does all these tasks – we have a load of them running now on each server – or do them via executor... but that is out of scope of this issue.

          These do not need to be public classes + public static final class AutoTunerContext { and AutoTunerResult?

          This seems like informational rather than a warn?

          + LOG.warn("Current heap configuration from HeapMemoryBalancer exceeds "
          + + "the threshold required for successful cluster operation. "
          + + "The combined value cannot exceed 0.8. " + MemStoreFlusher.MEMSTORE_SIZE_KEY
          + + " is " + memstoreSize + " and " + HConstants.HFILE_BLOCK_CACHE_SIZE_KEY + " is "
          + + blockCacheSize);

          Make one log rather than two since they change in lockstep:

          + LOG.info("Setting block cache heap size to " + newBlockCacheSize);

          ...

          + LOG.info("Setting memstore heap size to " + newMemstoreSize);

          Do you need to shut it down? + startHeapMemoryManager(); Or it doesn't matter?

          Does the interface need to be public

          +public interface MemstoreFlushListener {

          (Lars Francke went through and cleaned the public from all our Interface methods... etc., so would be nice not to undo his work).

          Needs tests. Hopefully you can change the above references to MemstoreFlusher and RegionServer to be Interfaces so you do not need to spin up a cluster to test (you are almost there).
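
          For illustration, a minimal sketch of the kind of isolated test this enables, assuming hypothetical narrow interfaces in place of the real MemStoreFlusher and HRegionServer; none of the names below are from the patch.

          import static org.junit.Assert.assertEquals;

          import org.junit.Test;

          public class TestHeapMemoryManagerSketch {

            // Hypothetical narrow view of the flusher; stands in for MemStoreFlusher.
            interface FlushRequesterLike {
              void setGlobalMemstoreSize(float fractionOfHeap);
            }

            // Hypothetical narrow view of the cache; stands in for the block cache.
            interface ResizableCacheLike {
              void setCacheSize(float fractionOfHeap);
            }

            // A manager method that depends only on the interfaces above, so no
            // cluster is needed to exercise it.
            static void applyTuning(FlushRequesterLike flusher, ResizableCacheLike cache,
                float memstoreFraction, float blockCacheFraction) {
              flusher.setGlobalMemstoreSize(memstoreFraction);
              cache.setCacheSize(blockCacheFraction);
            }

            @Test
            public void appliesSizesThroughFakes() {
              final float[] seen = new float[2];
              applyTuning(f -> seen[0] = f, f -> seen[1] = f, 0.42f, 0.38f);
              assertEquals(0.42f, seen[0], 0f);
              assertEquals(0.38f, seen[1], 0f);
            }
          }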

          I am a big fan of this patch. Good work Anoop. Thanks for doing this.

          Ted Yu added a comment - edited

          There're ^M's at the end of each line. Please generate patch on Linux.
          It would be nice to put the patch on review board.

          For HeapMemoryAutoTuner:

          +      AutoTunerContext context = createTunerContext();^M
          

          Do we need to create a new context for each iteration of the chore? Can one instance be reused?

          +        LOG.debug("From HeapMemoryBalancer new memstoreSize: " + memstoreSize + "%. new blockCacheSize: " + blockCacheSize + "%");^M
          

          I think the percent sign is not needed.

          +        if (1 - (memstoreSize + blockCacheSize) < HConstants.HBASE_CLUSTER_MINIMUM_MEMORY_THRESHOLD) {^M
          +          LOG.warn("Current heap configuration from HeapMemoryBalancer exceeds "^M
          +              + "the threshold required for successful cluster operation. "^M
          

          Should some action be taken for the above case? Otherwise tuning is effectively disabled.

          Since memstoreSize and blockCacheSize are local variables, I would expect some action when result.needsTuning() returns true.

          For DefaultHeapMemoryBalancerImpl, please add javadoc and audience annotation.

          +      result.setMemstoreSize(context.getCurMemStoreSize() - step);^M
          

          Should we check that the decrement would not produce a negative result?
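
          A hedged sketch of such a guard, clamping the stepped-down size so it never drops below the configured minimum; the class, helper name, and parameters are illustrative, not from the patch.

          public class StepGuardSketch {
            // Illustrative helper: never step below the configured minimum.
            static float stepDown(float currentSize, float step, float minRange) {
              return Math.max(minRange, currentSize - step);
            }
          }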

          Please add unit tests for the new classes.

          Anoop Sam John added a comment - edited

          Ted
          Other than subtask 6, all are addressed.
          Thanks

          Ted Yu added a comment -

          There were 6 subtasks identified above.

          How many were addressed by the WIP patch?

          Thanks

          Anoop Sam John added a comment -

          WIP patch. Review on the direction and framework welcome.

          Anoop Sam John added a comment -

          Subtasks identified as of now:
          1. Support MemstoreFlush listener
          It should be possible to register listeners. When a flush happens, the listeners get notified. The notification may include the region on which the flush happened and the flush type, e.g. a normal flush or a flush due to global pressure (unblocked or blocked).

          2. Support resizable BlockCache
          Resizing can be supported for all BlockCache types that are on-heap caches. LruBlockCache is an on-heap type, and DoubleBlockCache has an on-heap LruCache along with an off-heap cache, so DoubleBlockCache can also be resizable.

          3. Support specifying the global memstore lower watermark as a % of the global memstore upper limit
          Today the memstore lower watermark is specified as a % of the max heap size and defaults to 38% (the default upper-watermark memstore size is 40%). We need to change this behavior so the lower watermark is specified as a % of the upper watermark limit. The default can be 95% (i.e. 0.38/0.4 * 100).

          4. Support size ranges for memstore size and block cache size
          Support min and max values for the memstore size and the block cache size. The auto tuner can adjust each size to a value between these bounds.

          5. Support a pluggable auto tuner
          There should be an interface-based auto tuner impl. Configuring the FQCN of the impl in hbase-site.xml allows the user to provide a custom impl for the auto tuner; see the loading sketch after this list. Also provide a default impl which decides the cache sizes based on the memstore flushes (due to global heap pressure) and block cache evictions.

          6. Support turning the memory tuner ON/OFF at run time.
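
          As an illustration of subtask 5, the sketch below shows how a pluggable impl could be loaded from configuration. Configuration.getClass and ReflectionUtils.newInstance are standard Hadoop APIs; the interface and default class are placeholders, and the config key is the one proposed further down this page.

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.util.ReflectionUtils;

          public class BalancerLoaderSketch {

            // Placeholder for the pluggable interface discussed in this issue.
            public interface HeapMemoryBalancer { }

            // Placeholder default implementation.
            public static class DefaultHeapMemoryBalancer implements HeapMemoryBalancer { }

            static HeapMemoryBalancer load(Configuration conf) {
              Class<? extends HeapMemoryBalancer> clazz = conf.getClass(
                  "hbase.regionserver.memory.balancer.class",
                  DefaultHeapMemoryBalancer.class, HeapMemoryBalancer.class);
              return ReflectionUtils.newInstance(clazz, conf);
            }
          }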

          Anoop Sam John added a comment -

          Configurations
          --------------
          For the on-heap block cache there is a config value for the max heap space, defaulting to 40%. Now we need a config item for a min heap space for the block cache. Strictly, this is not a minimum that will always be reserved for the block cache; it is the lowest value the auto tuner can turn the block cache size down to.

          hfile.block.cache.size -> current config specifying the percentage of maximum heap (-Xmx setting) to allocate to the block cache.
          We need 2 more configs to specify a range around this value, and the existing config item will continue to exist:
          hfile.block.cache.size.max.range
          hfile.block.cache.size.min.range
          By default when HBase starts, the block cache will be allocated a size of hfile.block.cache.size (as today). Later the auto tuner can change this between the min range and max range values.
          If hfile.block.cache.size.max.range = hfile.block.cache.size.min.range = hfile.block.cache.size, auto tuning for the block cache is turned off (and thus for the memstore as well).
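
          As a sketch only, reading those three keys and detecting the tuning-off case could look like the following; the defaults and method below are assumptions, while Configuration.getFloat is a standard Hadoop API.

          import org.apache.hadoop.conf.Configuration;

          public class BlockCacheRangeSketch {
            static boolean blockCacheTuningEnabled(Configuration conf) {
              float size = conf.getFloat("hfile.block.cache.size", 0.4f);
              float max = conf.getFloat("hfile.block.cache.size.max.range", size);
              float min = conf.getFloat("hfile.block.cache.size.min.range", size);
              // max == min == size leaves no room to move, so tuning is off.
              return !(max == min && min == size);
            }
          }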

          The memstore has a higher and a lower watermark today. The higher watermark is itself a max heap size, but the lower watermark is not: it is the heap size at which, once reached, we start flushes to prevent the memstore from reaching the higher watermark.
          Once the memstore size reaches the higher watermark, we block all updates.
          Now we need a min heap size as well.
          The current configs are hbase.regionserver.global.memstore.upperLimit and hbase.regionserver.global.memstore.lowerLimit.
          We need to introduce 2 new configs, hbase.regionserver.global.memstore.size.max.range and hbase.regionserver.global.memstore.size.min.range. Also we can rename hbase.regionserver.global.memstore.upperLimit to hbase.regionserver.global.memstore.size.
          By default when HBase starts, it will allocate a global memstore size = hbase.regionserver.global.memstore.size, as now (backward compatibility with the old config name will also be provided). The auto tuner can change this between the max and min range.

          The lower watermark can then no longer be a % of the total heap size; it needs to be a % of the max heap size (the higher watermark, i.e. hbase.regionserver.global.memstore.size). At any point in time the actual heap space will have some value between the max and min range. The lower-watermark % should resolve to a heap space smaller than the actual max heap space for the global memstore, and we should start the flushes at that point (as we do today).
          So we should rename this config as well -> hbase.regionserver.global.memstore.size.lower.limit -> the % of hbase.regionserver.global.memstore.size at which we start flushes to avoid stop-the-world flushes. For BC we will continue to support hbase.regionserver.global.memstore.lowerLimit, which is a % of the total JVM heap space; on init we can convert this value to a % of hbase.regionserver.global.memstore.size. The default value for this lower limit can be 95% of the global memstore size. (Default values for hbase.regionserver.global.memstore.upperLimit and hbase.regionserver.global.memstore.lowerLimit are 40% and 38% respectively.)

          The init check today is: block cache size + higher watermark <= 80%.
          This should become:
          hbase.regionserver.global.memstore.size + hfile.block.cache.size <= 80%
          AND
          hbase.regionserver.global.memstore.size.max.range + hfile.block.cache.size.min.range <= 80%
          AND
          hbase.regionserver.global.memstore.size.min.range + hfile.block.cache.size.max.range <= 80%
          At any time, the heap size of the block cache plus that of the memstore must be <= 80%.
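
          Expressed as code, the three checks could look like this sketch; the key names are the ones proposed above, while the class, method, defaults, and exception are assumptions.

          import org.apache.hadoop.conf.Configuration;

          public class HeapLimitCheckSketch {
            static void checkHeapLimits(Configuration conf) {
              float msSize = conf.getFloat("hbase.regionserver.global.memstore.size", 0.4f);
              float msMax = conf.getFloat("hbase.regionserver.global.memstore.size.max.range", msSize);
              float msMin = conf.getFloat("hbase.regionserver.global.memstore.size.min.range", msSize);
              float bcSize = conf.getFloat("hfile.block.cache.size", 0.4f);
              float bcMax = conf.getFloat("hfile.block.cache.size.max.range", bcSize);
              float bcMin = conf.getFloat("hfile.block.cache.size.min.range", bcSize);
              // Each combination must leave at least 20% of heap for everything else.
              if (msSize + bcSize > 0.8f || msMax + bcMin > 0.8f || msMin + bcMax > 0.8f) {
                throw new RuntimeException(
                    "Memstore size + block cache size must not exceed 0.8 of total heap");
              }
            }
          }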

          We can have a HeapMemoryManager.
          On HRS start, this is started as well. The manager can track block cache evictions (available from CacheStats); the memstore flushes can be tracked too.
          MemstoreFlusher should support changing the global memstore size, and the BlockCache should be resizable. The manager should support a pluggable HeapMemoryBalancer which determines the new values for the memstore and block cache sizes.
          When there is a change in the heap sizes, they can be set on the MemstoreFlusher/BlockCache.

          HeapMemoryBalancer
          A custom implementation can be plugged in via the config param "hbase.regionserver.memory.balancer.class".
          The default impl can decide on memory adjustments by comparing the block cache evictions against the memstore flushes forced by global heap pressure.
          Normal flushes, due to one memstore reaching its flush size, are a normal op and should not be counted in the above checks. Only flushes because of global heap pressure (blocked or unblocked) should be counted.
          There is also a configurable time interval at which this tuner checks the condition. The config name can be "hbase.regionserver.heapmemory.autotuner.period", defaulting to 5 mins.
          Depending on the check, it can increase/decrease the block cache or global memstore size in steps. The step value can be something like 2% of -Xmx; this should also be configurable.
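
          A minimal sketch of that default decision logic, assuming a 2% step and only forced-flush and eviction counts as inputs; the class and method names are illustrative, not from any patch.

          public class DefaultBalancerSketch {

            static final float STEP = 0.02f; // ~2% of -Xmx, as suggested above

            // Returns {newMemstoreFraction, newBlockCacheFraction}.
            static float[] balance(long forcedFlushCount, long cacheEvictionCount,
                float memstoreFraction, float blockCacheFraction) {
              if (forcedFlushCount > 0 && cacheEvictionCount == 0) {
                // Write pressure only: give the memstore more heap, take it from the cache.
                return new float[] { memstoreFraction + STEP, blockCacheFraction - STEP };
              } else if (cacheEvictionCount > 0 && forcedFlushCount == 0) {
                // Read pressure only: grow the block cache at the memstore's expense.
                return new float[] { memstoreFraction - STEP, blockCacheFraction + STEP };
              }
              // Mixed or idle workload: no clear signal, leave sizes alone.
              // A real impl would also clamp results to the configured min/max ranges.
              return new float[] { memstoreFraction, blockCacheFraction };
            }
          }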

          There should be a way to turn the tuner facility ON/OFF via the MemoryTuner.

          Anoop Sam John added a comment -

          Working on a prototype now. Will come up with the low-level impl details soon.

          Anoop Sam John added a comment -

          Looks to be great for us. Will start working on this.

          Jean-Daniel Cryans added a comment -

          If anyone is looking for a good jira to solve, this is one.

          stack added a comment -

          Moving out of 0.96.0

          stack added a comment -

          I wouldn't mind more detail.

          Can our LRU be resized?

          The memstore upper bound can vary, but there are interesting effects: if it's too big, flushing can take so long that the memstore fills before we get around to flushing it again, so we block.

          Nit: 10 minutes seems like too coarse a granularity?

          Good stuff Enis.

          Ted Yu added a comment -

          This is a plan worth pursuing.

          Enis Soztutar added a comment -

          I have been thinking about this, and I think we can have a shot at a simple implementation. Let me summarize what I have in mind before starting the implementation:
          Goals:

          • Provide min - max heap percentages for block cache (memstore kind of has it). I think we should keep max-min sanity bounds, and if they are equal, disable auto-tuning.
          • Enable optimizing the available memory for adaptive workloads (mostly writes during the day, a lot of reads once an MR job starts, etc). For example, when a large write job is started, region servers should tune for the write workload after ~10 minutes.

          Non-goals:
          • find the optimum mem-utilization algorithm
          • introduce a bunch of other parameters, to get rid of the current ones
          • make it very experimental so that nobody enables it in production.

          Ideally, to optimize the usage of the available memory, we should predict the future workload (possibly from past workload), and devise a model capturing all the costs associated with block cache hits / misses, flushes, compactions, etc. But this model will be very complex to do it properly.

          I have checked Hypertable's implementation, and it seems that they check whether the load is read/write heavy by some hard coded values for the counters, and increment/decrement the mem limits, much like what Zhihong proposes above. I also want to start with something similar.

          Implementation layer:

          • Currently the global memstore limit is a soft limit; we may have to make it a hard limit (blocking writes)
          • we should enable incrementing / decrementing and setting global memstore and block cache maximum limits. We do not have live configuration changes, but regardless of auto-tuning, we should be able to manually set those online.
          • Periodically we should check past workload (like past 10 min), and depending on whether it is write heavy or read heavy (from metrics), adjust the mem limits in small intervals.

          What do you guys think? Still worth pursuing?

          Lars Hofhansl added a comment -

          Moving out of 0.94.

          Jean-Daniel Cryans added a comment -

          And by waiting to reach the lower barrier with a constant heavy load, you'll always run into HBASE-5161.

          stack added a comment -

          Chatting w/ J-D about a phenomenon where we do not use memory when we are taking on a bunch of writes w/ a low region count.

          The few regions we have grow to their max of 128M or so and then we flush but in his case he had gigs of free memory still. The notion is that we should let memstores grow to fill all available space and then flush when they hit the low-water global mem mark for the memstore.

          The problem then becomes we'll flush lots of massive files and will overwhelm compactions. We'll need a push-back, something like a flush-merge where we flush by rewriting an existing store file interleaving the contents of memory or some such to slow down the flush but also to make for less compaction to do.

          Jean-Daniel Cryans added a comment -

          @Mubarak

          We already have that through HeapSize, it's really just a matter of knowing what to auto-tune and when.

          Mubarak Seyed added a comment -

          For reference:
          Cassandra 1.0.0 supports self-tuning memtable sizes (CASSANDRA-2787). It uses a Java agent (Jamm)

          Ted Yu added a comment -

          I am thinking of introducing rolling counters (using a circular buffer) for both the MemStore and the LruBlockCache.
          We record the number of flushes (for the MemStore) and evictions (for the LruBlockCache), respectively.
          Every minute (configurable), we roll the counters.
          It should be straightforward to observe whether the MemStore or the LruBlockCache is under pressure by looking at the trend of the rolling counters.
          Once we determine the one under pressure, we can utilize what J-D described above to shift heap among the two.

          We can also introduce weights between MemStore and LruBlockCache for the rolling counters.
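
          A sketch of such a rolling counter over a circular buffer (illustrative only, not from any patch): each bucket holds one interval's count, and roll() advances to and clears the oldest bucket each configured interval.

          import java.util.concurrent.atomic.AtomicLongArray;

          public class RollingCounterSketch {
            private final AtomicLongArray buckets;
            private volatile int current = 0;

            RollingCounterSketch(int numBuckets) {
              this.buckets = new AtomicLongArray(numBuckets);
            }

            void increment() {                 // e.g. on each flush or eviction
              buckets.incrementAndGet(current);
            }

            synchronized void roll() {         // call every (configurable) minute
              current = (current + 1) % buckets.length();
              buckets.set(current, 0);         // reuse the oldest bucket
            }

            long total() {                     // trend over the whole window
              long sum = 0;
              for (int i = 0; i < buckets.length(); i++) sum += buckets.get(i);
              return sum;
            }
          }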

          Jean-Daniel Cryans added a comment -

          Good question, I don't think looking at requests is good enough... instead we could look at how both are used and if there's adjustment to be made. For example, if you have a read heavy workload then the memstores would not see a lot of usage... same with write heavy, the block cache would be close to empty.

          Those two are clear cuts, now for those workloads in between it gets a bit harder. Maybe at first we shouldn't even try to optimize them.

          I think it should also be done incrementally, move like 3-5% of the heap from one place to the other every few minutes until it settles.

          Ted Yu added a comment -

          w.r.t. the measure for determining workload, should the measure be computed solely based on one region server?
          Or should this measure be relative to the workload on other region servers?

          Ted Yu added a comment -

          We currently don't maintain a moving average of read/write requests per region server.
          What should be an effective measure for determining a read- vs. write-heavy workload?


            People

            • Assignee: Anoop Sam John
            • Reporter: Jean-Daniel Cryans
            • Votes: 1
            • Watchers: 26
