Details
Description
We introduced jitter for the region split decision in HBASE-13412, but the following line in ConstantSizeRegionSplitPolicy may cause a long value overflow if MAX_FILESIZE is set to Long.MAX_VALUE:
this.desiredMaxFileSize += (long)(desiredMaxFileSize * (RANDOM.nextFloat() - 0.5D) * jitter);
In our case we set MAX_FILESIZE to Long.MAX_VALUE to prevent the target region from splitting.
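The overflow described above can be reproduced outside HBase. The following is a minimal sketch, not the committed patch: the class and method names are hypothetical, the random draw is passed in as a parameter so the result is deterministic, and the saturating variant assumes a non-negative desiredMaxFileSize.

```java
final class JitterOverflowDemo {

    // Same shape as the reported line: base += (long)(base * (random - 0.5D) * jitter).
    // Java long addition wraps silently, so a positive jitter on Long.MAX_VALUE goes negative.
    static long applyJitterUnchecked(long desiredMaxFileSize, float random, double jitter) {
        return desiredMaxFileSize + (long) (desiredMaxFileSize * (random - 0.5D) * jitter);
    }

    // Hypothetical guard: clamp to Long.MAX_VALUE when the addition would overflow.
    // Assumes desiredMaxFileSize >= 0, so Long.MAX_VALUE - desiredMaxFileSize cannot itself overflow.
    static long applyJitterSaturating(long desiredMaxFileSize, float random, double jitter) {
        long jitterValue = (long) (desiredMaxFileSize * (random - 0.5D) * jitter);
        if (jitterValue > 0 && jitterValue > Long.MAX_VALUE - desiredMaxFileSize) {
            return Long.MAX_VALUE;
        }
        return desiredMaxFileSize + jitterValue;
    }

    public static void main(String[] args) {
        float random = 0.75f;   // stands in for RANDOM.nextFloat(); any value above 0.5 gives positive jitter
        double jitter = 0.25D;  // illustrative jitter setting

        System.out.println("unchecked:  " + applyJitterUnchecked(Long.MAX_VALUE, random, jitter));
        System.out.println("saturating: " + applyJitterSaturating(Long.MAX_VALUE, random, jitter));
    }
}
```

With a draw below 0.5 the jitter is negative and both variants return the same value, so only the positive half of the jitter range is affected.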
Attachments
- HBASE-15324_v2.patch
- 4 kB
- Yu Li
- HBASE-15324_v3.patch
- 4 kB
- Michael Stack
- HBASE-15324_v3.patch
- 4 kB
- Yu Li
- HBASE-15324.patch
- 1 kB
- Yu Li
Issue Links
- is related to
-
HBASE-17058 Lower epsilon used for jitter verification from HBASE-15324
- Closed
Activity
Another thing I'd like to mention is that there is currently no attribute in HTableDescriptor to mark a table as not splittable, so we have to achieve this by setting MAX_FILESIZE to Long.MAX_VALUE. I think we could introduce a new attribute like NEVER_SPLIT to make this much easier and more straightforward. Thoughts on this idea? If agreed, I could open another JIRA to implement it. Thanks.
-1 overall |
Vote | Subsystem | Runtime | Comment |
---|---|---|---|
0 | reexec | 0m 0s | Docker mode activated. |
+1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
+1 | @author | 0m 0s | The patch does not contain any @author tags. |
+1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
+1 | mvninstall | 4m 24s | master passed |
+1 | compile | 0m 53s | master passed with JDK v1.8.0_72 |
+1 | compile | 0m 51s | master passed with JDK v1.7.0_95 |
+1 | checkstyle | 4m 58s | master passed |
+1 | mvneclipse | 0m 30s | master passed |
+1 | findbugs | 3m 17s | master passed |
+1 | javadoc | 1m 10s | master passed with JDK v1.8.0_72 |
+1 | javadoc | 0m 58s | master passed with JDK v1.7.0_95 |
+1 | mvninstall | 0m 53s | the patch passed |
+1 | compile | 0m 49s | the patch passed with JDK v1.8.0_72 |
+1 | javac | 0m 49s | the patch passed |
+1 | compile | 0m 40s | the patch passed with JDK v1.7.0_95 |
+1 | javac | 0m 40s | the patch passed |
-1 | checkstyle | 4m 1s | Patch generated 1 new checkstyle issues in hbase-server (total was 2, now 2). |
+1 | mvneclipse | 0m 16s | the patch passed |
+1 | whitespace | 0m 0s | Patch has no whitespace issues. |
+1 | hadoopcheck | 25m 32s | Patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. |
+1 | findbugs | 2m 19s | the patch passed |
+1 | javadoc | 0m 39s | the patch passed with JDK v1.8.0_72 |
+1 | javadoc | 0m 35s | the patch passed with JDK v1.7.0_95 |
-1 | unit | 14m 53s | hbase-server in the patch failed with JDK v1.8.0_72. |
-1 | unit | 15m 54s | hbase-server in the patch failed with JDK v1.7.0_95. |
+1 | asflicense | 0m 10s | Patch does not generate ASF License warnings. |
84m 13s |
Reason | Tests |
---|---|
JDK v1.8.0_72 Failed junit tests | hadoop.hbase.regionserver.TestMetricsRegion |
 | hadoop.hbase.regionserver.TestMetricsRegionServer |
 | hadoop.hbase.ipc.TestRpcMetrics |
JDK v1.7.0_95 Failed junit tests | hadoop.hbase.regionserver.TestMetricsRegion |
 | hadoop.hbase.regionserver.TestMetricsRegionServer |
 | hadoop.hbase.ipc.TestRpcMetrics |
This message was automatically generated.
Oh yes, DisabledRegionSplitPolicy should work for this case; I never noticed it before... Thanks for the note, anoop.hbase.
So there is no need for another JIRA, but I think the issue mentioned here still needs a fix.
New patch trying to fix the checkstyle complaint about import order. It seems Eclipse's automatic import organization does not pass checkstyle; I guess it expects strict dictionary order?
Regarding the UT failures, all are unrelated to the patch here. They were probably introduced by HBASE-15222, and the comments there are already trying to fix them.
-1 overall |
Vote | Subsystem | Runtime | Comment |
---|---|---|---|
0 | reexec | 0m 0s | Docker mode activated. |
+1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
+1 | @author | 0m 0s | The patch does not contain any @author tags. |
+1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
+1 | mvninstall | 2m 57s | master passed |
+1 | compile | 0m 44s | master passed with JDK v1.8.0_72 |
+1 | compile | 0m 40s | master passed with JDK v1.7.0_95 |
+1 | checkstyle | 4m 16s | master passed |
+1 | mvneclipse | 0m 26s | master passed |
+1 | findbugs | 3m 0s | master passed |
+1 | javadoc | 1m 4s | master passed with JDK v1.8.0_72 |
+1 | javadoc | 0m 42s | master passed with JDK v1.7.0_95 |
+1 | mvninstall | 0m 54s | the patch passed |
+1 | compile | 0m 55s | the patch passed with JDK v1.8.0_72 |
+1 | javac | 0m 55s | the patch passed |
+1 | compile | 0m 48s | the patch passed with JDK v1.7.0_95 |
+1 | javac | 0m 48s | the patch passed |
+1 | checkstyle | 5m 7s | the patch passed |
+1 | mvneclipse | 0m 28s | the patch passed |
+1 | whitespace | 0m 0s | Patch has no whitespace issues. |
+1 | hadoopcheck | 25m 37s | Patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. |
+1 | findbugs | 2m 6s | the patch passed |
+1 | javadoc | 0m 35s | the patch passed with JDK v1.8.0_72 |
+1 | javadoc | 0m 36s | the patch passed with JDK v1.7.0_95 |
-1 | unit | 15m 37s | hbase-server in the patch failed with JDK v1.8.0_72. |
-1 | unit | 16m 25s | hbase-server in the patch failed with JDK v1.7.0_95. |
+1 | asflicense | 0m 10s | Patch does not generate ASF License warnings. |
83m 37s |
Reason | Tests |
---|---|
JDK v1.8.0_72 Failed junit tests | hadoop.hbase.regionserver.TestMetricsRegion |
 | hadoop.hbase.regionserver.TestMetricsRegionServer |
 | hadoop.hbase.ipc.TestRpcMetrics |
JDK v1.7.0_95 Failed junit tests | hadoop.hbase.regionserver.TestMetricsRegion |
 | hadoop.hbase.regionserver.TestMetricsRegionServer |
 | hadoop.hbase.ipc.TestRpcMetrics |
The latest HadoopQA report looks good.
eclark, mind taking a look here, since this relates to HBASE-13412? Thanks.
Back on this... will wait for one more day and get this in if no objections.
-1 overall |
Vote | Subsystem | Runtime | Comment |
---|---|---|---|
+1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
+1 | @author | 0m 0s | The patch does not contain any @author tags. |
+1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
+1 | mvninstall | 5m 26s | master passed |
+1 | compile | 1m 14s | master passed with JDK v1.8.0 |
+1 | compile | 0m 55s | master passed with JDK v1.7.0_79 |
+1 | checkstyle | 6m 50s | master passed |
+1 | mvneclipse | 0m 24s | master passed |
+1 | findbugs | 3m 3s | master passed |
+1 | javadoc | 0m 53s | master passed with JDK v1.8.0 |
+1 | javadoc | 0m 58s | master passed with JDK v1.7.0_79 |
+1 | mvninstall | 1m 26s | the patch passed |
+1 | compile | 1m 36s | the patch passed with JDK v1.8.0 |
+1 | javac | 1m 36s | the patch passed |
+1 | compile | 1m 14s | the patch passed with JDK v1.7.0_79 |
+1 | javac | 1m 14s | the patch passed |
+1 | checkstyle | 6m 52s | hbase-server: patch generated 0 new + 0 unchanged - 2 fixed = 0 total (was 2) |
+1 | mvneclipse | 0m 22s | the patch passed |
+1 | whitespace | 0m 0s | Patch has no whitespace issues. |
+1 | hadoopcheck | 38m 35s | Patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. |
+1 | findbugs | 2m 41s | the patch passed |
+1 | javadoc | 0m 40s | the patch passed with JDK v1.8.0 |
+1 | javadoc | 0m 42s | the patch passed with JDK v1.7.0_79 |
-1 | unit | 158m 49s | hbase-server in the patch failed. |
+1 | asflicense | 0m 18s | Patch does not generate ASF License warnings. |
233m 35s |
Reason | Tests |
---|---|
Timed out junit tests | org.apache.hadoop.hbase.master.procedure.TestMasterFailoverWithProcedures |
 | org.apache.hadoop.hbase.regionserver.TestRegionMergeTransactionOnCluster |
 | org.apache.hadoop.hbase.snapshot.TestMobFlushSnapshotFromClient |
Subsystem | Report/Notes |
---|---|
JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12796089/HBASE-15324_v3.patch |
JIRA Issue | |
Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile |
uname | Linux asf910.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
Build tool | maven |
Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh |
git revision | master / f1fc520 |
Default Java | 1.7.0_79 |
Multi-JDK versions | /home/jenkins/tools/java/jdk1.8.0:1.8.0 /usr/local/jenkins/java/jdk1.7.0_79:1.7.0_79 |
findbugs | v3.0.0 |
unit | https://builds.apache.org/job/PreCommit-HBASE-Build/1228/artifact/patchprocess/patch-unit-hbase-server.txt |
unit test logs | https://builds.apache.org/job/PreCommit-HBASE-Build/1228/artifact/patchprocess/patch-unit-hbase-server.txt |
Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/1228/testReport/ |
modules | C: hbase-server U: hbase-server |
Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/1228/console |
Powered by | Apache Yetus 0.2.0 http://yetus.apache.org |
Pushed to 1.3+
Thanks for the patch, carp84. As for the timed-out tests, I've added timeouts to the two that were missing them or were misconfigured. They have failed in the past. Will try to work on them in another issue.
SUCCESS: Integrated in HBase-1.3-IT #590 (See https://builds.apache.org/job/HBase-1.3-IT/590/)
HBASE-15324 Jitter may cause desiredMaxFileSize overflow in (stack: rev 407e644607eba96132d9fa27857000ee9cb1dc20)
- hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionSplitPolicy.java
- hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ConstantSizeRegionSplitPolicy.java
FAILURE: Integrated in HBase-Trunk_matrix #816 (See https://builds.apache.org/job/HBase-Trunk_matrix/816/)
HBASE-15324 Jitter may cause desiredMaxFileSize overflow in (stack: rev 9d56105eece2d34922ae1c230308193cd0e9b29f)
- hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionSplitPolicy.java
- hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ConstantSizeRegionSplitPolicy.java
SUCCESS: Integrated in HBase-1.4 #63 (See https://builds.apache.org/job/HBase-1.4/63/)
HBASE-15324 Jitter may cause desiredMaxFileSize overflow in (stack: rev 407e644607eba96132d9fa27857000ee9cb1dc20)
- hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ConstantSizeRegionSplitPolicy.java
- hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionSplitPolicy.java
FAILURE: Integrated in HBase-1.3 #629 (See https://builds.apache.org/job/HBase-1.3/629/)
HBASE-15324 Jitter may cause desiredMaxFileSize overflow in (stack: rev f8d41f9a2f0e794e2de45debf97b8fa14a8c5d8e)
- hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionSplitPolicy.java
- hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ConstantSizeRegionSplitPolicy.java
We ran into this in 1.2 with a customer, and it caused tens of thousands of new regions to be created in a matter of hours. I'm going to push it from 0.98 to 1.2.
Not pushing to 0.98 since the jitter added by HBASE-13412 is not enabled by default.
SUCCESS: Integrated in Jenkins build HBase-1.2-JDK7 #67 (See https://builds.apache.org/job/HBase-1.2-JDK7/67/)
HBASE-15324 Jitter may cause desiredMaxFileSize overflow in (esteban: rev 6b8472038dd2632d20fdf8d3216c2fe86ee68580)
- (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionSplitPolicy.java
- (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ConstantSizeRegionSplitPolicy.java
SUCCESS: Integrated in Jenkins build HBase-1.2-JDK8 #61 (See https://builds.apache.org/job/HBase-1.2-JDK8/61/)
HBASE-15324 Jitter may cause desiredMaxFileSize overflow in (esteban: rev 6b8472038dd2632d20fdf8d3216c2fe86ee68580)
- (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionSplitPolicy.java
- (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ConstantSizeRegionSplitPolicy.java
SUCCESS: Integrated in Jenkins build HBase-1.1-JDK8 #1899 (See https://builds.apache.org/job/HBase-1.1-JDK8/1899/)
HBASE-15324 Jitter may cause desiredMaxFileSize overflow in (esteban: rev 9fb5a8608dff761160e217dc3dd5b2dfa9b4875d)
- (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ConstantSizeRegionSplitPolicy.java
- (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionSplitPolicy.java
SUCCESS: Integrated in Jenkins build HBase-1.1-JDK7 #1815 (See https://builds.apache.org/job/HBase-1.1-JDK7/1815/)
HBASE-15324 Jitter may cause desiredMaxFileSize overflow in (esteban: rev 9fb5a8608dff761160e217dc3dd5b2dfa9b4875d)
- (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ConstantSizeRegionSplitPolicy.java
- (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionSplitPolicy.java
Hi liyu, I have one question about the fix: why is the overflow only checked when (this.jitterRate > EPSILON)? I modified the test case a bit, and it can produce a negative desiredMaxFileSize (overflow). Should the overflow be checked in all cases? Thanks.
    @Test
    public void testConstantSizePolicyWithJitter() throws IOException {
      conf.set(HConstants.HBASE_REGION_SPLIT_POLICY_KEY,
        ConstantSizeRegionSplitPolicy.class.getName());
      htd.setMaxFileSize(Long.MAX_VALUE);
      boolean positiveJitter = false;
      ConstantSizeRegionSplitPolicy policy = null;
      long value = 0;
      while (value >= 0) {
        policy = (ConstantSizeRegionSplitPolicy) RegionSplitPolicy.create(mockRegion, conf);
        //positiveJitter = policy.positiveJitterRate();
        value = policy.getDesiredMaxFileSize();
        if (value < 0) {
          System.out.println(policy.getDesiredMaxFileSize());
          System.out.println(policy.getJitterRate());
        }
      }
      // add a store
      HStore mockStore = Mockito.mock(HStore.class);
      Mockito.doReturn(2000L).when(mockStore).getSize();
      Mockito.doReturn(true).when(mockStore).canSplit();
      stores.add(mockStore);
      // Jitter shouldn't cause overflow when HTableDescriptor.MAX_FILESIZE set to Long.MAX_VALUE
      assertFalse(policy.shouldSplit());
    }
For the failed case, values are:
-9223365302346055681 (getDesiredMaxFileSize)
7.301568984985352E-7 (jitter rate)
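The reported values can be checked in isolation. This is a hedged sketch rather than the HBase code: the class name, method names, and the EPSILON value of 1e-6 are assumptions chosen only so that the reported jitter rate falls below the threshold. It compares a guard that requires jitterRate > EPSILON with one that only requires jitterRate > 0.

```java
final class EpsilonGuardDemo {

    // Hypothetical threshold; the thread does not state the actual EPSILON value.
    static final double EPSILON = 1e-6;

    // Overflow check that is skipped whenever the jitter rate is below EPSILON.
    static long withEpsilonGuard(long desiredMaxFileSize, double jitterRate) {
        long jitterValue = (long) (desiredMaxFileSize * jitterRate);
        if (jitterRate > EPSILON && jitterValue > Long.MAX_VALUE - desiredMaxFileSize) {
            return Long.MAX_VALUE;
        }
        return desiredMaxFileSize + jitterValue; // may wrap when the guard was skipped
    }

    // Overflow check applied to every positive jitter rate, however small.
    static long withZeroGuard(long desiredMaxFileSize, double jitterRate) {
        long jitterValue = (long) (desiredMaxFileSize * jitterRate);
        if (jitterRate > 0 && jitterValue > Long.MAX_VALUE - desiredMaxFileSize) {
            return Long.MAX_VALUE;
        }
        return desiredMaxFileSize + jitterValue;
    }

    public static void main(String[] args) {
        double reportedRate = 7.301568984985352E-7; // jitter rate from the failed case in the thread

        // The thread reports -9223365302346055681 for this rate with MAX_FILESIZE = Long.MAX_VALUE.
        System.out.println("epsilon guard: " + withEpsilonGuard(Long.MAX_VALUE, reportedRate));
        System.out.println("zero guard:    " + withZeroGuard(Long.MAX_VALUE, reportedRate));
    }
}
```

A rate this small still scales Long.MAX_VALUE by trillions of bytes, so any positive rate has to go through the overflow check when the base value is at or near the maximum.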
huaxiang I think the problem is the value of the epsilon used for the precision of the types involved (float x double). I think it should be at least 2.22e-16 (2^-52) or even 1.11e-16 (2^-53). Created HBASE-17058 for follow up. Thanks.
Yep, both the question and the answer here are reasonable, and maybe we could simply use jitterRate > 0 and leave the check to the JDK. Below is a simple test confirming that the JDK handles the comparison correctly:
    double x = 1e-200;
    double y = -1e-200;
    System.out.println(x>0 && y<0);
Thanks for committing this to branch-1.1/1.2 and opening the new issue esteban.
huaxiang feel free to take the new JIRA if you'd like to, or I could take that if you prefer me to, just let me know (Smile).
SUCCESS: Integrated in Jenkins build HBase-1.1-JDK8 #1904 (See https://builds.apache.org/job/HBase-1.1-JDK8/1904/)
HBASE-17058 Lower epsilon used for jitter verification from HBASE-15324 (esteban: rev 7c58547a37c85a148f481398819badd7c26129bc)
- (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ConstantSizeRegionSplitPolicy.java
SUCCESS: Integrated in Jenkins build HBase-1.3-JDK7 #69 (See https://builds.apache.org/job/HBase-1.3-JDK7/69/)
HBASE-17058 Lower epsilon used for jitter verification from HBASE-15324 (esteban: rev f4ed43e06108687488ebb161086b00274d172bc0)
- (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ConstantSizeRegionSplitPolicy.java
SUCCESS: Integrated in Jenkins build HBase-Trunk_matrix #1971 (See https://builds.apache.org/job/HBase-Trunk_matrix/1971/)
HBASE-17058 Lower epsilon used for jitter verification from HBASE-15324 (esteban: rev 7c6e839f6a98cf2c3ed37109318632db13b4a0df)
- (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ConstantSizeRegionSplitPolicy.java
SUCCESS: Integrated in Jenkins build HBase-1.1-JDK7 #1820 (See https://builds.apache.org/job/HBase-1.1-JDK7/1820/)
HBASE-17058 Lower epsilon used for jitter verification from HBASE-15324 (esteban: rev 7c58547a37c85a148f481398819badd7c26129bc)
- (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ConstantSizeRegionSplitPolicy.java
SUCCESS: Integrated in Jenkins build HBase-1.2-JDK8 #66 (See https://builds.apache.org/job/HBase-1.2-JDK8/66/)
HBASE-17058 Lower epsilon used for jitter verification from HBASE-15324 (esteban: rev bb891c6834a0691302b958d78e0c009b3601c442)
- (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ConstantSizeRegionSplitPolicy.java
SUCCESS: Integrated in Jenkins build HBase-1.3-JDK8 #80 (See https://builds.apache.org/job/HBase-1.3-JDK8/80/)
HBASE-17058 Lower epsilon used for jitter verification from HBASE-15324 (esteban: rev f4ed43e06108687488ebb161086b00274d172bc0)
- (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ConstantSizeRegionSplitPolicy.java
SUCCESS: Integrated in Jenkins build HBase-1.4 #538 (See https://builds.apache.org/job/HBase-1.4/538/)
HBASE-17058 Lower epsilon used for jitter verification from HBASE-15324 (esteban: rev 19441937ea688b6798675993c6af4a961f931c3a)
- (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ConstantSizeRegionSplitPolicy.java
SUCCESS: Integrated in Jenkins build HBase-1.2-JDK7 #72 (See https://builds.apache.org/job/HBase-1.2-JDK7/72/)
HBASE-17058 Lower epsilon used for jitter verification from HBASE-15324 (esteban: rev bb891c6834a0691302b958d78e0c009b3601c442)
- (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ConstantSizeRegionSplitPolicy.java
A straightforward patch to fix the issue.
To supplement: when the overflow occurs, the region will split even with a small store size, which causes a really bad performance issue.
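Why a wrapped threshold triggers splits on small stores can be seen with a simplified model. The sketch below is an assumption-laden stand-in, not the real ConstantSizeRegionSplitPolicy.shouldSplit(): it reduces the decision to a single size comparison and uses the 2000-byte store size and the negative threshold quoted in the thread.

```java
final class SplitThresholdDemo {

    // Simplified stand-in for a size-based split check: split once the store exceeds the threshold.
    static boolean shouldSplit(long storeSize, long desiredMaxFileSize) {
        return storeSize > desiredMaxFileSize;
    }

    public static void main(String[] args) {
        long storeSize = 2000L;                          // store size used in the test from the thread
        long intended = Long.MAX_VALUE;                  // threshold meant to disable splitting
        long overflowed = -9223365302346055681L;         // wrapped threshold reported in the thread

        System.out.println("intended threshold splits:   " + shouldSplit(storeSize, intended));
        System.out.println("overflowed threshold splits: " + shouldSplit(storeSize, overflowed));
    }
}
```

Under this model every non-empty store exceeds a negative threshold, which is consistent with the rapid growth in region count reported earlier in the thread.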