Description
HBASE-5974: Scanner retry behavior with RPC timeout on next() seems incorrect, which causes data loss in HBase scans.
I think we should fix it in 0.94.
lhofhansl
Attachments
- verify-test.patch (5 kB, Shaohui Liu)
- HBASE-5974-0.94-v1.diff (20 kB, Shaohui Liu)
- 11957-addendum-2.txt (2 kB, Lars Hofhansl)
- 11957-addendum.txt (0.8 kB, Lars Hofhansl)
Activity
I haven't gone through the backport patch yet. One quick question: do we ensure client-server compatibility, i.e. can an older client talk to a new server (0.94.24 with this fix), and a new client to an old server?
Yes. An older client can talk to a new server using the old "next" API.
A new client tries the new "next" API first. If there is no such method, it switches to the old API.
See the code in ScannerCallable#call.
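The fallback described above can be sketched roughly as follows. This is not the code from ScannerCallable#call. The `Server` interface, `OldServer`, `Client`, and the use of `UnsupportedOperationException` to signal a missing method are all hypothetical stand-ins chosen to keep the example self-contained.

```java
import java.util.Arrays;
import java.util.List;

class NextApiFallbackSketch {
    /** Hypothetical stand-in for the region server RPC interface. */
    interface Server {
        List<String> next(long scannerId, int rows);               // old "next" API
        List<String> next(long scannerId, int rows, long callSeq); // new "next" API
    }

    /** Hypothetical old server: it does not know the new method. */
    static class OldServer implements Server {
        public List<String> next(long scannerId, int rows) {
            return Arrays.asList("row1", "row2");
        }
        public List<String> next(long scannerId, int rows, long callSeq) {
            // Assumption: a missing method surfaces to the client as an exception.
            throw new UnsupportedOperationException("no such method");
        }
    }

    /** Client that prefers the new API and falls back once to the old one. */
    static class Client {
        private final Server server;
        private boolean useNewApi = true;
        private long callSeq = 0;

        Client(Server server) { this.server = server; }

        boolean usesNewApi() { return useNewApi; }

        List<String> call(long scannerId, int rows) {
            if (useNewApi) {
                try {
                    List<String> result = server.next(scannerId, rows, callSeq);
                    callSeq++; // advance only after a successful call
                    return result;
                } catch (UnsupportedOperationException e) {
                    useNewApi = false; // remember the server is old; stop probing
                }
            }
            return server.next(scannerId, rows);
        }
    }

    public static void main(String[] args) {
        Client client = new Client(new OldServer());
        System.out.println(client.call(1L, 2));
        System.out.println("uses new API: " + client.usesNewApi());
    }
}
```

Remembering the result of the first probe means a new client pays the cost of the failed call only once per scanner against an old server.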
liushaohui Are you running into this? This fix itself is fine, but we've lived with this until now.
As this is changing scanning behavior I'd like to be careful.
On the other hand, this is in 0.98 and hence has seen some testing.
Are you running into this?
Yes, we encountered data loss in HBase scans because of client retries.
I wrote a test to reproduce this problem.
As this is changing scanning behavior I'd like to be careful.
I think this doesn't change the scan behavior.
It just makes sure data will not be lost during a scan if the client retries.
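To illustrate how a retried next() can lose rows and how a per-scanner call sequence number guards against it, here is a minimal sketch. It is a toy model and not the patch itself: `ScannerHolder`, `Server`, and `OutOfOrderException` are hypothetical names that loosely mirror the RegionScannerHolder and CallSequenceOutOfOrderException files touched by the commit, and the in-memory row list stands in for a real region scanner.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class CallSeqSketch {
    /** Hypothetical exception: the client's sequence number does not match the server's. */
    static class OutOfOrderException extends RuntimeException {
        OutOfOrderException(String msg) { super(msg); }
    }

    /** Hypothetical per-scanner state: a cursor plus the sequence number expected next. */
    static class ScannerHolder {
        final List<String> rows;
        int pos = 0;
        long nextCallSeq = 0;
        ScannerHolder(List<String> rows) { this.rows = rows; }
    }

    static class Server {
        final Map<String, ScannerHolder> scanners = new ConcurrentHashMap<String, ScannerHolder>();

        void open(String name, List<String> rows) {
            scanners.put(name, new ScannerHolder(rows));
        }

        synchronized List<String> next(String name, int n, long callSeq) {
            ScannerHolder h = scanners.get(name);
            if (callSeq != h.nextCallSeq) {
                // A retry of a call the server already served: refuse instead of
                // silently advancing the cursor past rows the client never received.
                throw new OutOfOrderException(
                    "expected " + h.nextCallSeq + " but got " + callSeq);
            }
            h.nextCallSeq++;
            int end = Math.min(h.pos + n, h.rows.size());
            List<String> batch = new ArrayList<String>(h.rows.subList(h.pos, end));
            h.pos = end;
            return batch;
        }
    }

    public static void main(String[] args) {
        Server server = new Server();
        server.open("s1", Arrays.asList("a", "b", "c", "d"));
        server.next("s1", 2, 0L); // pretend this response was lost to an RPC timeout
        try {
            server.next("s1", 2, 0L); // the client retries with the same sequence number
        } catch (OutOfOrderException e) {
            System.out.println("retry rejected: " + e.getMessage());
        }
    }
}
```

Without the sequence check, the retried call in this model would return the next batch and the first batch would never reach the client. With it, the client gets an explicit error and can, for example, reopen the scanner after the last row it actually received.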
Test to reproduce data loss in HBase scan.
Run it using: mvn clean test -Dtest=TestClientScan -PrunMediumTests
apurtell, stack, any opinions? Looks good to me. Would need to be sure that we maintain binary compatibility for coprocessors.
My opinion is it's an important fix. How do you think it could break the CP API? I don't see it.
Thought maybe this:
+ final Map<String, RegionScannerHolder> scanners =
+     new ConcurrentHashMap<String, RegionScannerHolder>();
But it's not public and a reference to it does not leak into the APIs.
All right then. Going to commit. Thanks liushaohui.
FAILURE: Integrated in HBase-0.94-JDK7 #183 (See https://builds.apache.org/job/HBase-0.94-JDK7/183/)
HBASE-11957 Backport to 0.94 HBASE-5974 Scanner retry behavior with RPC timeout on next() seems incorrect. (Liu Shaohui original patch by Anoop Sam John) (larsh: rev 8f9faabf579c02476acb791c145f34baf49ac8f5)
- src/main/java/org/apache/hadoop/hbase/regionserver/RegionScannerHolder.java
- src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java
- src/main/java/org/apache/hadoop/hbase/util/JVMClusterUtil.java
- src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
- src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java
- src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java
- src/main/java/org/apache/hadoop/hbase/CallSequenceOutOfOrderException.java
- src/test/java/org/apache/hadoop/hbase/client/TestClientScannerRPCTimeout.java
FAILURE: Integrated in HBase-0.94 #1413 (See https://builds.apache.org/job/HBase-0.94/1413/)
HBASE-11957 Backport to 0.94 HBASE-5974 Scanner retry behavior with RPC timeout on next() seems incorrect. (Liu Shaohui original patch by Anoop Sam John) (larsh: rev 8f9faabf579c02476acb791c145f34baf49ac8f5)
- src/main/java/org/apache/hadoop/hbase/regionserver/RegionScannerHolder.java
- src/test/java/org/apache/hadoop/hbase/client/TestClientScannerRPCTimeout.java
- src/main/java/org/apache/hadoop/hbase/util/JVMClusterUtil.java
- src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java
- src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java
- src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java
- src/main/java/org/apache/hadoop/hbase/CallSequenceOutOfOrderException.java
- src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
FAILURE: Integrated in HBase-0.94-security #524 (See https://builds.apache.org/job/HBase-0.94-security/524/)
HBASE-11957 Backport to 0.94 HBASE-5974 Scanner retry behavior with RPC timeout on next() seems incorrect. (Liu Shaohui original patch by Anoop Sam John) (larsh: rev 8f9faabf579c02476acb791c145f34baf49ac8f5)
- src/test/java/org/apache/hadoop/hbase/client/TestClientScannerRPCTimeout.java
- src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java
- src/main/java/org/apache/hadoop/hbase/CallSequenceOutOfOrderException.java
- src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
- src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java
- src/main/java/org/apache/hadoop/hbase/regionserver/RegionScannerHolder.java
- src/main/java/org/apache/hadoop/hbase/util/JVMClusterUtil.java
- src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java
Looks like this breaks: TestMetaReaderEditorNoCluster.testRideOverServerNotRunning
Pushed the attached addendum, which fixes the test.
My fault that I let the tests slide for so long.
FAILURE: Integrated in HBase-0.94 #1420 (See https://builds.apache.org/job/HBase-0.94/1420/)
HBASE-11957 Addendum; fix TestMetaReaderEditorNoCluster (larsh: rev 66cfcbe1532261f42524e8e02e762007ef0796a3)
- src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java
FAILURE: Integrated in HBase-0.94-security #531 (See https://builds.apache.org/job/HBase-0.94-security/531/)
HBASE-11957 Addendum; fix TestMetaReaderEditorNoCluster (larsh: rev 66cfcbe1532261f42524e8e02e762007ef0796a3)
- src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java
FAILURE: Integrated in HBase-0.94-JDK7 #190 (See https://builds.apache.org/job/HBase-0.94-JDK7/190/)
HBASE-11957 Addendum; fix TestMetaReaderEditorNoCluster (larsh: rev 66cfcbe1532261f42524e8e02e762007ef0796a3)
- src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java
2nd addendum! Also needs a fix for TestAssignmentManager, which times out with this patch.
FAILURE: Integrated in HBase-0.94-security #534 (See https://builds.apache.org/job/HBase-0.94-security/534/)
HBASE-11957 addendum 2; fix TestAssignmentManager (larsh: rev 9f5c397e27c79124366041b3a93b49aa85abb2be)
- src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java
SUCCESS: Integrated in HBase-0.94-JDK7 #192 (See https://builds.apache.org/job/HBase-0.94-JDK7/192/)
HBASE-11957 addendum 2; fix TestAssignmentManager (larsh: rev 9f5c397e27c79124366041b3a93b49aa85abb2be)
- src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java
SUCCESS: Integrated in HBase-0.94 #1422 (See https://builds.apache.org/job/HBase-0.94/1422/)
HBASE-11957 addendum 2; fix TestAssignmentManager (larsh: rev 9f5c397e27c79124366041b3a93b49aa85abb2be)
- src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java
FAILURE: Integrated in HBase-0.94-security #535 (See https://builds.apache.org/job/HBase-0.94-security/535/)
HBASE-11957 addendum 2; fix TestAssignmentManager (larsh: rev baccf6c9d434132cc027fc9ed28d06aefc25db77)
- src/main/java/org/apache/hadoop/hbase/util/Bytes.java
FAILURE: Integrated in HBase-TRUNK #5568 (See https://builds.apache.org/job/HBase-TRUNK/5568/)
HBASE-11957 addendum 2; fix TestAssignmentManager (larsh: rev dc5295df8c5288d29737cfe4d936a817c7a56e72)
- hbase-common/src/main/java/org/apache/hadoop/hbase/util/Bytes.java
FAILURE: Integrated in HBase-0.94 #1423 (See https://builds.apache.org/job/HBase-0.94/1423/)
HBASE-11957 addendum 2; fix TestAssignmentManager (larsh: rev baccf6c9d434132cc027fc9ed28d06aefc25db77)
- src/main/java/org/apache/hadoop/hbase/util/Bytes.java
SUCCESS: Integrated in HBase-0.94-JDK7 #193 (See https://builds.apache.org/job/HBase-0.94-JDK7/193/)
HBASE-11957 addendum 2; fix TestAssignmentManager (larsh: rev baccf6c9d434132cc027fc9ed28d06aefc25db77)
- src/main/java/org/apache/hadoop/hbase/util/Bytes.java
FAILURE: Integrated in HBase-1.0 #237 (See https://builds.apache.org/job/HBase-1.0/237/)
HBASE-11957 addendum 2; fix TestAssignmentManager (larsh: rev ae65975426bbee43a35da8d6800ccc2c85bfe2ad)
- hbase-common/src/main/java/org/apache/hadoop/hbase/util/Bytes.java
lhofhansl
Sorry for troubling you.
I will test the patch more carefully before submitting it.
liushaohui, no problem. You just backported the patch. I wasn't happy because I committed it and did not watch the tests for close to a week.
Still no successful secure build, but at least we got a full JDK6 and JDK7 build. Not sure the secure build is related to this.
Rebase the patch v3 in HBASE-5974 for 0.94