Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-11957

Backport to 0.94 HBASE-5974 Scanner retry behavior with RPC timeout on next() seems incorrect

Details

    • Bug
    • Status: Closed
    • Critical
    • Resolution: Fixed
    • None
    • 0.94.24, 2.0.0
    • None
    • None
    • Reviewed

    Description

      HBASE-5974:Scanner retry behavior with RPC timeout on next() seems incorrect, which cause data missing in hbase scan.

      I think we should fix it in 0.94.
      lhofhansl

      Attachments

        1. verify-test.patch
          5 kB
          Shaohui Liu
        2. HBASE-5974-0.94-v1.diff
          20 kB
          Shaohui Liu
        3. 11957-addendum-2.txt
          2 kB
          Lars Hofhansl
        4. 11957-addendum.txt
          0.8 kB
          Lars Hofhansl

        Activity

          liushaohui Shaohui Liu added a comment -

          Rebase the patch v3 in HBASE-5974 for 0.94

          liushaohui Shaohui Liu added a comment - Rebase the patch v3 in HBASE-5974 for 0.94
          anoop.hbase Anoop Sam John added a comment -

          Haven't gone through the back porting patch. One quick question. Do we make sure client to server compatibility? ie. older version client can talk with new server(0.94.24 with this fix) and new client to old server

          anoop.hbase Anoop Sam John added a comment - Haven't gone through the back porting patch. One quick question. Do we make sure client to server compatibility? ie. older version client can talk with new server(0.94.24 with this fix) and new client to old server
          liushaohui Shaohui Liu added a comment -

          anoop.hbase

          Haven't gone through the back porting patch. One quick question. Do we make sure client to server compatibility? ie. older version client can talk with new server(0.94.24 with this fix) and new client to old server
          

          Yes. Older version client can talk with new server with old "next" api.
          New client try to use the new "next" api first. If there is no such methond, it will switch to use old api.
          See the code in ScannerCallable#call

          liushaohui Shaohui Liu added a comment - anoop.hbase Haven't gone through the back porting patch. One quick question. Do we make sure client to server compatibility? ie. older version client can talk with new server(0.94.24 with this fix) and new client to old server Yes. Older version client can talk with new server with old "next" api. New client try to use the new "next" api first. If there is no such methond, it will switch to use old api. See the code in ScannerCallable#call
          larsh Lars Hofhansl added a comment -

          liushaohui Are you running into this? This fix itself is fine, but we've lived with this until now.
          As this is changing scanning behavior I'd like to be careful.
          On the other hand, this is in 0.98 and hence has seen some testing.

          larsh Lars Hofhansl added a comment - liushaohui Are you running into this? This fix itself is fine, but we've lived with this until now. As this is changing scanning behavior I'd like to be careful. On the other hand, this is in 0.98 and hence has seen some testing.
          liushaohui Shaohui Liu added a comment -

          lhofhansl

          Are you running into this?

          Yes, we encountered data loss in hbase scan because of client retry.
          I wrote a test to produce this problem.

          As this is changing scanning behavior I'd like to be careful.

          I think this doesn't change the scan behavior.
          It just make sure data will not be lost in scan if there are client tries in client.

          liushaohui Shaohui Liu added a comment - lhofhansl Are you running into this? Yes, we encountered data loss in hbase scan because of client retry. I wrote a test to produce this problem. As this is changing scanning behavior I'd like to be careful. I think this doesn't change the scan behavior. It just make sure data will not be lost in scan if there are client tries in client.
          liushaohui Shaohui Liu added a comment -

          Test to reproduce data in hbase scan.
          run it using: mvn clean test -Dtest=TestClientScan -PrunMediumTests

          liushaohui Shaohui Liu added a comment - Test to reproduce data in hbase scan. run it using: mvn clean test -Dtest=TestClientScan -PrunMediumTests
          anoop.hbase Anoop Sam John added a comment -

          Patch looks good.

          anoop.hbase Anoop Sam John added a comment - Patch looks good.
          larsh Lars Hofhansl added a comment -

          apurtell, stack, any opinions? Looks good to me. Would need to be sure that we maintain binary compatibility for coprocessors.

          larsh Lars Hofhansl added a comment - apurtell , stack , any opinions? Looks good to me. Would need to be sure that we maintain binary compatibility for coprocessors.
          stack Michael Stack added a comment -

          My opinion is its an important fix. How you think it could break CP API? I don't see it.

          stack Michael Stack added a comment - My opinion is its an important fix. How you think it could break CP API? I don't see it.
          larsh Lars Hofhansl added a comment -

          Thought maybe this:

          +  final Map<String, RegionScannerHolder> scanners =
          +    new ConcurrentHashMap<String, RegionScannerHolder>();
          

          But it's not public and a reference to it does not leak into the APIs.

          Allright then. Going to commit. Thanks liushaohui.

          larsh Lars Hofhansl added a comment - Thought maybe this: + final Map< String , RegionScannerHolder> scanners = + new ConcurrentHashMap< String , RegionScannerHolder>(); But it's not public and a reference to it does not leak into the APIs. Allright then. Going to commit. Thanks liushaohui .
          larsh Lars Hofhansl added a comment -

          Committed.

          larsh Lars Hofhansl added a comment - Committed.
          hudson Hudson added a comment -

          FAILURE: Integrated in HBase-0.94-JDK7 #183 (See https://builds.apache.org/job/HBase-0.94-JDK7/183/)
          HBASE-11957 Backport to 0.94 HBASE-5974 Scanner retry behavior with RPC timeout on next() seems incorrect. (Liu Shaohui original patch by Anoop Sam John) (larsh: rev 8f9faabf579c02476acb791c145f34baf49ac8f5)

          • src/main/java/org/apache/hadoop/hbase/regionserver/RegionScannerHolder.java
          • src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java
          • src/main/java/org/apache/hadoop/hbase/util/JVMClusterUtil.java
          • src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          • src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java
          • src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java
          • src/main/java/org/apache/hadoop/hbase/CallSequenceOutOfOrderException.java
          • src/test/java/org/apache/hadoop/hbase/client/TestClientScannerRPCTimeout.java
          hudson Hudson added a comment - FAILURE: Integrated in HBase-0.94-JDK7 #183 (See https://builds.apache.org/job/HBase-0.94-JDK7/183/ ) HBASE-11957 Backport to 0.94 HBASE-5974 Scanner retry behavior with RPC timeout on next() seems incorrect. (Liu Shaohui original patch by Anoop Sam John) (larsh: rev 8f9faabf579c02476acb791c145f34baf49ac8f5) src/main/java/org/apache/hadoop/hbase/regionserver/RegionScannerHolder.java src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java src/main/java/org/apache/hadoop/hbase/util/JVMClusterUtil.java src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java src/main/java/org/apache/hadoop/hbase/CallSequenceOutOfOrderException.java src/test/java/org/apache/hadoop/hbase/client/TestClientScannerRPCTimeout.java
          hudson Hudson added a comment -

          FAILURE: Integrated in HBase-0.94 #1413 (See https://builds.apache.org/job/HBase-0.94/1413/)
          HBASE-11957 Backport to 0.94 HBASE-5974 Scanner retry behavior with RPC timeout on next() seems incorrect. (Liu Shaohui original patch by Anoop Sam John) (larsh: rev 8f9faabf579c02476acb791c145f34baf49ac8f5)

          • src/main/java/org/apache/hadoop/hbase/regionserver/RegionScannerHolder.java
          • src/test/java/org/apache/hadoop/hbase/client/TestClientScannerRPCTimeout.java
          • src/main/java/org/apache/hadoop/hbase/util/JVMClusterUtil.java
          • src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java
          • src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java
          • src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java
          • src/main/java/org/apache/hadoop/hbase/CallSequenceOutOfOrderException.java
          • src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          hudson Hudson added a comment - FAILURE: Integrated in HBase-0.94 #1413 (See https://builds.apache.org/job/HBase-0.94/1413/ ) HBASE-11957 Backport to 0.94 HBASE-5974 Scanner retry behavior with RPC timeout on next() seems incorrect. (Liu Shaohui original patch by Anoop Sam John) (larsh: rev 8f9faabf579c02476acb791c145f34baf49ac8f5) src/main/java/org/apache/hadoop/hbase/regionserver/RegionScannerHolder.java src/test/java/org/apache/hadoop/hbase/client/TestClientScannerRPCTimeout.java src/main/java/org/apache/hadoop/hbase/util/JVMClusterUtil.java src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java src/main/java/org/apache/hadoop/hbase/CallSequenceOutOfOrderException.java src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          hudson Hudson added a comment -

          FAILURE: Integrated in HBase-0.94-security #524 (See https://builds.apache.org/job/HBase-0.94-security/524/)
          HBASE-11957 Backport to 0.94 HBASE-5974 Scanner retry behavior with RPC timeout on next() seems incorrect. (Liu Shaohui original patch by Anoop Sam John) (larsh: rev 8f9faabf579c02476acb791c145f34baf49ac8f5)

          • src/test/java/org/apache/hadoop/hbase/client/TestClientScannerRPCTimeout.java
          • src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java
          • src/main/java/org/apache/hadoop/hbase/CallSequenceOutOfOrderException.java
          • src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          • src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java
          • src/main/java/org/apache/hadoop/hbase/regionserver/RegionScannerHolder.java
          • src/main/java/org/apache/hadoop/hbase/util/JVMClusterUtil.java
          • src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java
          hudson Hudson added a comment - FAILURE: Integrated in HBase-0.94-security #524 (See https://builds.apache.org/job/HBase-0.94-security/524/ ) HBASE-11957 Backport to 0.94 HBASE-5974 Scanner retry behavior with RPC timeout on next() seems incorrect. (Liu Shaohui original patch by Anoop Sam John) (larsh: rev 8f9faabf579c02476acb791c145f34baf49ac8f5) src/test/java/org/apache/hadoop/hbase/client/TestClientScannerRPCTimeout.java src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java src/main/java/org/apache/hadoop/hbase/CallSequenceOutOfOrderException.java src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java src/main/java/org/apache/hadoop/hbase/regionserver/RegionScannerHolder.java src/main/java/org/apache/hadoop/hbase/util/JVMClusterUtil.java src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java
          larsh Lars Hofhansl added a comment -

          Looks like this breaks: TestMetaReaderEditorNoCluster.testRideOverServerNotRunning

          larsh Lars Hofhansl added a comment - Looks like this breaks: TestMetaReaderEditorNoCluster.testRideOverServerNotRunning
          larsh Lars Hofhansl added a comment -

          liushaohui, we need to either fix or remove this (at least for now)

          larsh Lars Hofhansl added a comment - liushaohui , we need to either fix or remove this (at least for now)
          larsh Lars Hofhansl added a comment -

          Pushed the attached addendum, which fixes the test.
          My fault that I did the tests slide for so long.

          larsh Lars Hofhansl added a comment - Pushed the attached addendum, which fixes the test. My fault that I did the tests slide for so long.
          hudson Hudson added a comment -

          FAILURE: Integrated in HBase-0.94 #1420 (See https://builds.apache.org/job/HBase-0.94/1420/)
          HBASE-11957 Addendum; fix TestMetaReaderEditorNoCluster (larsh: rev 66cfcbe1532261f42524e8e02e762007ef0796a3)

          • src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java
          hudson Hudson added a comment - FAILURE: Integrated in HBase-0.94 #1420 (See https://builds.apache.org/job/HBase-0.94/1420/ ) HBASE-11957 Addendum; fix TestMetaReaderEditorNoCluster (larsh: rev 66cfcbe1532261f42524e8e02e762007ef0796a3) src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java
          hudson Hudson added a comment -

          FAILURE: Integrated in HBase-0.94-security #531 (See https://builds.apache.org/job/HBase-0.94-security/531/)
          HBASE-11957 Addendum; fix TestMetaReaderEditorNoCluster (larsh: rev 66cfcbe1532261f42524e8e02e762007ef0796a3)

          • src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java
          hudson Hudson added a comment - FAILURE: Integrated in HBase-0.94-security #531 (See https://builds.apache.org/job/HBase-0.94-security/531/ ) HBASE-11957 Addendum; fix TestMetaReaderEditorNoCluster (larsh: rev 66cfcbe1532261f42524e8e02e762007ef0796a3) src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java
          hudson Hudson added a comment -

          FAILURE: Integrated in HBase-0.94-JDK7 #190 (See https://builds.apache.org/job/HBase-0.94-JDK7/190/)
          HBASE-11957 Addendum; fix TestMetaReaderEditorNoCluster (larsh: rev 66cfcbe1532261f42524e8e02e762007ef0796a3)

          • src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java
          hudson Hudson added a comment - FAILURE: Integrated in HBase-0.94-JDK7 #190 (See https://builds.apache.org/job/HBase-0.94-JDK7/190/ ) HBASE-11957 Addendum; fix TestMetaReaderEditorNoCluster (larsh: rev 66cfcbe1532261f42524e8e02e762007ef0796a3) src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java
          larsh Lars Hofhansl added a comment -

          2nd addendum! Also needs a fix for TestAssignmentManager, which times out with this patch.

          larsh Lars Hofhansl added a comment - 2nd addendum! Also needs a fix for TestAssignmentManager, which times out with this patch.
          larsh Lars Hofhansl added a comment -

          Pushed. Not happy.

          larsh Lars Hofhansl added a comment - Pushed. Not happy.
          hudson Hudson added a comment -

          FAILURE: Integrated in HBase-0.94-security #534 (See https://builds.apache.org/job/HBase-0.94-security/534/)
          HBASE-11957 addendum 2; fix TestAssignmentManager (larsh: rev 9f5c397e27c79124366041b3a93b49aa85abb2be)

          • src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java
          hudson Hudson added a comment - FAILURE: Integrated in HBase-0.94-security #534 (See https://builds.apache.org/job/HBase-0.94-security/534/ ) HBASE-11957 addendum 2; fix TestAssignmentManager (larsh: rev 9f5c397e27c79124366041b3a93b49aa85abb2be) src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java
          hudson Hudson added a comment -

          SUCCESS: Integrated in HBase-0.94-JDK7 #192 (See https://builds.apache.org/job/HBase-0.94-JDK7/192/)
          HBASE-11957 addendum 2; fix TestAssignmentManager (larsh: rev 9f5c397e27c79124366041b3a93b49aa85abb2be)

          • src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java
          hudson Hudson added a comment - SUCCESS: Integrated in HBase-0.94-JDK7 #192 (See https://builds.apache.org/job/HBase-0.94-JDK7/192/ ) HBASE-11957 addendum 2; fix TestAssignmentManager (larsh: rev 9f5c397e27c79124366041b3a93b49aa85abb2be) src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java
          hudson Hudson added a comment -

          SUCCESS: Integrated in HBase-0.94 #1422 (See https://builds.apache.org/job/HBase-0.94/1422/)
          HBASE-11957 addendum 2; fix TestAssignmentManager (larsh: rev 9f5c397e27c79124366041b3a93b49aa85abb2be)

          • src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java
          hudson Hudson added a comment - SUCCESS: Integrated in HBase-0.94 #1422 (See https://builds.apache.org/job/HBase-0.94/1422/ ) HBASE-11957 addendum 2; fix TestAssignmentManager (larsh: rev 9f5c397e27c79124366041b3a93b49aa85abb2be) src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java
          hudson Hudson added a comment -

          FAILURE: Integrated in HBase-0.94-security #535 (See https://builds.apache.org/job/HBase-0.94-security/535/)
          HBASE-11957 addendum 2; fix TestAssignmentManager (larsh: rev baccf6c9d434132cc027fc9ed28d06aefc25db77)

          • src/main/java/org/apache/hadoop/hbase/util/Bytes.java
          hudson Hudson added a comment - FAILURE: Integrated in HBase-0.94-security #535 (See https://builds.apache.org/job/HBase-0.94-security/535/ ) HBASE-11957 addendum 2; fix TestAssignmentManager (larsh: rev baccf6c9d434132cc027fc9ed28d06aefc25db77) src/main/java/org/apache/hadoop/hbase/util/Bytes.java
          hudson Hudson added a comment -

          FAILURE: Integrated in HBase-TRUNK #5568 (See https://builds.apache.org/job/HBase-TRUNK/5568/)
          HBASE-11957 addendum 2; fix TestAssignmentManager (larsh: rev dc5295df8c5288d29737cfe4d936a817c7a56e72)

          • hbase-common/src/main/java/org/apache/hadoop/hbase/util/Bytes.java
          hudson Hudson added a comment - FAILURE: Integrated in HBase-TRUNK #5568 (See https://builds.apache.org/job/HBase-TRUNK/5568/ ) HBASE-11957 addendum 2; fix TestAssignmentManager (larsh: rev dc5295df8c5288d29737cfe4d936a817c7a56e72) hbase-common/src/main/java/org/apache/hadoop/hbase/util/Bytes.java
          hudson Hudson added a comment -

          FAILURE: Integrated in HBase-0.94 #1423 (See https://builds.apache.org/job/HBase-0.94/1423/)
          HBASE-11957 addendum 2; fix TestAssignmentManager (larsh: rev baccf6c9d434132cc027fc9ed28d06aefc25db77)

          • src/main/java/org/apache/hadoop/hbase/util/Bytes.java
          hudson Hudson added a comment - FAILURE: Integrated in HBase-0.94 #1423 (See https://builds.apache.org/job/HBase-0.94/1423/ ) HBASE-11957 addendum 2; fix TestAssignmentManager (larsh: rev baccf6c9d434132cc027fc9ed28d06aefc25db77) src/main/java/org/apache/hadoop/hbase/util/Bytes.java
          hudson Hudson added a comment -

          SUCCESS: Integrated in HBase-0.94-JDK7 #193 (See https://builds.apache.org/job/HBase-0.94-JDK7/193/)
          HBASE-11957 addendum 2; fix TestAssignmentManager (larsh: rev baccf6c9d434132cc027fc9ed28d06aefc25db77)

          • src/main/java/org/apache/hadoop/hbase/util/Bytes.java
          hudson Hudson added a comment - SUCCESS: Integrated in HBase-0.94-JDK7 #193 (See https://builds.apache.org/job/HBase-0.94-JDK7/193/ ) HBASE-11957 addendum 2; fix TestAssignmentManager (larsh: rev baccf6c9d434132cc027fc9ed28d06aefc25db77) src/main/java/org/apache/hadoop/hbase/util/Bytes.java
          hudson Hudson added a comment -

          FAILURE: Integrated in HBase-1.0 #237 (See https://builds.apache.org/job/HBase-1.0/237/)
          HBASE-11957 addendum 2; fix TestAssignmentManager (larsh: rev ae65975426bbee43a35da8d6800ccc2c85bfe2ad)

          • hbase-common/src/main/java/org/apache/hadoop/hbase/util/Bytes.java
          hudson Hudson added a comment - FAILURE: Integrated in HBase-1.0 #237 (See https://builds.apache.org/job/HBase-1.0/237/ ) HBASE-11957 addendum 2; fix TestAssignmentManager (larsh: rev ae65975426bbee43a35da8d6800ccc2c85bfe2ad) hbase-common/src/main/java/org/apache/hadoop/hbase/util/Bytes.java
          liushaohui Shaohui Liu added a comment -

          lhofhansl
          Sorry for troubling you.
          I will test the patch more carefully before submitting it.

          liushaohui Shaohui Liu added a comment - lhofhansl Sorry for troubling you. I will test the patch more carefully before submitting it.
          larsh Lars Hofhansl added a comment -

          liushaohui, no problem. You just backported the patch. I wasn't happy because I committed it and did not watch the tests for close to a week
          Still no successful secure build, but at least we got a full JDK6 and JDK7 build. Not sure the secure build is related to this.

          larsh Lars Hofhansl added a comment - liushaohui , no problem. You just backported the patch. I wasn't happy because I committed it and did not watch the tests for close to a week Still no successful secure build, but at least we got a full JDK6 and JDK7 build. Not sure the secure build is related to this.

          People

            liushaohui Shaohui Liu
            liushaohui Shaohui Liu
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: