HBASE-11667: Comment ClientScanner logic for NSREs.

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.99.0, 2.0.0, 0.94.23, 0.98.6
    • Component/s: None
    • Labels: None

      Description

      We ran into an issue with Phoenix where a RegionObserver coprocessor intercepts a scan and returns an aggregate (in this case a count) with a fake row key. It turns out this does not work when the ClientScanner encounters NSREs, as it uses the last key it saw to reset the scanner to try again (which in this case would be the fake key).

      While this is arguably a rare case, and one could also argue that a region observer just shouldn't do this, while looking at ClientScanner's code I found this logic unnecessary.
      An NSRE occurs because we contacted a region server with a key that it no longer hosts. That key is the start key of the request, so it is always correct to retry with the same key. This simplifies the ClientScanner logic and also makes this sort of coprocessor possible.
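
      For illustration, here is a minimal, hypothetical sketch of the simplification proposed above: on an NSRE the scanner simply retries the failed RPC with the start key that RPC already carried, so it never depends on a (possibly fake) row key returned by a coprocessor. RegionClient, Row, scanBatch and NotServingRegionException are stand-ins invented for this sketch, not the actual HBase client API; the discussion below covers the cases this glosses over.

        import java.util.List;

        // Illustrative stand-ins only; not the real HBase client classes.
        final class RetryWithSameStartKeySketch {
          static final class NotServingRegionException extends Exception {}
          static final class Row { final byte[] key; Row(byte[] key) { this.key = key; } }

          interface RegionClient {
            List<Row> scanBatch(byte[] startKey) throws NotServingRegionException;
          }

          static List<Row> nextBatch(RegionClient client, byte[] startKey)
              throws InterruptedException {
            while (true) {
              try {
                // A failed RPC delivered nothing to the caller, so retrying with
                // the very same start key cannot duplicate or skip rows.
                return client.scanBatch(startKey);
              } catch (NotServingRegionException nsre) {
                // Region moved or split: relocate it (omitted here) and retry.
                Thread.sleep(100L);
              }
            }
          }
        }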

  Attachments

  1. 11667-0.94.txt
        3 kB
        Lars Hofhansl
      2. HBASE-11667-0.98.patch
        4 kB
        Andrew Purtell
      3. 11667-trunk.txt
        5 kB
        Lars Hofhansl
      4. IntegrationTestBigLinkedListWithRegionMovement.patch
        4 kB
        Andrew Purtell
      5. 11667-doc-0.94.txt
        1 kB
        Lars Hofhansl

        Issue Links

          Activity

          Transition                 Time In Source Status   Execution Times   Last Executer    Last Execution Date
          Open -> Patch Available    39m 26s                  1                 Lars Hofhansl    05/Aug/14 01:28
          Patch Available -> Open    6h 18m                   1                 Andrew Purtell   05/Aug/14 07:46
          Open -> Resolved           1d 11h 49m               1                 Lars Hofhansl    06/Aug/14 19:36
          Resolved -> Closed         33d 9h 42m               1                 Lars Hofhansl    09/Sep/14 05:19
          Lars Hofhansl made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Hudson added a comment -

          SUCCESS: Integrated in HBase-0.98-on-Hadoop-1.1 #413 (See https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/413/)
          HBASE-11667 Comment ClientScanner logic for NSREs. (larsh: rev 273fe4fc81b6017ffae9ea00e8b58d65d576b84e)

          • hbase-client/src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java
          Hudson added a comment -

          SUCCESS: Integrated in HBase-0.98 #439 (See https://builds.apache.org/job/HBase-0.98/439/)
          HBASE-11667 Comment ClientScanner logic for NSREs. (larsh: rev 273fe4fc81b6017ffae9ea00e8b58d65d576b84e)

          • hbase-client/src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java
          Hudson added a comment -

          SUCCESS: Integrated in HBase-0.94-security #509 (See https://builds.apache.org/job/HBase-0.94-security/509/)
          HBASE-11667 Comment ClientScanner logic for NSREs. (larsh: rev c51e19afec1925de7769c24e0dbd2bfcf90a184e)

          • src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java
          Hudson added a comment -

          SUCCESS: Integrated in HBase-1.0 #88 (See https://builds.apache.org/job/HBase-1.0/88/)
          HBASE-11667 Comment ClientScanner logic for NSREs. (larsh: rev fb8cc733edec3f86884cad19351b8e75484be6cf)

          • hbase-client/src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java
          Hudson added a comment -

          SUCCESS: Integrated in HBase-0.94 #1398 (See https://builds.apache.org/job/HBase-0.94/1398/)
          HBASE-11667 Comment ClientScanner logic for NSREs. (larsh: rev c51e19afec1925de7769c24e0dbd2bfcf90a184e)

          • src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java
          Hudson added a comment -

          FAILURE: Integrated in HBase-TRUNK #5375 (See https://builds.apache.org/job/HBase-TRUNK/5375/)
          HBASE-11667 Comment ClientScanner logic for NSREs. (larsh: rev 25e6baee424602942afbe40b01ea0b4cff1ea9b1)

          • hbase-client/src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java
          Hudson added a comment -

          FAILURE: Integrated in HBase-0.94-JDK7 #167 (See https://builds.apache.org/job/HBase-0.94-JDK7/167/)
          HBASE-11667 Comment ClientScanner logic for NSREs. (larsh: rev c51e19afec1925de7769c24e0dbd2bfcf90a184e)

          • src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java
          Lars Hofhansl made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Lars Hofhansl added a comment -

          Committed doc update to all branches.

          Lars Hofhansl added a comment -

          Keeping last thing we delivered – state – would purge all that is grey in this area.

          In a sense we do, and then use it only when encountering an NSRE. If we just remember what we delivered (and don't require a real key) we'd always have to rescan the entire region upon an NSRE (not just the last batch of scanner-caching rows) and then ignore everything we delivered so far. So it is definitely less efficient, and it might lead to other weird issues if scans do not return the exact same row keys upon repeated execution (for example when a filter or region observer messes with things).

          I'll commit the comment update to all branches for now; but I'll keep thinking about this. This is ugly right now, especially in trunk with the region replicas.

          stack added a comment -

          The comment is great. +1 on that.

          Keeping last thing we delivered – state – would purge all that is grey in this area.

          Andrew Purtell added a comment -

          or we have to know up to what point we have delivered and that means keeping track of the last key we have delivered

          I would say this.

          Lars Hofhansl added a comment -

          So commit the java doc change for now?

          Lars Hofhansl added a comment -

          It seems we either have to delay delivering up to an entire region's worth of data to the caller, or we have to know up to what point we have delivered, and that means keeping track of the last key we have delivered (it does not need to be a real key, but it needs to be unique). Maybe I am being trapped in how things are right now...?

          Andrew Purtell added a comment -

          The client cannot track what the server is doing unless the server tells it what it did (i.e. how far it got with its scan). I don't think we can recover if there is no way to know which state to recover to. All the client can know without help from the server (for this particular scan) is the last startkey of the last region. I tried to only use that information, but it turns out that does not work. Interesting problem.

          The problem with your patch was the client ended up handing duplicate rows to the application layer because we removed a kludge. The client is in the position of observing the Results it is passing up. I wonder if it is possible to detect the 'skipFirst' condition and handle it without relying on the current server response, since we see at least in this case a coprocessor can cause the server to lie. (Of course, we can always declare that to be an invalid thing to do.) I am not looking at code when saying this, just thinking out loud.

          Lars Hofhansl added a comment -

          It's suspicious that the client relies on the server telling it state it could track locally (or am I missing something?).

          The client cannot track what the server is doing unless the server tells it what it did (i.e. how far it got with its scan). I don't think we can recover if there is no way to know which state to recover to.
          All the client can know without help from the server (for this particular scan) is the last startkey of the last region. I tried to only use that information, but it turns out that does not work. Interesting problem.

          Andrew Purtell added a comment -

          I suppose every scan could always send the last row seen as extra metadata at the end of the RPC.

          That wouldn't help. It's suspicious that the client relies on the server telling it state it could track locally (or am I missing something?). In the Phoenix case the server lies because a coprocessor changes something, and the client cannot recover, but could/should?

          Lars Hofhansl added a comment -

          We should come up with a test that finds out. Might be a bug issue there.

          Not a bug, I think, just unnecessary work. I also do not see any other way of doing this. When a region moves, even from a logical viewpoint you need to know where to start off again, or you have to start from the last known position (which is the start key of the last region).
          I suppose every scan could always send the last row seen as extra metadata at the end of the RPC. The ClientScanner would record that, but not pass it on to the caller. That would still break the Phoenix scenario that started this discussion, though, as the region observer does not send a useful row key back at all.
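
          For illustration only, here is a sketch of the idea floated above; ScanResponse and lastRowSeen are hypothetical and not part of the real HBase scan RPC. The server would report the last row it actually scanned, and the ClientScanner would keep that key purely for NSRE recovery without ever passing it to the caller.

            // Hypothetical types; not part of the real HBase scan protocol.
            final class LastRowMetadataSketch {
              static final class ScanResponse {
                byte[][] rows;        // rows handed to the application
                byte[] lastRowSeen;   // extra metadata: last key the server scanned
              }

              private byte[] recoveryKey;   // used only when an NSRE forces a re-open

              byte[][] handleResponse(ScanResponse rsp) {
                if (rsp.lastRowSeen != null) {
                  recoveryKey = rsp.lastRowSeen;   // remember it, but never expose it
                }
                return rsp.rows;
              }

              byte[] keyToReopenAfterNsre() {
                return recoveryKey;
              }
            }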

          Lars Hofhansl added a comment -

          I meant commit Andrew's test...

          James Taylor added a comment -

          For more cases in which this is problematic for Phoenix and to see the workaround, see PHOENIX-1146.

          Andrew Purtell added a comment -

          I also wonder now how this would behave with a filter that filters most (or all) KVs for a region in the case when we have a stale cache due to splits... Since the ClientScanner has nothing to go by it would redo the scan of the entire prior region.

          We should come up with a test that finds out. Might be a bug issue there.

          And I think we should comment Andrew Purtell's integration test.

          See HBASE-11672

          Lars Hofhansl made changes -
          Attachment 11667-doc-0.94.txt [ 12659838 ]
          Lars Hofhansl made changes -
          Attachment 11667-doc-0.94.txt [ 12659835 ]
          Lars Hofhansl added a comment -

          And I think we should comment Andrew Purtell's integration test.

          Lars Hofhansl made changes -
          Summary Simplify ClientScanner logic for NSREs. Comment ClientScanner logic for NSREs.
          Lars Hofhansl made changes -
          Priority Major [ 3 ] Minor [ 4 ]
          Lars Hofhansl made changes -
          Attachment 11667-doc-0.94.txt [ 12659835 ]
          Lars Hofhansl added a comment -

          How's this for a comment? (0.94)

          Lars Hofhansl added a comment -

          I also wonder now how this would behave with a filter that filters most (or all) KVs for a region in the case when we have a stale cache due to splits... Since the ClientScanner has nothing to go by it would redo the scan of the entire prior region.

          Andrew Purtell made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Andrew Purtell made changes -
          Link This issue is related to PHOENIX-1146 [ PHOENIX-1146 ]
          Lars Hofhansl added a comment -

          Yeah, I can add some Javadoc. James has another workaround for Phoenix, where he detects a stale cache and reloads for the situations where that is necessary (as in the count query I mentioned in the description).

          stack added a comment -

          Can we at least doc why we do this setting of the start row since I'd forgotten and other fellas, smart fellas, couldn't see the why w/o intervention from 12k miles away?

          What about the phoenix issue?

          Lars Hofhansl added a comment -

          I think I see what the issue is now. Each RPC needs to set the correct startRow as it might just hit the next region. My patch removes exactly that part. Hence an NSRE in any but the first round would redo scanning from the previous startKey, which then might rescan the last batch (and hence the number of extra rows scanned is always the caching setting).

          Will close as "Invalid" unless somebody has more ideas.
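
          For illustration, a tiny hypothetical arithmetic sketch of the diagnosis above; the numbers are invented and not taken from the actual test run. If the NSRE retry starts from the previous RPC's start key instead of the key of the RPC that was in flight, exactly one batch (the scanner-caching setting) gets delivered twice.

            // Hypothetical numbers for illustration only.
            final class StaleStartRowTrace {
              public static void main(String[] args) {
                int caching = 10;                 // rows fetched per scan RPC
                int deliveredBeforeNsre = 40;     // rows already handed to the caller
                int correctRetryFrom = deliveredBeforeNsre;            // in-flight RPC's start key
                int staleRetryFrom = deliveredBeforeNsre - caching;    // previous RPC's start key
                // The difference is exactly one batch, matching the
                // "off by exactly one batch" observation in this thread.
                System.out.println("duplicate rows = " + (correctRetryFrom - staleRetryFrom));
              }
            }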

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12659813/IntegrationTestBigLinkedListWithRegionMovement.patch
          against trunk revision .
          ATTACHMENT ID: 12659813

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 lineLengths. The patch does not introduce lines longer than 100

          +1 site. The mvn site goal succeeds with this patch.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.procedure.TestProcedureManager
          org.apache.hadoop.hbase.ipc.TestIPC
          org.apache.hadoop.hbase.master.TestClockSkewDetection

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/10290//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10290//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10290//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10290//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10290//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10290//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10290//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10290//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10290//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10290//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/10290//console

          This message is automatically generated.

          Andrew Purtell added a comment -

          FWIW, that integration test passed with this change on 0.98, executing for 20 minutes. So not good enough.

          Lars Hofhansl added a comment -

          Interestingly it is always wrong by exactly one batch. When I set the caching to 100 I sometimes get 7833 instead of the expected 7733.
          Hmm... So maybe not as simple as I thought.

          Thanks for catching me chunhui shen!

          Lars Hofhansl added a comment -

          Ah... Now I got a run with

          Failed tests:   testScansWithSplits(org.apache.hadoop.hbase.client.TestFromClientSide): expected:<7733> but was:<7743>
          

          OK... I buy that there is an issue. Although I do not understand why. An RPC either fails or it doesn't. There is no notion of partial RPC.
          Say the scan RPC with 'aaa' fails with an NSRE. Then no rows of that RPC will be returned and it can be retried. With scanner caching = 1, the next RPC would then try with 'bbb', which would also either succeed or fail. The scanner cannot fail partially during an RPC.

          Andrew Purtell made changes -
          Attachment IntegrationTestBigLinkedListWithRegionMovement.patch [ 12659810 ]
          Lars Hofhansl added a comment -

          Actually I do get this exception without the patch to ClientScanner.

          Andrew Purtell made changes -
          Attachment IntegrationTestBigLinkedListWithRegionMovement.patch [ 12659810 ]
          Andrew Purtell added a comment (edited) -

          Attached as IntegrationTestBigLinkedListWithRegionMovement, an integration test that extends ITBLL with a fixed monkey policy that moves a random region of the table very often.

          Lars Hofhansl added a comment -

          Hmm... I always get this:

          Tests in error: 
            testScansWithSplits(org.apache.hadoop.hbase.client.TestFromClientSide): org.apache.hadoop.hbase.NotServingRegionException: Region is not online: testScansWithSplits,wWW4,1407209622238.f53974a70f899779b20750fadb5cf4bd.
          

          But I do not get this without the patch to ClientScanner. Hmm...

          Lars Hofhansl added a comment -

          Sure. Split is not synchronous though. Will try.

          chunhui shen added a comment -

          For the test case, I mean moving the split action into the while loop, like this:

                while (rs.next() != null) {
                  c++;
                  // Split the table while the scan is still in progress so the
                  // scanner hits an NSRE mid-region.
                  if (c % 4333 == 1) {
                    TEST_UTIL.getHBaseAdmin().split(TABLE);
                  }
                }
                // All 7733 rows must be returned exactly once despite the splits.
                assertEquals(7733, c);


          In our internal 0.94 branch, the test fails with the patch:

          java.lang.AssertionError: expected:<7733> but was:<7743>
          	at org.junit.Assert.fail(Assert.java:93)
          	at org.junit.Assert.failNotEquals(Assert.java:647)
          	at org.junit.Assert.assertEquals(Assert.java:128)
          	at org.junit.Assert.assertEquals(Assert.java:472)
          	at org.junit.Assert.assertEquals(Assert.java:456)
          	at org.apache.hadoop.hbase.client.TestFromClientSide.testScansWithSplits(TestFromClientSide.java:5096)
          

          Lars Hofhansl, could you apply the above change to the test case and give it a try?

          Lars Hofhansl made changes -
          Attachment 11667-trunk.txt [ 12659778 ]
          Lars Hofhansl made changes -
          Attachment 11667-trunk.txt [ 12659803 ]
          Lars Hofhansl added a comment -

          Reattaching trunk patch to make sure Hadoop QA picks up the last attachment (is that still necessary?)

          Andrew Purtell made changes -
          Fix Version/s 0.98.6 [ 12327376 ]
          Fix Version/s 0.98.5 [ 12326953 ]
          Lars Hofhansl added a comment -

          That's exactly the kind of discussion I wanted to have.

          In your example chunhui shen, each request to a region server involves a key (the Scan's start key). So after your #3 there would be a new RPC, with a new key ('bbb' if scanner caching is 1, 'ccc' if scanner caching is 2, etc.). In each case the scanner would correctly reset itself to retry only the part needed for the last start key used for the RPC that failed with the NSRE.
          As for the test, it runs and interleaves scans and splits in a loop (20 iterations) and tests for the scenario you mention.

          I still think the change is correct.

          chunhui shen added a comment -

          The skipFirst logic handles the case where the client issues a 'next' request and the server returns an NSRE in the middle of scanning.
          I think it shouldn't be removed outright.

          For example, take a region containing the rows 'aaa', 'bbb', 'ccc', 'ddd' and do the following:
          1. Client opens a scanner (empty start row) on this region.
          2. Client calls next and gets row 'aaa'.
          3. The region is moved to another server.
          4. Client sends the next request to the old server, gets an NSRE, and therefore opens the scanner again with 'aaa' (the last result) as the start row.
          5. Client should skip the first row 'aaa'.

          So, for the test case testScansWithSplits(), we should do "TEST_UTIL.getHBaseAdmin().split(TABLE)" while the scan is in progress rather than after the scan has completed.

          Pardon me if something is wrong.
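
          For illustration, here is a minimal sketch of the skipFirst handling described in the comment above; RegionScanner, RegionScannerFactory and NotServingRegionException are hypothetical stand-ins, not the real ClientScanner or HBase client API.

            // Hypothetical stand-ins; not the real ClientScanner.
            final class SkipFirstSketch {
              static final class NotServingRegionException extends Exception {}

              interface RegionScanner {
                String next() throws NotServingRegionException;   // null when exhausted
              }

              interface RegionScannerFactory {
                RegionScanner open(String startRow);               // "" = region start
              }

              static void scan(RegionScannerFactory factory) {
                String lastDelivered = null;     // 'aaa' after step 2 above
                boolean skipFirst = false;
                RegionScanner scanner = factory.open("");
                while (true) {
                  String row;
                  try {
                    row = scanner.next();
                  } catch (NotServingRegionException nsre) {
                    // Step 4: reopen at the last delivered row, and remember to drop
                    // it (step 5) so the caller never sees 'aaa' twice.
                    scanner = factory.open(lastDelivered == null ? "" : lastDelivered);
                    skipFirst = lastDelivered != null;
                    continue;
                  }
                  if (row == null) break;
                  if (skipFirst && row.equals(lastDelivered)) { skipFirst = false; continue; }
                  skipFirst = false;
                  lastDelivered = row;
                  System.out.println("deliver " + row);   // hand the row to the caller
                }
              }
            }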

          Andrew Purtell made changes -
          Attachment HBASE-11667-0.98.patch [ 12659790 ]
          Andrew Purtell added a comment -

          Attaching a patch for 0.98.

          Working on an integration test now.

          Devaraj Das added a comment -

          Lars Hofhansl, HBASE-11665 is tracking failures in the TestRegionReplicas. For scan tests you should run TestReplicasClient tests - that has tests for scans with replicas. Will have a look at your patch.

          Lars Hofhansl made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Lars Hofhansl added a comment -

          Let's try Hadoop QA.

          Lars Hofhansl made changes -
          Attachment 11667-trunk.txt [ 12659778 ]
          Lars Hofhansl added a comment -

          And a trunk version.

          The logic here was more complicated due to region replicas.
          TestRegionReplicas fails locally for me with or without the patch, so not sure.

          Enis Soztutar and whoever knows about region replicas (maybe Jeffrey Zhong?), please have a careful look. The simplification of this code would be nice if it is correct.

          Andrew Purtell made changes -
          Fix Version/s 0.98.5 [ 12326953 ]
          Fix Version/s 0.98.6 [ 12327376 ]
          Lars Hofhansl made changes -
          Field Original Value New Value
          Attachment 11667-0.94.txt [ 12659765 ]
          Lars Hofhansl added a comment -

          Here's a proposal against 0.94.
          Please have a very careful look, as there might be corner conditions lurking that I have not seen.
          I ran TestFromClientSide (including the new test I added) and it worked fine.

          Since we retry with the previous key that caused the failure, all the skipFirst huh-hah just goes away, which is nice.

          Lars Hofhansl added a comment -

          James Taylor, Andrew Purtell, FYI.
          Lars Hofhansl created issue -

            People

            • Assignee: Lars Hofhansl
            • Reporter: Lars Hofhansl
            • Votes: 0
            • Watchers: 10
