HBASE-4938

Create a HRegion.getScanner public method that allows reading from a specified readPoint

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.94.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      There is an existing API HRegion.getScanner(Scan) that allows scanning a table. My proposal is to extend it to HRegion.getScanner(Scan, long readPoint).
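
      To make the readPoint semantics concrete, here is a minimal, self-contained sketch (plain Java, not HBase code; the Cell record and scan method are illustrative assumptions) of how an MVCC read point gates visibility: a scanner only returns cells whose MVCC write number is at or below the supplied readPoint, so passing Long.MAX_VALUE would also expose edits that are still in flight.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: each cell carries the MVCC write number assigned
// to the edit that produced it; a scanner opened at a given readPoint returns
// only cells whose write number is <= that readPoint.
public class ReadPointSketch {
    public record Cell(String row, String value, long mvccWriteNumber) {}

    public static List<Cell> scan(List<Cell> store, long readPoint) {
        List<Cell> visible = new ArrayList<>();
        for (Cell c : store) {
            if (c.mvccWriteNumber() <= readPoint) {
                visible.add(c); // visible at this read point
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        List<Cell> store = List.of(
            new Cell("r1", "v1", 5L),  // long since committed
            new Cell("r1", "v2", 9L)); // edit still in flight
        System.out.println(scan(store, 8L).size());             // prints 1
        System.out.println(scan(store, Long.MAX_VALUE).size()); // prints 2
    }
}
```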

      1. scannerMVCC1.txt
        5 kB
        Lars Hofhansl
      2. scannerMVCC1.txt
        5 kB
        dhruba borthakur
      3. scannerMVCC1.txt
        5 kB
        dhruba borthakur

        Activity

        dhruba borthakur added a comment -

        This is a very minor restructuring of the code. We have some internal HRegion API that needs to scan based on an external readPoint. Of course, we could duplicate this piece of code from HRegion.java into our own code base, but in the spirit of keeping all our code changes compatible with Apache open source, I would like to request the community to analyze this requested code change and approve of it. I will provide a patch.

        dhruba borthakur added a comment -

        Code changes here: https://reviews.facebook.net/D591

        dhruba borthakur added a comment -

        Todd>> Instead, could this be an attribute of the scan, so we don't have to duplicate the number of method entry points?

        I agree, that sounds like an elegant approach. Does anybody else in the community disagree with Todd's suggestion?

        Andrew Purtell added a comment -

        +1

        Lars Hofhansl added a comment -

        +1
        Let's not add another member to Scan, though, but use our new "rule" that Scan (and Get/Put) attributes beginning with _ are internal.
        (I did this before to encode the cluster id for Puts in master-master replication scenarios, and for "raw" Scans that return delete markers and deleted rows).
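
        As a sketch of the convention Lars describes (the attribute name and value encoding below are hypothetical illustrations, not the actual keys HBase uses), the attribute map means no new field is needed on Scan itself; internal callers simply agree on a "_"-prefixed key:

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-in for Scan's attribute support: a free-form name/value
// map where, by convention, keys beginning with "_" are reserved for internal
// use, so internal flags need no dedicated members on the class.
public class ScanAttributeSketch {
    private final Map<String, byte[]> attributes = new HashMap<>();

    public void setAttribute(String name, byte[] value) {
        attributes.put(name, value);
    }

    public byte[] getAttribute(String name) {
        return attributes.get(name); // null when the attribute was never set
    }

    public static void main(String[] args) {
        ScanAttributeSketch scan = new ScanAttributeSketch();
        // Hypothetical internal key; the leading "_" marks it as not for clients.
        scan.setAttribute("_isolation_level_",
            "READ_UNCOMMITTED".getBytes(StandardCharsets.UTF_8));
        byte[] raw = scan.getAttribute("_isolation_level_");
        System.out.println(new String(raw, StandardCharsets.UTF_8)); // prints READ_UNCOMMITTED
    }
}
```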

        dhruba borthakur added a comment -

        Lars, I will implement your proposal, but I wish there were a more efficient way to have friend-relationships in Java, in which case we could have avoided inserting/looking up a map (for the attribute) for each scan.

        dhruba borthakur added a comment -

        Lars: I implemented your proposal: https://reviews.facebook.net/D591
        Please let me know if this looks ok to you.

        Todd Lipcon added a comment -

        Dhruba, can you clarify a little more what the purpose of this change is? I didn't quite understand what you meant by "We have some internal HRegion API that needs to scan based on a external readPoint". You have some other non-HBase software which is using HBase's "storage engine" components?

        dhruba borthakur added a comment -

        Todd, thanks for asking. We have some application code running inside the regionserver. It atomically updates a row and maintains a count of the number of entries inside that row. The reason it is part of the region server code is that we need the 'count' update to be an atomic and performant read-modify-write. The code to read the previous value of 'count' will invoke getScanner(scan, Long.MAX_VALUE) so that it can read the most recent value of 'count' (even if the previous 'count' update is still in the process of getting committed).

        dhruba borthakur added a comment -

        Todd: I uploaded a new version of the patch at https://reviews.facebook.net/D591

        Create two isolation levels that can be used as part of a scan. If READ_UNCOMMITTED,
        then the scan can read even uncommitted transactions.
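
        A rough sketch of such a two-level enum (the class ultimately added by this issue is org.apache.hadoop.hbase.client.IsolationLevel; its exact byte encoding is not shown in this thread, so the ids below are assumptions for illustration):

```java
// Hypothetical sketch of a two-level isolation enum as described in the
// comment above; the numeric ids and the toBytes/fromBytes round-trip are
// illustrative, not necessarily the real class's encoding.
public enum IsolationLevelSketch {
    READ_COMMITTED((byte) 1),   // scanner sees only committed edits (the default)
    READ_UNCOMMITTED((byte) 2); // scanner may also see edits still being committed

    private final byte id;

    IsolationLevelSketch(byte id) { this.id = id; }

    // Serialize to a single byte, e.g. for storage in a Scan attribute.
    public byte[] toBytes() { return new byte[] { id }; }

    public static IsolationLevelSketch fromBytes(byte[] bytes) {
        for (IsolationLevelSketch level : values()) {
            if (level.id == bytes[0]) return level;
        }
        throw new IllegalArgumentException("unknown isolation level id: " + bytes[0]);
    }
}
```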

        dhruba borthakur added a comment -

        Hi Todd, can you please take a peek at the latest patch that I uploaded?

        dhruba borthakur added a comment -

        I have run all the unit tests for this one.

        dhruba borthakur added a comment -

        Attached patch from review.

        Ted Yu added a comment -

        Patch testing

        Ted Yu added a comment -

        PreCommit build isn't running.
        New patch needs to be attached.

        dhruba borthakur added a comment -

        Submitting patch again, hoping that it will be picked up by committers and automatic build testing.

        Ted Yu added a comment (edited) -

        Hadoop QA checks the attachment Id so that it does not rerun the same attachment.
        That's why a new file has to be attached.

        dhruba borthakur added a comment -

        Attaching the same patch file again

        Ted Yu added a comment -

        From https://builds.apache.org/job/PreCommit-HBASE-Build/524/console:

        FATAL: Unable to delete script file /tmp/hudson5668605335506439661.sh
        hudson.util.IOException2: remote file operation failed: /tmp/hudson5668605335506439661.sh at hudson.remoting.Channel@b3df5d7:hadoop2
        	at hudson.FilePath.act(FilePath.java:754)
        	at hudson.FilePath.act(FilePath.java:740)
        	at hudson.FilePath.delete(FilePath.java:995)
        	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:92)
        	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:58)
        	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
        	at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:693)
        	at hudson.model.Build$RunnerImpl.build(Build.java:178)
        	at hudson.model.Build$RunnerImpl.doRun(Build.java:139)
        	at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:459)
        	at hudson.model.Run.run(Run.java:1376)
        	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
        	at hudson.model.ResourceController.execute(ResourceController.java:88)
        	at hudson.model.Executor.run(Executor.java:230)
        Caused by: hudson.remoting.ChannelClosedException: channel is already closed
        	at hudson.remoting.Channel.send(Channel.java:499)
        

        Test suite didn't finish.

        Lars Hofhansl added a comment -

        +1 on latest patch attached (provided it passes tests).
        I think I can use this for HBASE-4583.

        Lars Hofhansl added a comment -

        Attaching same file again to get a test run.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12508563/scannerMVCC1.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        -1 javadoc. The javadoc tool appears to have generated -152 warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 77 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hbase.mapred.TestTableMapReduce
        org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/591//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/591//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/591//console

        This message is automatically generated.

        Lars Hofhansl added a comment -

        So that looks pretty good. TestTableMapReduce and TestHFileOutputFormat are flaky.

        Ted Yu added a comment -

        Integrated to TRUNK.

        Thanks for the patch Dhruba.

        Thanks for the review Lars and Todd.

        Hudson added a comment -

        Integrated in HBase-TRUNK #2571 (See https://builds.apache.org/job/HBase-TRUNK/2571/)
        HBASE-4938 Create a HRegion.getScanner public method that allows reading from a specified readPoint (Dhruba)

        tedyu :
        Files :

        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/IsolationLevel.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Scan.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
        Hudson added a comment -

        Integrated in HBase-TRUNK-security #45 (See https://builds.apache.org/job/HBase-TRUNK-security/45/)
        HBASE-4938 Create a HRegion.getScanner public method that allows reading from a specified readPoint (Dhruba)

        tedyu :
        Files :

        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/IsolationLevel.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Scan.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
        dhruba borthakur added a comment -

        This patch seems to have been committed to trunk. Can somebody please mark it as Resolved? Or is there some step that needs to be done because of which the JIRA is kept open?


          People

          • Assignee:
            dhruba borthakur
            Reporter:
            dhruba borthakur
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:
