Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Coprocessors
    • Labels:
      None

      Description

      A scanner that, rather than scanning in series, scanned multiple regions in parallel would be more involved, but it could complete much faster, particularly if results are sparse.

      1. pscanner-v4.patch
        137 kB
        stack
      2. pscanner-v3.patch
        138 kB
        stack
      3. pscanner-v2.patch
        134 kB
        stack
      4. pscanner.patch
        130 kB
        stack
      5. 1935-idea.txt
        12 kB
        Lars Hofhansl

        Issue Links

          Activity

          stack added a comment -

          Here is a first attempt. Maybe someone would like to take it on? It runs scanners against multiple regions concurrently and then aggregates the results. Includes a unit test, but it needs conversion to the new-style client-side test (only two of the tests in the unit test are for parallel scanning).

          Jonathan Gray added a comment -

          Went over the patch twice. Looks pretty good.

          There is some cross-over with work done in Multi operations (MultiGet, MultiDelete, etc.). I think the first thing to decide is whether we want to create some unified threading system or take passed-in ExecutorServices as is done in the patch. And do we need a special ParallelHTable, or should the normal HTable support threading? I believe the latter.

          At either the HCM or HTable level, I think we should have a local, bounded ExecutorService pool. You would be able to modify its size through the constructor, but default settings would come from something in the conf like hbase.client.threads.
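A bounded pool along those lines is easy to sketch with plain Java; hbase.client.threads is the key Jonathan proposes, while the Properties-based conf and the default of 10 are illustrative assumptions, not HBase's actual configuration API:

```java
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ClientPool {
  // Pool size read from configuration; the "10" default is an assumption.
  static int poolSize(Properties conf) {
    return Integer.parseInt(conf.getProperty("hbase.client.threads", "10"));
  }

  // Build the local, bounded ExecutorService the comment describes.
  static ExecutorService fromConf(Properties conf) {
    return Executors.newFixedThreadPool(poolSize(conf));
  }
}
```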

          One thing I do like (at least for early versions of threaded clients) is just failing immediately when encountering a problem like a split. Properly handling this is one of the hardest parts about this (and other things like stateful filters), and retries are tricky and imperfect. With batched/parallel reads (get or scan) we should just fail-fast and throw exceptions to let the client deal. With batched/parallel writes (put or delete) we should process what we can and return back to the client what was not completed.

          Another thing I'm a little confused about... this seems to be designed for completely out-of-order receipt of results. Rather than aggregating a list of Futures and then waiting for them to complete in order, this uses an ExecutorCompletionService, which returns things as they finish. I can see that in certain use cases this would make sense, but it is a bit more limited. However, I don't see why we can't support both, using two different task completion-waiting paths and with very small changes to the constructor APIs.
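The two completion-waiting paths described above can be shown with plain java.util.concurrent, independent of HBase; here strings stand in for per-region scanner results, and the class and method names are illustrative only:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CompletionPaths {

  // Path 1: collect Futures and wait on them in submission order
  // (results come back in order).
  static List<String> inOrder(ExecutorService pool, List<String> regions)
      throws Exception {
    List<Future<String>> futures = new ArrayList<>();
    for (String r : regions) {
      futures.add(pool.submit(() -> "results-from-" + r));
    }
    List<String> out = new ArrayList<>();
    for (Future<String> f : futures) out.add(f.get());
    return out;
  }

  // Path 2: an ExecutorCompletionService hands back results as each task
  // finishes, regardless of submission order (out-of-order results).
  static List<String> asCompleted(ExecutorService pool, List<String> regions)
      throws Exception {
    CompletionService<String> cs = new ExecutorCompletionService<>(pool);
    for (String r : regions) {
      cs.submit(() -> "results-from-" + r);
    }
    List<String> out = new ArrayList<>();
    for (int i = 0; i < regions.size(); i++) out.add(cs.take().get());
    return out;
  }

  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(3);
    List<String> regions = Arrays.asList("region-a", "region-b", "region-c");
    System.out.println("in-order:     " + inOrder(pool, regions));
    System.out.println("as-completed: " + asCompleted(pool, regions));
    pool.shutdown();
  }
}
```

Supporting both, as suggested, would only mean choosing between the two loops at result-consumption time.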

          stack added a comment -

          Would be sweet if we could get parallel scanning into stock HTable. Also, if there is cross-over with multiget/multidelete, let's make a single system.

          Agree on how to handle errors during batch puts/gets.

          I like the idea of supporting both in-order and out-of-order.

          Dan Washusen added a comment -

          re. out-of-order receipt of results

          What do you see as the benefits in parallel scanning with results in order?

          The 'RegionCallable' defined at line 3109 of the patch opens a scanner on a specific region server. The same scanner is then used for all results returned from that region. If you wanted to receive results in order, the time saved would be:

          • The time taken to switch from one region to the next. For example, while iterating over results from region 1 you could start fetching results from region 2.
          • The time spent by the client iterating over the results returned in that batch before asking the server side scanner for the next batch.

          re. startRow and endRow restrictions

          The ParallelHTable in this patch (line 3608) falls back to a sequential scan if the scan has a startRow or endRow defined. It should be possible to use the parallel scanner with out-of-order receipt of results even if either of these values is specified. The scanner could list all regions and, for each region, see if its startKey and endKey fall within the scan's startRow and endRow. If they do, scan it.
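The per-region check described here is a range-overlap test. A minimal, HBase-independent sketch, where an empty byte[] means "unbounded" on that side (following HBase's row-key conventions) and scanCoversRegion is a hypothetical helper name:

```java
import java.util.Arrays;

public class RegionFilter {

  /**
   * True if the region's [regionStart, regionEnd) overlaps the scan's
   * [scanStart, scanStop). An empty array means unbounded on that side,
   * as with HBase's empty start/stop row keys.
   */
  static boolean scanCoversRegion(byte[] scanStart, byte[] scanStop,
                                  byte[] regionStart, byte[] regionEnd) {
    // scan must stop after the region begins...
    boolean startOk = scanStop.length == 0 || regionStart.length == 0
        || Arrays.compareUnsigned(scanStop, regionStart) > 0;
    // ...and start before the region ends (unsigned lexicographic order,
    // which is how HBase orders row keys).
    boolean endOk = scanStart.length == 0 || regionEnd.length == 0
        || Arrays.compareUnsigned(scanStart, regionEnd) < 0;
    return startOk && endOk;
  }
}
```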

          I'm probably stating the obvious with both those points but I'm new to HBase so you'll have to forgive me.

          Cheers,
          Dan

          stack added a comment -

          Here's a v2 that adds the ability to parallel scan when either/both the startRow and stopRow are specified. There are also more tests included...

          Dan Washusen added a comment -

          There is a minor bug in v2 of the patch. The logic in the ParallelScannerManager to determine if a scan is interested in a region doesn't handle the case when there is only one region.

          The following fixes it:

          Set<HRegionInfo> regions = table.getRegionsInfo().keySet();
          for (HRegionInfo region : regions) {
            ...
            // An unbounded scan covers every region, and a single-region
            // table is always covered, even when start/stop rows are set.
            boolean isScanInterestedInRegion =
                (scan.getStartRow().length == 0 && scan.getStopRow().length == 0)
                    || regions.size() == 1;
            ...
          }
          stack added a comment -

          v3 of the parallel scanner. Includes Dan's fix suggested above. It also fixes issues with the logic that determines if a scan is interested in a region.

          stack added a comment -

          More fixes and more tests.

          stack added a comment -

          I'll commit this soon unless objection.

          stack added a comment -

          Moving to 0.21. Let's look at merging this functionality back up into HTable rather than have it in a class of its own. Also, consider doing this functionality in coprocessors if it makes sense.

          stack added a comment -

          Moved from 0.21 to 0.22 just after merge of old 0.20 branch into TRUNK.

          Otis Gospodnetic added a comment -

          I'm wondering what sort of speed improvement one can expect from parallel scans? I know there is no universal answer, but if anyone has used this, I'd love to get a feel for it. Thanks.

          stack added a comment -

          @Otis I'd imagine parallel scanning would make for some very nice speed improvements, especially in cases where you have a pretty severe filter running server-side that skips most rows, returning only a few. The patch attached has a bit of a wonky history; I'll spare you the details. It's way stale too, I'd say; I haven't tried it, and it may depend on server-side behavior that has since been squashed (again, I haven't verified). Also, the support for parallel scanning should be moved into HTable, I'd say (as someone above says), rather than have it out in a new ParallelHTable class. The patch has very nice tests, though.

          stack added a comment -

          Moving out of 0.92.0. Pull it back in if you think different.

          Lars Hofhansl added a comment -

          I wonder if a better building block would be the ability to submit a scan to a region via HTable.

          For example we have a need not necessarily for a parallel "serial" scan, but rather for a bunch of parallel scans that (via coprocessors) perform some aggregation and then perform a merge sort of the results at the client.
          And of course this can also be used for parallel serial scans in the case of highly selective filters.
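The client-side merge sort mentioned above is a standard k-way merge over already-sorted per-region streams; a self-contained sketch, with strings standing in for Results and the class name purely illustrative:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class MergeSortedStreams {

  // Merge k sorted lists into one sorted list, as a client would merge
  // already-sorted per-region scan results. The heap holds one entry per
  // stream, so the next value popped is always the global minimum.
  static List<String> merge(List<List<String>> streams) {
    List<Iterator<String>> its = new ArrayList<>();
    for (List<String> s : streams) its.add(s.iterator());

    // Heap entries are {value, streamIndex}, ordered by value.
    PriorityQueue<Object[]> heap =
        new PriorityQueue<>((a, b) -> ((String) a[0]).compareTo((String) b[0]));
    for (int i = 0; i < its.size(); i++) {
      if (its.get(i).hasNext()) heap.add(new Object[] { its.get(i).next(), i });
    }

    List<String> out = new ArrayList<>();
    while (!heap.isEmpty()) {
      Object[] e = heap.poll();
      out.add((String) e[0]);
      int i = (Integer) e[1];
      // Refill from the stream that just yielded a value.
      if (its.get(i).hasNext()) heap.add(new Object[] { its.get(i).next(), i });
    }
    return out;
  }
}
```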

          That would make for a very small, simple patch (management of threads, merging of results, etc. would be application-specific and not part of HBase).

          The user visible API could be something as simple as (on HTable[Interface]):
          ResultScanner getScanner(Scan, HRegionInfo)

          And maybe something like the ParallelScannerManager could be added as an example.

          Lars Hofhansl added a comment -

          Here's what I was thinking. Very simple, non-intrusive. Just an idea for a much simpler patch that does not presume exact behavioral requirements.

          Actually I do not even see a strong reason why client scanners need to "live" inside HTable.
          The only HTable method used is getConnection() (which interestingly seems to be scheduled to be changed from public to protected or package scope).

          If getConnection remains public, together with ServerCallable, one can write parallel (or any kind of) scanners without changing HBase code.

          Andrew Purtell added a comment -

          Actually I do not even see a strong reason why client scanners need to "live" inside HTable.

          +1 Perhaps we can deprecate those methods and move scanners out to their own package and/or class.

          stack added a comment -

          The change of getConnection visibility was just house-cleaning tightening of the public API. Seems like you are making the case that it stay public. If so, let's fix that before we go too much further w/ the 0.92 branch.

          I'm grand w/ this patch. Small change that would facilitate exotic behaviors.

          Lars Hofhansl added a comment -

          I've since realized that HConnectionManager.getConnection(Configuration) can be used outside of HTable as well, so HTable.getConnection does not have to be public.

          Lars Hofhansl added a comment -

          Created HBASE-4439.

          Thinking about this issue a bit more. The patch-idea I provided is actually superfluous.

          The same can be achieved by just passing a Scan with start/endRow set to a region's start/endRow to HTable.getScanner(Scan); this also has the added benefit of dealing with concurrent splits.

          That is also how one could implement a parallel scanner with much less effort.
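The range-clamping this approach relies on can be sketched without HBase at all. This uses strings for row keys (HBase's real keys are byte[]) with "" standing in for the empty, unbounded key; Range and split are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;

public class ScanSplitter {

  /** A [start, stop) key range; "" stands in for HBase's empty (unbounded) row key. */
  static class Range {
    final String start, stop;
    Range(String start, String stop) { this.start = start; this.stop = stop; }
    @Override public String toString() { return "[" + start + "," + stop + ")"; }
  }

  // Clamp the scan's [start, stop) to each overlapping region's range; each
  // resulting sub-range would become its own Scan, handed to
  // HTable.getScanner(Scan) on a worker thread.
  static List<Range> split(Range scan, List<Range> regions) {
    List<Range> out = new ArrayList<>();
    for (Range r : regions) {
      // Skip regions that end at/before the scan start...
      if (!r.stop.isEmpty() && !scan.start.isEmpty()
          && r.stop.compareTo(scan.start) <= 0) continue;
      // ...or start at/after the scan stop.
      if (!scan.stop.isEmpty() && !r.start.isEmpty()
          && scan.stop.compareTo(r.start) <= 0) continue;
      // Clamp: the later of the two starts, the earlier of the two stops.
      String s = scan.start.isEmpty() ? r.start
          : (r.start.isEmpty() || scan.start.compareTo(r.start) > 0) ? scan.start : r.start;
      String e = scan.stop.isEmpty() ? r.stop
          : (r.stop.isEmpty() || scan.stop.compareTo(r.stop) < 0) ? scan.stop : r.stop;
      out.add(new Range(s, e));
    }
    return out;
  }
}
```

Because each sub-scan goes through the normal HTable.getScanner(Scan) path, region splits between listing the regions and opening the scanners are handled by the client library as usual, which is the benefit Lars points out.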

          James Taylor added a comment -

          Would it be possible to revive this JIRA? It'd be nice to have a ParallelClientScanner that you can hand a list of Scans to execute in parallel.

          stack added a comment -

          James Taylor, it is a little silly we don't do this currently. I'd say the original patch is well stale and we should probably start over. Anyone you know willing to work on this?

          Raymond Liu added a comment -

          Interesting topic. Which part of the issue is most difficult? I am sort of interested in this issue but not sure I have the adequate knowledge.


            People

            • Assignee:
              Unassigned
            • Reporter:
              stack
            • Votes:
              9
            • Watchers:
              33
