HBase
HBASE-74

[performance] When a get or scan request spans multiple columns, execute the reads in parallel

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.95.2
    • Fix Version/s: None
    • Component/s: regionserver
    • Labels: None

      Description

      When a get or scan request spans multiple columns, execute the reads in parallel and use a CountDownLatch to wait for them to complete before returning the results.
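
      A minimal, self-contained sketch of this idea in plain java.util.concurrent; the ColumnReader interface and its read method are hypothetical stand-ins for the real per-column read path, not HBase APIs.

      import java.util.ArrayList;
      import java.util.Collections;
      import java.util.List;
      import java.util.concurrent.CountDownLatch;
      import java.util.concurrent.ExecutorService;
      import java.util.concurrent.Executors;

      public class ParallelColumnRead {
          // Hypothetical stand-in for the per-column read path.
          interface ColumnReader { List<String> read(String column); }

          static List<String> parallelGet(ColumnReader reader, List<String> columns)
                  throws InterruptedException {
              ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, columns.size()));
              CountDownLatch latch = new CountDownLatch(columns.size());
              List<String> results = Collections.synchronizedList(new ArrayList<>());
              for (String column : columns) {
                  pool.submit(() -> {
                      try {
                          results.addAll(reader.read(column)); // one read per column, in parallel
                      } finally {
                          latch.countDown(); // always count down so the caller never hangs
                      }
                  });
              }
              latch.await(); // wait for all column reads to complete
              pool.shutdown();
              return results; // aggregate before returning, as the description suggests
          }
      }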

          Activity

          Bryan Duxbury added a comment -

          Has anyone tested something like this for performance? It might be a substantial boost.

          stack added a comment -

          Putting into 0.19. Might be easy to do.

          stack added a comment -

          Moving out of 0.19.0. Nice feature, but need to do a nice job on it (Eitan at pset can help with this).

          stack added a comment -

          Upping priority because it's performance. If many families, this could help. We need to talk to our Eitan. He has written a paper on when to parallelize based on pset frontend experience.

          Erik Holstad added a comment -

          While working on HBASE-1249 this thought came to mind, so I tried to design the new system so that it would be pretty easy to add this.
          There are still a few things that need to be done to make this work properly, and I haven't run any tests to see how much we would gain.

          For making things parallel there are a couple of places where it can be done.
          If the query is a [Get], then we can parallelize in multiple places:
          HRegionServer: Every get can be done in parallel.
          HRegion: Every family in the get can be done in parallel.
          HStore: Every read from memCache + storefiles can be done in parallel.

          Starting at the bottom, to support parallel computation of data:
          HStore:
          So you have a bunch of lists that you need to compare; they are:
          1. The data list in a sf, storefile
          2. The get list, with families and columns to look for
          3. The result
          4. The deletes from previous sfs
          5. The deletes from this read
          Data, get, result, oldDeletes, newDeletes.

          With the current layout, where puts and deletes are mixed, you can:
          1. Compare the data in the different sfs with the get and create a list of
          candidates and a list of new deletes for that sf. The compare includes checks
          for TimeRange, TTL and number of versions.

          2. Merge deletes one by one, starting at memCache and moving down the sfs. For
          every merge you send that new delete list into the serverGet it belongs to and
          move on to the merge with the next new delete list.

          3. When all delete checks are done you are left with your candidate lists from
          all the sfs; they now need to be merged and checked for number of versions.

          So you have:
          1. GetCandidates and new deletes
          2. Merge deletes and check sgets towards the merged deletes
          3. Merge candidates

          For the parallel version you have a list of sgets over the same data:

          // 1. This call can be threaded
          sget.createCandidates(List<KeyValue> data, boolean multiFamily)

          // 2. Merge deletes top-down and check each sget against the merged set
          for (int i = 0; i < sgets.length; i++) {
              oldDeletes = mergeDeletes(oldDeletes, sgets[i].getDeletes());
              // This call can be threaded
              sgets[i].compareDeletes(oldDeletes);
          }

          // 3. Merge the candidates
          result = mergeCandidates(sgets);
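
          Only step 1 is embarrassingly parallel here. A rough, self-contained sketch under the assumption that the per-storefile serverGet behaves as described above; ServerGet, run, and perFileData are hypothetical stand-ins, not actual HBase code.

          import java.util.ArrayList;
          import java.util.List;
          import java.util.concurrent.Callable;
          import java.util.concurrent.ExecutorService;
          import java.util.concurrent.Executors;

          public class ParallelStoreGet {
              // Hypothetical stand-in for the per-storefile serverGet ("sget").
              static class ServerGet {
                  final List<String> candidates = new ArrayList<>();
                  final List<String> deletes = new ArrayList<>();
                  void createCandidates(List<String> data) {
                      // The real thing would apply the TimeRange/TTL/version checks here.
                      candidates.addAll(data);
                  }
                  List<String> getDeletes() { return deletes; }
                  void compareDeletes(List<String> merged) { candidates.removeAll(merged); }
              }

              static List<String> run(List<ServerGet> sgets, List<List<String>> perFileData)
                      throws InterruptedException {
                  // Step 1: create candidates for every storefile in parallel.
                  ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, sgets.size()));
                  List<Callable<Void>> tasks = new ArrayList<>();
                  for (int i = 0; i < sgets.size(); i++) {
                      final int idx = i;
                      tasks.add(() -> { sgets.get(idx).createCandidates(perFileData.get(idx)); return null; });
                  }
                  pool.invokeAll(tasks); // blocks until every candidate scan is done
                  pool.shutdown();

                  // Step 2: merge deletes sequentially, memCache first, and apply the
                  // accumulated deletes to each sget's candidates.
                  List<String> oldDeletes = new ArrayList<>();
                  for (ServerGet sget : sgets) {
                      oldDeletes.addAll(sget.getDeletes());
                      sget.compareDeletes(oldDeletes);
                  }

                  // Step 3: merge the surviving candidates into one result.
                  List<String> result = new ArrayList<>();
                  for (ServerGet sget : sgets) {
                      result.addAll(sget.candidates);
                  }
                  return result;
              }
          }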

          Doing this can probably increase speed in a lot of cases, but I think it will have the biggest impact on the GetFamilies query, before getFull, since for that query you need to look in all the storefiles anyway, which might not be the case for other queries.

          I don't think it would be too hard to thread the gets from different families, especially now that we don't need to sort the result on the client side but can just append it to the list.
          Threading multiple gets shouldn't be too hard either.

          stack added a comment -

          Moving it out (unless someone fixes it meantime).

          stack added a comment -

          Just to say that some lads embedding hbase need this one bad; assigning myself.

          Zlatin Balevsky added a comment -

          > HRegionServer: Every get can be done in parallel.
          > HRegion : Every family in the get can be done in parallel
          > HStore : Every read from memCache + storefiles can be done in parallel.

          Since HBASE-1935 is addressing parallelism at the HRegionServer level, is this issue going to tackle both the HRegion and HStore levels?

          stack added a comment -

          @Zlatin Yes. You've scoped this issue well.

          Jonathan Gray added a comment -

          Performance, punting to 0.22, can bring it back if someone implements it

          Lars George added a comment -

          Stack,

          > HStore : Every read from memCache + storefiles can be done in parallel.

          How is that going to work with the ScanQueryMatcher? I mean, the initial seek() I can understand, but then you have to scan the stores by age starting with the MemStore, moving to the on-disk Stores. How can we parallelize that? Just curious.

          stack added a comment -

          @Lars Don't we do the for (family : families) get/scan business now? I was thinking that if families > 1, we'd run the get/scan business for each family in its own thread. When all threads were done we'd have to aggregate the results before returning?
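
          A minimal sketch of that shape, assuming a hypothetical FamilyReader and scanFamily in place of the real get/scan path (none of these names are actual HBase APIs).

          import java.util.ArrayList;
          import java.util.List;
          import java.util.concurrent.ExecutorService;
          import java.util.concurrent.Executors;
          import java.util.concurrent.Future;

          public class PerFamilyGet {
              // Hypothetical per-family read; stands in for the real get/scan business.
              interface FamilyReader { List<String> scanFamily(String family) throws Exception; }

              static List<String> get(FamilyReader reader, List<String> families) throws Exception {
                  if (families.size() == 1) {
                      return reader.scanFamily(families.get(0)); // no threading for a single family
                  }
                  ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, families.size()));
                  List<Future<List<String>>> futures = new ArrayList<>();
                  for (String family : families) {
                      futures.add(pool.submit(() -> reader.scanFamily(family))); // one thread per family
                  }
                  List<String> result = new ArrayList<>();
                  for (Future<List<String>> f : futures) {
                      result.addAll(f.get()); // aggregate once every thread is done
                  }
                  pool.shutdown();
                  return result;
              }
          }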

          Jonathan Gray added a comment -

          You could potentially load the first block in each file in parallel. With any kind of concurrency on the server, it's unclear to me that this would be a win though.

          Families in parallel seems reasonable as a configurable option, but again, under concurrency this parallelism will have diminishing returns.

          stack added a comment -

          Moving down from critical for 0.92 to major.

          stack added a comment -

          Moving out of 0.92.0. Pull it back in if you think different.

          Otis Gospodnetic added a comment -

          Let's dig out this 5+ year old issue, with the last comment from 4.5+ years ago.
          Maybe this was actually implemented by now?

          Sergey Shelukhin added a comment -

          If not, HBASE-5416 is in the same area/might be related

          stack added a comment -

          To my knowledge, this issue has yet to be tackled.

          Otis Gospodnetic added a comment -

          @stack & Sergey Shelukhin - thanks. I linked HBASE-5416, but thought it would also be good to set the Fix Version to 0.96 so this issue gets some visibility - it seems popular in terms of votes and watchers, and HBASE-5416 is also set for 0.96. However, I don't seem to have enough HBase JIRA karma for this, so if you think setting the Fix Version makes sense, could you please do it?

          Sergey Shelukhin added a comment -

          Set. Thanks!

          stack added a comment -

          Unassigned new feature moved out of 0.95

          Anoop Sam John added a comment -

          I remember this being done in Trunk as part of some other issue, but I don't remember the issue id. Ted Yu, do you know? Maybe we can close this issue as a duplicate now?


            People

            • Assignee: Unassigned
            • Reporter: Jim Kellerman
            • Votes: 5
            • Watchers: 12
