HBase / HBASE-2038

Coprocessors: Region level indexing

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Later
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Coprocessors
    • Labels: None

      Description

      HBASE-2037 is a good candidate to be done as a coprocessor. It also serves as a good goalpost for coprocessor environment design – the environment should be rich enough that region level indexing can be reimplemented as a coprocessor without any loss of functionality.


          Activity

          Andrew Purtell added a comment -

          This issue should now really be about converting IHBase and THBase into coprocessors at some point.

          Alex Baranau added a comment -

          Hello,

          As the first cut of the Coprocessors (CP) implementation has been committed to trunk (HBASE-2001 and HBASE-2002), I think there's a good opportunity to get going with this issue. I believe it's a good time for this effort and hope that a CP-based implementation of region-level indexing will confirm that the CP API is complete and has everything one might need (for now).

          I revised the design/approach of the IHBase contrib and have several questions with regard to transforming the code to be CP-based. It would be great if someone could help me with them!

          1) Are coprocessors meant to be stateless? If not, then I assume that one instance is created and "assigned" to a region, and that the CP implementation should be thread-safe (e.g. multiple scanners can be handled at the same time for the region). Otherwise, if coprocessors are meant to be stateless, I believe that CoprocessorEnvironment's get/put/remove methods are used to store intermediate data (aka attributes) between method calls (if we really need it). Is a CoprocessorEnvironment instance created one per region? I know that, e.g., I can store some scan-related data using the scanId passed to the scan-related callbacks (is it safe?), but what about region-related data (no problem with it if the CP environment is one per region)?
          In general, do I understand the CP API correctly (based on the assumptions I share in this point)?

          2) During a batch scan (something which was added in trunk but wasn't supported in previous HBase versions, and hence the current IHBase implementation doesn't take it into account) we need to return multiple rows from the scan's next() method. It looks like if we apply the current approach (from the current IHBase implementation) of "fast forwarding" to the next value, we'll only fast-forward the scan to the first of the values to return. The others will be fetched using the "usual" scan logic without using the index, which isn't efficient. There's not a lot we can do without changing the scan (and deeper) code. Am I right here? Perhaps it's OK to lack support for batch reads in the first version of CP-based IHBase? Or might it be that we should change the approach?

          3) Is it in general a good idea for me to take this initiative (transforming the IHBase implementation into a CP-based one)? I fear that due to the many changes in the HBase codebase (trunk versus e.g. 0.20.5) there are going to be severe changes in the approach/design of the indexing implementation (compared to the current one, which I could use as a base), so poking you guys (HBase devs) a lot (if really needed) to learn things about it isn't a very efficient way to work on this issue. Anyway, I'd be glad to work on the issue if someone can provide the needed guidance.

          4) I haven't dug into the THBase contrib (as I have into IHBase). Will these contribs (IHBase and THBase) be "transferred" to a CP-based implementation as a single effort? I believe they won't be merged, given how differently they act now. Was it really intended to put the tasks for both into a single JIRA issue?

          Thank you!

          Alex Baranau added a comment -

          The following might be helpful. Some time ago I refactored IHBase a bit and extracted base interfaces to make it easier to develop any custom indexing implementation: IndexManager and IndexScannerContext (I just published the code here: https://github.com/abaranau/ihbase). To give an idea, here's how they look:

          public abstract class IndexManager implements HeapSize {
            public abstract void initialize(IdxRegion region);
            public abstract Pair<Long, Callable<Void>> rebuildIndexes() throws IOException;
            public abstract void cleanup();
            public abstract IndexScannerContext newIndexScannerContext(Scan scan) throws IOException;
          }
          
          public interface IndexScannerContext {
            KeyValue getNextKey();
            void close();
          }
          

          This refactored code somewhat correlates with the BaseRegionObserverCoprocessor API (at least in my head):

          • IndexManager's initialize() should be invoked at region open time,
          • rebuildIndexes() during flush,
          • cleanup() during region close,
          • IndexScannerContext should be created during scanner open,
          • its getNextKey() (with a bit of additional code) during the scan's next(),
          • and finally, close() when the scan is closed in the region.

          I'm not saying we should use this refactored version of the code; I'm just putting it here for "visualization purposes", as a way to express the idea (a rough sketch of the wiring follows below).
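
          To make the mapping above concrete, here is a rough sketch of how the IndexManager/IndexScannerContext lifecycle could hang off region observer hooks. It is only an illustration: IndexManager and IndexScannerContext are the interfaces from the refactored IHBase code above, IndexManagerFactory is a hypothetical helper, and the hook names and signatures follow the later 0.92-era RegionObserver API rather than the BaseRegionObserverCoprocessor class mentioned in this comment, so details may differ.

          import java.io.IOException;

          import org.apache.hadoop.hbase.client.Scan;
          import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
          import org.apache.hadoop.hbase.coprocessor.ObserverContext;
          import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
          import org.apache.hadoop.hbase.regionserver.RegionScanner;

          // Sketch only: wires the refactored IHBase lifecycle onto region observer hooks.
          public class IndexedRegionObserver extends BaseRegionObserver {

            private IndexManager indexManager;  // interface from the refactored IHBase code above

            @Override
            public void postOpen(ObserverContext<RegionCoprocessorEnvironment> e) {
              // region open time: build/initialize the in-memory index for this region
              indexManager = IndexManagerFactory.create(e.getEnvironment().getRegion());  // hypothetical factory
            }

            @Override
            public void postFlush(ObserverContext<RegionCoprocessorEnvironment> e) {
              try {
                // flush time: rebuild the index from the newly flushed store files
                indexManager.rebuildIndexes();
              } catch (IOException ex) {
                throw new RuntimeException("index rebuild failed", ex);
              }
            }

            @Override
            public void preClose(ObserverContext<RegionCoprocessorEnvironment> e, boolean abortRequested) {
              // region close time: release the index memory
              indexManager.cleanup();
            }

            @Override
            public RegionScanner postScannerOpen(ObserverContext<RegionCoprocessorEnvironment> e,
                Scan scan, RegionScanner s) throws IOException {
              // scanner open time: create the per-scan index context; how to feed its
              // getNextKey() into the scanner's next() is the open question in this thread
              IndexScannerContext ctx = indexManager.newIndexScannerContext(scan);
              return s;
            }
          }

          Nothing here is meant as a final design; it is only to make the bullet-point mapping easier to picture.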

          Please, correct my logic where needed!
          Thanks!

          P.S. Please don't judge the class names or the other dirty pieces in the refactored version I've shared; I wanted to a) just try to add an additional abstraction so a custom indexing implementation can be injected and b) make as few changes to the IHBase codebase as I could so that others can follow them easily. Copy-pasting the notes I took during refactoring (may be helpful if someone wants to go inside the code):

          "1. Extracted interface IndexManager from IdxRegionIndexManager.
          2. Extracted separate IdxRegionIndexManagerMBean from IdxRegionMbean with IndexManager-implementation-specific info
          3. Created IndexScannerContext interface (with IdxScannerContext implementation, which encapsulates idxSearchContext and matchedExpressionIterator for existing code) which performs iteration over indexed expression keys.

          NOTE: Didn't think about renaming class IdxRegionIndexManager and related.
          NOTE: Didn't think about "repackaging" things.
          NOTE: New code/classes lack javadocs, will add them
          NOTE: Unit-tests should be added with regard to refactoring (add check IdxRegionIndexManagerMBean values at least)"

          Gary Helmling added a comment -

          Hi Alex,

          I'm not familiar with the internal IHBase code, but I'll provide whatever info I can on coprocessors.

          1) Are coprocessors meant to be stateless? If not, then I assume that one instance is created and "assigned" to a region and that CP implementation should be thread-safe (e.g. multiple scanners can be handled at the same time for the regions).

          No, coprocessor implementations do not need to be stateless. If anything, you'll need state for many interesting applications. A single coprocessor instance is created per configured coprocessor implementation on region load. You can treat the postOpen() and preClose() methods as the init() and destroy() methods of your implementation. And yes, coprocessor implementations need to be thread safe.
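
          As a small illustration of the thread-safety point (plain Java only, not an HBase API): a single observer instance serves every concurrent scanner on its region, so any per-scan bookkeeping has to live in a thread-safe structure - for example keyed by the scanId that the scan-related callbacks receive. The class and field names below are hypothetical.

          import java.util.concurrent.ConcurrentHashMap;
          import java.util.concurrent.ConcurrentMap;

          // One instance of this would live inside the (shared, per-region) coprocessor.
          public class PerScanState {

            // per-scan bookkeeping, keyed by the scanId passed to scan-related callbacks
            private final ConcurrentMap<Long, byte[]> lastIndexedRowByScan =
                new ConcurrentHashMap<Long, byte[]>();

            public void rememberPosition(long scanId, byte[] row) {
              lastIndexedRowByScan.put(scanId, row);
            }

            public byte[] lastPosition(long scanId) {
              return lastIndexedRowByScan.get(scanId);
            }

            public void forget(long scanId) {  // call when the scanner is closed
              lastIndexedRowByScan.remove(scanId);
            }
          }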

          2) During a batch scan (something which was added in trunk but wasn't supported in previous HBase versions, and hence the current IHBase implementation doesn't take it into account) we need to return multiple rows from the scan's next() method. It looks like if we apply the current approach (from the current IHBase implementation) of "fast forwarding" to the next value, we'll only fast-forward the scan to the first of the values to return.

          Sorry, I'm not familiar with how IHBase handles this or what changed in the scanner API, but I'm guessing RegionObserver.preScannerNext() doesn't provide much help in this fast-forwarding use case. It seems like this would need much deeper hooks into HRegion.RegionScanner to interact with the positioning code. Alternately, you could expose your own "indexed scanner" functionality via the dynamic RPC hooks (HTable.coprocessorExec()), but that would require the client to differentiate between indexed and non-indexed usage and doesn't provide the transparency you're looking for.

          3) Is it in general a good idea for me to take this initiative (transforming the IHBase implementation into a CP-based one)?

          Sorry again, I don't have much of an answer on this one. I'll help with anything I can on the coprocessor side of things, though!

          Andrew Purtell added a comment -

          Alex,

          Is it in general a good idea for me to take this initiative (transforming the IHBase implementation into a CP-based one)?

          Well I for one definitely think this is a good idea.

          I believe it's a good time for this effort and hope that CP-based implementation of region-level indexing will confirm that CP API is complete and has all one might need (for now).

          As do I. However, there are additions and improvements to the CP API coming; see HBASE-3256 and HBASE-3257. The latter especially may be relevant.

          I assume that one instance is created and "assigned" to a region and that CP implementation should be thread-safe (e.g. multiple scanners can be handled at the same time for the regions).

          Correct.

          I believe that CoprocessorEnvironment's get/put/remove methods are used to store intermediate data (aka attributes) between method calls (if we really need it).

          Correct, but see next answer below.

          Is a CoprocessorEnvironment instance created one per region?

          No. This is created once per coprocessor. Each coprocessor has its own environment, which can be used to keep state between multiple threads of a coprocessor attached to one region, but not between multiple coprocessors.

          CoprocessorHost is the object that is created once per region.

          I can store some scan-related data using scanId passed to the scan-related callbacks (is it safe?)

          This should be safe.

          I don't understand the current IHBase implementation enough to answer your question #2.

          This refactored code somehow correlates with BaseRegionObserverCoprocessor API (at least in my head)

          I think that is a great start.

          Lars Hofhansl added a comment -

          @Alex: Are you still planning to work on this?

          Alex Baranau added a comment -

          Hi. Sorry for abandoning this issue for so long. I'd love to still work on it, but I think I'll only manage to dedicate enough time to it in 2-3 weeks; I plan to make some progress on it starting next week.
          If I can't do anything in that time then someone else can take it over - I won't hold it any longer.

          Is that OK? Or is there someone who wants to work on it right away?

          Lars Hofhansl added a comment -

          Maybe there's a way to collaborate on this (although I cannot promise much of my time on this either).

          From gleaning HBASE-2037, this would not require ITHBase and the indexes would always be consistent, correct? I have not looked at the 2.5 MB patch in HBASE-2037 yet...
          I was wondering if there's a summary somewhere about how it works. Specifically how does a client know where to look for an indexed value?

          Ted Yu added a comment -

          There are already two projects building secondary indexes on top of HBase: Lily and Culvert.

          If we provide native secondary indexing support in HBase, we should evaluate what ITHBase, Lily and Culvert have done so that the native support provides a better abstraction.

          Alex Baranau added a comment -

          +1 for collaboration!
          Re HBASE-2037 (aka IHBase): it is the base for this effort. Yes, it doesn't require ITHBase; it is an alternative implementation. The refactored code I pointed to above (https://github.com/abaranau/ihbase) is also based on the IHBase code.

          In short (sorry, the description is not tied to classes; I don't have them in front of me at the moment):

          As far as I remember (I need to refresh my memory though), the point is that an index is kept for each Region; it is held in RAM, not persisted. It is built during Region initialization (after an HBase restart, or when a new region is created after a split, and so on). When a scan involving indexed columns is performed, it uses the index to find the next record to navigate to and fast-forwards to that record (usually skipping other records without even reading them). This is where it wins on speed.
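
          A toy model of that idea (plain Java, not the actual IHBase code) may help: per region and per indexed column, the in-memory index maps each value to the sorted row keys that contain it, and the scan "fast-forwards" by asking for the next matching row key at or after its current position. Names and types here are purely illustrative.

          import java.util.NavigableSet;
          import java.util.TreeMap;
          import java.util.TreeSet;

          // Toy per-region index for a single indexed column.
          public class ToyRegionIndex {

            // value of the indexed column -> sorted row keys holding that value
            private final TreeMap<String, NavigableSet<String>> index =
                new TreeMap<String, NavigableSet<String>>();

            // called while (re)building the index at region open / flush time
            public void put(String indexedValue, String rowKey) {
              NavigableSet<String> rows = index.get(indexedValue);
              if (rows == null) {
                rows = new TreeSet<String>();
                index.put(indexedValue, rows);
              }
              rows.add(rowKey);
            }

            // used by the scan: next matching row at or after the current position,
            // or null if the rest of the region can be skipped entirely
            public String nextMatchingRow(String indexedValue, String currentRow) {
              NavigableSet<String> rows = index.get(indexedValue);
              return rows == null ? null : rows.ceiling(currentRow);
            }
          }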

          As this was developed before CPs were added, a special API was developed for the client to use.

          Hope this helps a bit. I will refresh my memory from the code and then we can discuss it in more depth.

          Alex Baranau added a comment -

          Re comparing this solution to others:

          ITHBase: you can find a very good comparison in the HBASE-2037 description.
          Lily: apart from a different implementation base (Lily uses Lucene while IHBase currently does not), IHBase is more of an "internal" feature added to HBase, not a separate/additional tool/system - i.e. it provides in-house support for secondary indexing.
          Culvert: I haven't looked into the internals yet (I definitely will), but it looks like it may be something similar. Not sure.

          +1 for doing the evaluation and learning from others' experience and use-cases to make the "native support" (IHBase-2) better.

          Lars Hofhansl added a comment -

          I see. So if I wanted to do a Get and all I have is the value of an indexed column, I have to ask all regions, correct? (Because there is no way to identify, from the column value, which region holds it ahead of time.)

          Alex Baranau added a comment -

          I think the current IHBase implementation implies that. For scans this should not be a big problem though - regions are usually configured to be big, and with the help of the index a whole region can be skipped at once. If we are talking about random single-read (Get) operations, then this may mean a lot of "useless" work (compared to the amount of useful work).

          Do you have a specific use-case (a real one or just one "in mind") you want to discuss? If so, maybe it makes sense to discuss it on the ML or even in chat (Skype can work).

          Lars Hofhansl added a comment -

          I have no particular use case... It's just that I have been trying to work out how I would do secondary indexes in HBase, and I always came back to this (either it needs some cross-region transaction when maintaining the index, or you need to ask all regions when you query the index).

          Not a problem as such, just confirming.
          Such a Get could be farmed off in parallel to many regions, so latency is not necessarily bad.
          I'd be up for an offline chat. I'll send an email.

          Anoop Sam John added a comment -

          Hi Lars,
          I am also working on a secondary index and I have found the IHBase concept to be good. But we need this to be moved to a coprocessor-based approach so that the kernel code of HBase need not be different for the secondary index. IHBase makes the scan go through all the regions (as you said), but they skip and seek to later positions in the heap, avoiding many possible data reads from HDFS etc.
          Looking at the current coprocessor code, we call preScannerNext() from HRegionServer's next(final long scannerId, int nbRows) and pass the RegionScanner to the coprocessor. But following the IHBase approach, within the coprocessor we should be able to seek to the correct row where the indexed column value equals our value. We cannot do this at the moment, as RegionScanner has no seek().

          Also, this preScannerNext() will be called once before the actual next(final long scannerId, int nbRows) call happens on the region. Depending on the caching value on the client side, nbRows might be more than one. Now suppose nbRows=2, and in the region we have 2 matching rows, one somewhere in the middle of one HFile and the other in another HFile. Following IHBase, we should first seek to the position of the first row and, after reading that data, seek to the next position. With the current way preScannerNext() is called, this won't be possible. So I think we might need some changes in this area? What do you say?

          Meanwhile, what is your plan - to continue with the IHBase way of storing the index in memory for each region, or to change this?

          Alex Baranau added a comment -

          Hi Anoop,

          I think the question in your second paragraph raises the same concern that was stated in one of my first comments (a year ago...):

          2) During a batch scan (something which was added in trunk but wasn't supported in previous HBase versions, and hence the current IHBase implementation doesn't take it into account) we need to return multiple rows from the scan's next() method. It looks like if we apply the current approach (from the current IHBase implementation) of "fast forwarding" to the next value, we'll only fast-forward the scan to the first of the values to return. The others will be fetched using the "usual" scan logic without using the index, which isn't efficient. There's not a lot we can do without changing the scan (and deeper) code. Am I right here? Perhaps it's OK to lack support for batch reads in the first version of CP-based IHBase? Or might it be that we should change the approach?

          I'm reviewing the released 0.92 coprocessor capabilities and will share if I find how this issue can be solved. But from what I see so far, the issue remains in the released version...

          Anoop Sam John added a comment -

          Hi Alex,
          Thanks for your reply... Yes, I had seen your past comment. I am checking the trunk coprocessor code for this work at the moment...

          What is your take on my first point - that HRegionServer's next(final long scannerId, int nbRows) calls the coprocessor's preScannerNext() passing the RegionScanner, and on that we cannot do a seek()?

          Thanks
          Anoop

          Lars Hofhansl added a comment -

          Unfortunately there is no seeking in the coprocessors yet; they work more like a filter over a real scan. Seeking is done one level (or actually two levels) deeper: it happens in the StoreScanners, while coprocessors see RegionScanners.

          It is not entirely clear to me where to hook this up in that API.

          It might be possible to provide a custom filter to do that. Filters operate at the StoreScanner level, and so can (and do) provide seek hints to the calling scanner.
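
          A hedged sketch of that filter idea follows. The index lookup is hidden behind a hypothetical NextRowProvider callback (not an HBase or IHBase type); the filter method names follow the 0.92-era Filter API (filterKeyValue / getNextKeyHint / ReturnCode.SEEK_NEXT_USING_HINT), and details may differ in other versions.

          import java.io.DataInput;
          import java.io.DataOutput;
          import java.io.IOException;
          import java.util.Arrays;

          import org.apache.hadoop.hbase.KeyValue;
          import org.apache.hadoop.hbase.filter.FilterBase;

          // Sketch: turns "next matching row" answers from the in-memory index into
          // seek hints for the StoreScanners.
          public class IndexSeekHintFilter extends FilterBase {

            /** Hypothetical callback wrapping the per-region index. */
            public interface NextRowProvider {
              // next row containing an indexed match at or after the given row, or null
              byte[] nextMatchingRow(byte[] currentRow);
            }

            private final NextRowProvider index;
            private byte[] nextIndexedRow;
            private boolean done;

            public IndexSeekHintFilter(NextRowProvider index) {
              this.index = index;
            }

            @Override
            public boolean filterAllRemaining() {
              return done;  // no more matches in this region: end the scan here
            }

            @Override
            public ReturnCode filterKeyValue(KeyValue kv) {
              byte[] row = kv.getRow();
              nextIndexedRow = index.nextMatchingRow(row);
              if (nextIndexedRow == null) {
                done = true;
                return ReturnCode.NEXT_ROW;
              }
              if (Arrays.equals(row, nextIndexedRow)) {
                return ReturnCode.INCLUDE;              // current row is a match, keep it
              }
              return ReturnCode.SEEK_NEXT_USING_HINT;   // skip ahead to the next match
            }

            @Override
            public KeyValue getNextKeyHint(KeyValue currentKV) {
              return KeyValue.createFirstOnRow(nextIndexedRow);
            }

            // The filter would be installed server-side by the coprocessor, so it never
            // crosses an RPC boundary; serialization is left empty in this sketch.
            @Override
            public void write(DataOutput out) throws IOException {
            }

            @Override
            public void readFields(DataInput in) throws IOException {
            }
          }

          Note that this sketch has exactly the limitation Anoop raises below: after a matching row is INCLUDEd, the scan still touches the first KeyValue of the following row before the seek hint kicks in.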

          Alex Baranau added a comment -

          Yeah, same thought here: use a fast-forwarding filter (one which provides hints for the next row to forward to). I guess we can add such a filter to a scan via a hook at scanner-open time (at the region level).

          Ideally, I guess, we'd want to add such a filter to the RegionScanner in postScannerOpen(), though the RegionScanner interface does not expose filter info. We could try to cast it to a particular class, but that may be too unsafe.

          Alternatively we could try to modify the Scan object (add filters), but this is obviously a bad approach as this object is reused across regions.

          Maybe we can hear some advice from the coprocessor feature devs?

          Anoop Sam John added a comment -

          Hi Lars,

          It might be possible to provide a custom filter to do that.

          • What we want from the filter is to include a row and then seek to the next row we are interested in. I can't see such a facility in our Filter right now - correct me if I am wrong. Suppose we have already seeked to one row and it needs to be included in the result: the Filter should return INCLUDE, and only on the following next() call can we return a SEEK_USING_HINT. So one extra row read is needed, which might even cause an unwanted HFileBlock fetch (who knows).
            Can we add reseek() at a higher level?
            If you have a suggestion please let me know.

          Thanks
          Anoop

          Lars Hofhansl added a comment -

          @Alex: It looks like preScannerOpen could actually change the passed Scan object and add a filter. The API is a bit strange: the Scan parameter is marked final, but it is perfectly OK (and possible - final does not prevent it) to change it here. postScannerOpen also gets the Scan object, but modifying it there is pointless.

          @Anoop: Yep, for that we'd need to add INCLUDE_AND_SEEK_USING_HINT (similar to the INCLUDE_AND_SEEK_NEXT_ROW that we already have). It shouldn't be hard to add; I'm happy to do that if that's the route we want to go (a sketch of the preScannerOpen hook follows below).
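
          For illustration, the preScannerOpen suggestion might look roughly like the following. It is a sketch only: buildIndexFilter() is a hypothetical helper, IndexSeekHintFilter is the hypothetical filter sketched earlier in this thread, and the hook signature follows the 0.92-era RegionObserver API.

          import java.io.IOException;
          import java.util.Arrays;

          import org.apache.hadoop.hbase.client.Scan;
          import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
          import org.apache.hadoop.hbase.coprocessor.ObserverContext;
          import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
          import org.apache.hadoop.hbase.filter.Filter;
          import org.apache.hadoop.hbase.filter.FilterList;
          import org.apache.hadoop.hbase.regionserver.RegionScanner;

          // Sketch: attach the index-driven seek-hint filter to the Scan before the
          // region scanner is built, keeping any filter the client already set.
          public class IndexFilterInjectingObserver extends BaseRegionObserver {

            @Override
            public RegionScanner preScannerOpen(ObserverContext<RegionCoprocessorEnvironment> e,
                Scan scan, RegionScanner s) throws IOException {
              Filter indexFilter = buildIndexFilter(e.getEnvironment(), scan);  // hypothetical
              if (indexFilter != null) {
                Filter existing = scan.getFilter();
                scan.setFilter(existing == null
                    ? indexFilter
                    : new FilterList(FilterList.Operator.MUST_PASS_ALL,
                        Arrays.asList(existing, indexFilter)));
              }
              return s;  // leave normal region scanner creation in place
            }

            private Filter buildIndexFilter(RegionCoprocessorEnvironment env, Scan scan) {
              // hypothetical: consult the per-region index for the columns the scan touches;
              // return null when the scan does not involve any indexed column
              return null;
            }
          }

          As noted above, the Scan parameter is declared final but can still be mutated here; whether that is the intended use of the API is part of the open question.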

          Ted Yu added a comment -

          I logged HBASE-5512 for adding INCLUDE_AND_SEEK_USING_HINT

          Anoop Sam John added a comment -

          @Lars

          Seeking is done one level (or two actually) level deeper.
          Seeking is done in the StoreScanners, coprocessors see RegionScanners.

          It is not entirely clear to me where to hook this up in that API.

          Yes, at the RegionScanner level we don't have seek() or reseek(). It is one level down, at the KeyValueHeap level.
          Would it be correct to add seek()/reseek() behaviour at the RegionScanner level? [We would just need to delegate seek() or reseek() calls to the KeyValueHeap object within the RegionScanner...]

          If so, it would be very easy to do a reseek() to the needed row in the coprocessor's preScannerNext(), and next() would then pick up the needed row.

          What do you say? Correct me if my suggestion is wrong.

          ramkrishna.s.vasudevan added a comment -

          Also, we need a provision for the nbRows value that is passed while scanning to be available in the coprocessor, so that the normal scanner.next() can be kept in sync with the cached preScannerNext() that we do with nbRows.

          Lars Hofhansl added a comment -

          @Anoop: We can certainly try to expose this at the RegionScanner level, although I feel it might actually be harder than you think, as seeking is dealt with on a per-store basis, and we do not want to inhibit the ability to deal with Stores in parallel in the future.
          RegionScanner.seek would have to go through all Stores and, for each Store, seek the MemstoreScanner and all StoreFileScanners. Seeking this way across stores is only valid if we seek on row boundaries (each Store - i.e. column family - has its own set of columns, which could even have the same names between stores).

          @HBASE-5521: I started working on that, but I am starting to question its usefulness.
          A filter is applied per KeyValue (at least the method that allows for seeking). So many KeyValues flow through the Filter for a single row, and the filter needs to seek separately for each ColumnFamily (as explained above and on the mailing list).
          So the gain from this would be fairly minimal (which I guess is why we do not have it).
          For example, a row with many columns would need to issue many INCLUDEs and only for the last KeyValue (and how would it know it's the last?) issue INCLUDE_AND_SEEK...

          Anoop Sam John added a comment -

          @HBASE-5521: I started working on that, but I am starting to question its usefulness.
          A filter is applied per KeyValue (at least the method that allows for seeking). So many KeyValues flow through the Filter for a single row, and the filter needs to seek separately for each ColumnFamily (as explained above and on the mailing list).
          So the gain from this would be fairly minimal (which I guess is why we do not have it).
          For example, a row with many columns would need to issue many INCLUDEs and only for the last KeyValue (and how would it know it's the last?) issue INCLUDE_AND_SEEK...

          Lars, I was also thinking about this yesterday after seeing the patch. I wanted to give a test case a trial run before commenting.

          Regarding your first comment: in our discussion scenario above, seek() needs to be a row-boundary seek. Yes, all the stores (the memstore and all store files in each store) need to get seeked to the needed point. Let me look at this more on Monday; we had made small changes and tested this once - I mean, we were able to seek to row boundaries.

          Thanks a lot Lars for your work and suggestions.

          @Ram: Yes, shall we file a JIRA for coprocessor support for next(int nbRows)?

          Anoop Sam John added a comment -

          Created https://issues.apache.org/jira/browse/HBASE-5517 for the coprocessor change.

          Anoop Sam John added a comment -

          @Lars
          I have created HBASE-5520 for support for seek() and reseek() on the RegionScanner. As I mentioned in that comment, we need row-boundary seeks only. Yes, it might be complex with respect to other kinds of seeks. Could we support seek() and reseek() at the row-boundary level only on the RegionScanner?
          We can take any of the approaches below:
          1. The APIs make use of only the rowkey and timestamp from the passed KeyValue.
          2. Check at the RegionScannerImpl level that the passed KV does not carry a CF or qualifier, and throw an exception if it does. The KV may carry only the rowkey and (optionally) the timestamp. [That is OK - a timestamp can be there...]
          3. Don't bother - just let the seek happen. But maybe dangerous??

          Please give your valuable suggestions.

          Ram and I have started working on this.

          From the coprocessor's preScannerNext() we can call reseek() with KeyValue.createFirstOnRow(final byte[] row).

          Anoop Sam John added a comment -

          Or maybe we can give the signatures of seek() and reseek() on the RegionScanner as seek(byte[] rowKey) and reseek(byte[] rowKey)?
          So that the seek will always be to the first KV of the row in every CF [if the CF contains that key].
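
          For illustration only, the API shape being proposed here might look like the following. It does not exist in HBase at the time of this comment (HBASE-5520 tracks the proposal); the names and semantics are this discussion's, not an established API.

          import java.io.IOException;

          // Hypothetical shape of the proposed RegionScanner additions: row-boundary seeks
          // only, delegating internally to the region's KeyValueHeap / store scanners.
          public interface RowSeekableRegionScanner {

            /** Seek all of the region's store scanners to the first KeyValue of rowKey. */
            boolean seek(byte[] rowKey) throws IOException;

            /** Like seek(), but assumes rowKey is at or after the current scan position. */
            boolean reseek(byte[] rowKey) throws IOException;
          }

          A coprocessor's preScannerNext() could then call reseek() with the next row the index points at (or, in the KeyValue-based variant discussed above, with KeyValue.createFirstOnRow(row)) before the region fills the next batch.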

          Andrew Purtell added a comment -

          Stale issue. Reopen if still relevant (if there's new activity)


            People

            • Assignee: Unassigned
            • Reporter: Andrew Purtell
            • Votes: 3
            • Watchers: 39
