Derby
  1. Derby
  2. DERBY-884

allow and use backward scans on indexes.

    Details

    • Type: Improvement Improvement
    • Status: Open
    • Priority: Minor Minor
    • Resolution: Unresolved
    • Affects Version/s: 10.1.3.1
    • Fix Version/s: None
    • Component/s: SQL, Store
    • Urgency:
      Low

      Description

      Improve the access interface to support backward scans, currently only forward scans are supported.

      Improve the btree implementation to support backward scans. The structure could support this, the work just has not been done. With
      row level locking, concurrent tree splitting, and current assumptions
      throughout the access method that scans go top/down, left to right the
      work to do a backward scan is harder than doing a forward scan. Also
      once the store portion is done, there would be necessary changes to:
      optimizer, execution engine, and scan interface to allow the backward
      scan. This would be a hard first project.

      Improve the optimizer to understand derby indexes now support backward scans.

      Improve the execution engine to use the new backward scan interfaces to execute backward scans.

        Issue Links

          Activity

          Hide
          Knut Anders Hatlen added a comment -

          Does anyone have any thoughts on how to avoid deadlocks between the
          scans that move left to right and those that move right to left?
          Here's what comes to my mind:

          The scan that moves right to left could save the scan position (that
          is, copy the key found on the current position, similar to what we did
          in DERBY-2991) if it cannot obtain the latch on the left sibling
          immediately. After saving the position, it could release the latch on
          the current leaf page and wait for the left sibling to become
          unlatched. Once it detects that the left sibling is unlatched, it can
          reposition in the B-tree using the saved key and continue the scan
          from the old position.

          Since the latch on the current leaf is released before the scan waits
          for the latch on the left sibling, the right-to-left scan shouldn't be
          causing any deadlocks. And releasing the latch on the current leaf
          page should also be safe, since we reposition using the key so we'll
          find the correct position in the scan even if the key was moved while
          the page was not protected by a latch.

          Show
          Knut Anders Hatlen added a comment - Does anyone have any thoughts on how to avoid deadlocks between the scans that move left to right and those that move right to left? Here's what comes to my mind: The scan that moves right to left could save the scan position (that is, copy the key found on the current position, similar to what we did in DERBY-2991 ) if it cannot obtain the latch on the left sibling immediately. After saving the position, it could release the latch on the current leaf page and wait for the left sibling to become unlatched. Once it detects that the left sibling is unlatched, it can reposition in the B-tree using the saved key and continue the scan from the old position. Since the latch on the current leaf is released before the scan waits for the latch on the left sibling, the right-to-left scan shouldn't be causing any deadlocks. And releasing the latch on the current leaf page should also be safe, since we reposition using the key so we'll find the correct position in the scan even if the key was moved while the page was not protected by a latch.
          Hide
          Mike Matrigali added a comment -

          To avoid latch/latch deadlocking you should maintain the current latching ordering scheme. So only wait on latches while holding latches when moving top to down and left to right. As you have laid out when moving right to left and you always request the latch
          NOWAIT and then if you have to wait you need to save your position somehow and give up the current latch and wait for the held
          latch. Once you have the new latch you need to give it up and reposition on the old row, get that latch and continue the left to
          right scan. Positioning by key should always work, but I think you need to handle the case where the key disappears (at least for
          some isolation levels). I think there is existing code that does this to handle the similar case where we can't hold a latch while
          waiting for a lock on a row, but the code probably is not exactly right for right to left vs left to right.

          In right to left you probably want to position on >= ; where left to right we probably position on <=.

          I think the cleanest implementation would be to add new backward scan interfaces to store and implement the scan as separate
          classes, following the BTreeMaxScan.java pattern. (actually once a real backward scan exists BTreeMaxScan should be changed
          to use it and will be much more efficient). I would suggest looking at the TransactionController.openScan() interface defiition and
          figure out what the start and stop parameters mean for a reverse scan. I believe there should be no interface change to the qualifiers
          (but coding for them might be different (ie. opposite) for the backward scan - not sure). The backward vs. forward stuff seems obvious for single key indexes but gets complicated for multiple column key indexes.

          Show
          Mike Matrigali added a comment - To avoid latch/latch deadlocking you should maintain the current latching ordering scheme. So only wait on latches while holding latches when moving top to down and left to right. As you have laid out when moving right to left and you always request the latch NOWAIT and then if you have to wait you need to save your position somehow and give up the current latch and wait for the held latch. Once you have the new latch you need to give it up and reposition on the old row, get that latch and continue the left to right scan. Positioning by key should always work, but I think you need to handle the case where the key disappears (at least for some isolation levels). I think there is existing code that does this to handle the similar case where we can't hold a latch while waiting for a lock on a row, but the code probably is not exactly right for right to left vs left to right. In right to left you probably want to position on >= ; where left to right we probably position on <=. I think the cleanest implementation would be to add new backward scan interfaces to store and implement the scan as separate classes, following the BTreeMaxScan.java pattern. (actually once a real backward scan exists BTreeMaxScan should be changed to use it and will be much more efficient). I would suggest looking at the TransactionController.openScan() interface defiition and figure out what the start and stop parameters mean for a reverse scan. I believe there should be no interface change to the qualifiers (but coding for them might be different (ie. opposite) for the backward scan - not sure). The backward vs. forward stuff seems obvious for single key indexes but gets complicated for multiple column key indexes.
          Hide
          Knut Anders Hatlen added a comment -

          Thanks, Mike!

          I'm cautiously optimistic that we can use BTreeScan.savePositionAndReleasePage() and BTreeScan.reposition() more or less as they are. But as you mentioned, the search parameter passed to the scan in reposition() probably needs to be changed to POSITION_RIGHT_OF_PARTIAL_KEY_MATCH for backward scans. Those methods also have a nice optimization that you only do a new traversal from the root of the BTree if a row was actually moved off of the page while the latch wasn't held.

          As to the complexity of multi-column keys, did you have something specific in mind? I was hoping that its complexity would be limited to the repositioning logic and that we could reuse existing logic for that part.

          Previous key locking is maybe also a bit tricky to get right when we're scanning backwards. We'd probably have to obtain the previous key lock at the end of scan instead of at the beginning. I'm not sure if that would cause any problems.

          Show
          Knut Anders Hatlen added a comment - Thanks, Mike! I'm cautiously optimistic that we can use BTreeScan.savePositionAndReleasePage() and BTreeScan.reposition() more or less as they are. But as you mentioned, the search parameter passed to the scan in reposition() probably needs to be changed to POSITION_RIGHT_OF_PARTIAL_KEY_MATCH for backward scans. Those methods also have a nice optimization that you only do a new traversal from the root of the BTree if a row was actually moved off of the page while the latch wasn't held. As to the complexity of multi-column keys, did you have something specific in mind? I was hoping that its complexity would be limited to the repositioning logic and that we could reuse existing logic for that part. Previous key locking is maybe also a bit tricky to get right when we're scanning backwards. We'd probably have to obtain the previous key lock at the end of scan instead of at the beginning. I'm not sure if that would cause any problems.
          Hide
          Knut Anders Hatlen added a comment -

          A followup to my previous comment about previous key locking:

          When working on DERBY-642, I came to the conclusion that obtaining the
          locks in reverse order works fine as long as all locks can be obtained
          without waiting. However, if the backward scan has to wait for a lock,
          some extra logic will be needed.

          To illustrate, imagine an index that contains the following key
          values: 10, 20, 30, 40, 50

          If the index is being scanned backwards, with serializable isolation
          level, we may get into this situation:

          The row with key=50 is locked immediately and we read its value. Then
          we move on and attempt to lock the row with key=40, but have to wait
          because some other transaction has locked it exclusively. When the
          other transaction releases its exclusive lock, we get our lock on the
          row and wake up. The problem now is that we don't know whether some
          other transaction inserted a row with key value between 40 and 50
          while we where waiting for the lock on key=40. If a row with value 45
          has been inserted, we need to make sure that the scan also sees that
          row, otherwise it might turn up as a phantom row later in the
          transaction, which is not allowed when the isolation level is
          serializable.

          In a forward scan, this wouldn't have been a problem. If we had locked
          row 10 and waited for row 20, no one could have inserted a row in the
          range between 10 and 20 because our lock on row 10 would block any
          other transaction trying to insert a row immediately to the right of
          that row in the B-tree. So once we got the lock on row 20, we would
          know that there was no row 15.

          I can think of these options:

          1) Always take table locks when doing backward scans with serializable
          isolation. Then no other transaction can change the table, and phantom
          rows cannot appear.

          2) When waking up after waiting for a lock, reposition to the row we
          saw previously (instead of repositioning to the row we attempted to
          lock, as we currently do). When restarting the scan from this last
          known good position, we'd notice if rows had been inserted immediately
          to the left of that position.

          3) Make inserts lock both the previous key and the next key, so that
          locks taken by a B-tree scan always protect the logically next key,
          regardless of the direction of the scan.

          1 and 2 have the advantage that they only affect backward scans with
          serializable isolation, so that the cost is paid by the operations
          that need the extra logic. Option 3 would affect all inserts into a
          B-tree, regardless of isolation level. Option 2 would give better
          concurrency than option 1. Probably some variant of option 2 is what
          we should go for.

          Show
          Knut Anders Hatlen added a comment - A followup to my previous comment about previous key locking: When working on DERBY-642 , I came to the conclusion that obtaining the locks in reverse order works fine as long as all locks can be obtained without waiting. However, if the backward scan has to wait for a lock, some extra logic will be needed. To illustrate, imagine an index that contains the following key values: 10, 20, 30, 40, 50 If the index is being scanned backwards, with serializable isolation level, we may get into this situation: The row with key=50 is locked immediately and we read its value. Then we move on and attempt to lock the row with key=40, but have to wait because some other transaction has locked it exclusively. When the other transaction releases its exclusive lock, we get our lock on the row and wake up. The problem now is that we don't know whether some other transaction inserted a row with key value between 40 and 50 while we where waiting for the lock on key=40. If a row with value 45 has been inserted, we need to make sure that the scan also sees that row, otherwise it might turn up as a phantom row later in the transaction, which is not allowed when the isolation level is serializable. In a forward scan, this wouldn't have been a problem. If we had locked row 10 and waited for row 20, no one could have inserted a row in the range between 10 and 20 because our lock on row 10 would block any other transaction trying to insert a row immediately to the right of that row in the B-tree. So once we got the lock on row 20, we would know that there was no row 15. I can think of these options: 1) Always take table locks when doing backward scans with serializable isolation. Then no other transaction can change the table, and phantom rows cannot appear. 2) When waking up after waiting for a lock, reposition to the row we saw previously (instead of repositioning to the row we attempted to lock, as we currently do). When restarting the scan from this last known good position, we'd notice if rows had been inserted immediately to the left of that position. 3) Make inserts lock both the previous key and the next key, so that locks taken by a B-tree scan always protect the logically next key, regardless of the direction of the scan. 1 and 2 have the advantage that they only affect backward scans with serializable isolation, so that the cost is paid by the operations that need the extra logic. Option 3 would affect all inserts into a B-tree, regardless of isolation level. Option 2 would give better concurrency than option 1. Probably some variant of option 2 is what we should go for.
          Hide
          Dag H. Wanvik added a comment -

          2) sounds like a good solution to me. I'd prefer that over 1) if it doesn't get too complex, I think. I agree it might be too
          costly to impose locking the previous row for inserts in the more common non-serializable modes..better pay the complexity in the backwards scan.

          Show
          Dag H. Wanvik added a comment - 2) sounds like a good solution to me. I'd prefer that over 1) if it doesn't get too complex, I think. I agree it might be too costly to impose locking the previous row for inserts in the more common non-serializable modes..better pay the complexity in the backwards scan.

            People

            • Assignee:
              Unassigned
              Reporter:
              Mike Matrigali
            • Votes:
              5 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:

                Development