A followup to my previous comment about previous key locking:
When working on
DERBY-642, I came to the conclusion that obtaining the
locks in reverse order works fine as long as all locks can be obtained
without waiting. However, if the backward scan has to wait for a lock,
some extra logic will be needed.
To illustrate, imagine an index that contains the following key
values: 10, 20, 30, 40, 50
If the index is being scanned backwards, with serializable isolation
level, we may get into this situation:
The row with key=50 is locked immediately and we read its value. Then
we move on and attempt to lock the row with key=40, but have to wait
because some other transaction has locked it exclusively. When the
other transaction releases its exclusive lock, we get our lock on the
row and wake up. The problem now is that we don't know whether some
other transaction inserted a row with key value between 40 and 50
while we where waiting for the lock on key=40. If a row with value 45
has been inserted, we need to make sure that the scan also sees that
row, otherwise it might turn up as a phantom row later in the
transaction, which is not allowed when the isolation level is
In a forward scan, this wouldn't have been a problem. If we had locked
row 10 and waited for row 20, no one could have inserted a row in the
range between 10 and 20 because our lock on row 10 would block any
other transaction trying to insert a row immediately to the right of
that row in the B-tree. So once we got the lock on row 20, we would
know that there was no row 15.
I can think of these options:
1) Always take table locks when doing backward scans with serializable
isolation. Then no other transaction can change the table, and phantom
rows cannot appear.
2) When waking up after waiting for a lock, reposition to the row we
saw previously (instead of repositioning to the row we attempted to
lock, as we currently do). When restarting the scan from this last
known good position, we'd notice if rows had been inserted immediately
to the left of that position.
3) Make inserts lock both the previous key and the next key, so that
locks taken by a B-tree scan always protect the logically next key,
regardless of the direction of the scan.
1 and 2 have the advantage that they only affect backward scans with
serializable isolation, so that the cost is paid by the operations
that need the extra logic. Option 3 would affect all inserts into a
B-tree, regardless of isolation level. Option 2 would give better
concurrency than option 1. Probably some variant of option 2 is what
we should go for.