Issue Details (XML | Word | Printable)

Key: DERBY-2462
Type: Bug Bug
Status: Closed Closed
Resolution: Fixed
Priority: Major Major
Assignee: Dag H. Wanvik
Reporter: Jeff Clary
Votes: 0
Watchers: 1
Operations

If you were logged in you would be able to see more operations.
Derby

org.apache.derby.impl.store.access.BackingStoreHashTableFromScan does not honor ResultSet holdability

Created: 16/Mar/07 07:22 PM   Updated: 16/Nov/07 03:17 PM
Return to search
Component/s: Store
Affects Version/s: 10.1.1.0, 10.1.2.1, 10.1.3.1, 10.2.1.6, 10.2.2.0
Fix Version/s: 10.3.1.4

Time Tracking:
Not Specified

File Attachments:
  Size
File Licensed for inclusion in ASF works DERBY-2462-1.diff 2007-03-16 11:15 PM Dag H. Wanvik 6 kB
File Licensed for inclusion in ASF works DERBY-2462-1.stat 2007-03-16 11:15 PM Dag H. Wanvik 0.5 kB
File Licensed for inclusion in ASF works DERBY-2462-2.diff 2007-03-22 02:21 AM Dag H. Wanvik 19 kB
File Licensed for inclusion in ASF works DERBY-2462-2.stat 2007-03-22 02:21 AM Dag H. Wanvik 0.8 kB
File Licensed for inclusion in ASF works DERBY-2462-3.diff 2007-04-23 09:30 PM Dag H. Wanvik 21 kB
File Licensed for inclusion in ASF works DERBY-2462-3.stat 2007-04-23 09:30 PM Dag H. Wanvik 1 kB
File Licensed for inclusion in ASF works DERBY-2462-4.diff 2007-04-26 06:05 PM Dag H. Wanvik 26 kB
File Licensed for inclusion in ASF works DERBY-2462-4.stat 2007-04-26 06:05 PM Dag H. Wanvik 1 kB
Java Source File DerbyHoldabilityTest.java 2007-03-16 07:23 PM Jeff Clary 8 kB
Environment: Test under Windows Vista, Java 1.4.2_13

Resolution Date: 16/May/07 01:11 PM


 Description  « Hide
After an unrelated statement on the same connection commits, and after some number of successful calls to ResultSet.next(), a subsequent call to ResultSet.next() throws an SQLException with a message like: The heap container with container id Container(-1, 1173965368428) is closed. This seems to be related to the hard-coded passing of false to the super in the constructor of org.apache.derby.impl.store.access.BackingStoreHashTableFromScan.

Steps to reproduce:

1. Execute a statement on a connection that returns a result set.

2. Execute a second statement on the same connection that modifies the database and commits.

3. Call next() on the first result set until the exception is thrown.

Note that the number of rows that can be successfully retrieved from the result set seems to be related to the amount of data per row. Increasing the number of columns in the result set or the length of the columns causes the exception to be taken sooner.

The attached test program demonstrates the issue.


 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Dag H. Wanvik added a comment - 16/Mar/07 10:53 PM
The spilling of the hash table to disk was added in DERBY-106 in svn
157861. I checked the issue comments, the code changes and the tests,
but I did not find a clue to why false is passed (for keepAfterCommit) to super from
BackingStoreHashTableFromScan.

I see the test for insensitive result sets (which also relies on
spilling the hash table to disk) uses holdability=true, but the test
for join and distinct only tests the non-holdable case.

So it would seem holdability is not meant to work (or was postponed?)
for the join and distinct cases. Does anyone know the status of this? (The
contributor is not active in the community any longer).

Dag H. Wanvik added a comment - 16/Mar/07 11:15 PM
Thanks for making the JIRA issue, Jeff!
I upload a preliminary patch (DERBY-2462-1.*) which works for your repro
program and which passes the regression tests (derbyall and suites.All)
on Solaris 10/x86, SUN JDK1.6.
It apparently fixes the hashed join case of spilling with holdability true;
but I have not added extra tests yet.

The patch should probably not be committed until someone more familiar with
this part of the code has looked at this.


Dag H. Wanvik added a comment - 20/Mar/07 03:51 AM
Dag said:

> I see the test for insensitive result sets (which also relies on
> spilling the hash table to disk) uses holdability=true, but the test
> for join and DISTINCT only tests the non-holdable case.

Checking the test SpillHash.java (from DERBY-106) further, I see that
none of the cases actually do run with holdability: Apparently the
case for insensitive result sets does, but I got misled by the fact
that the test relies on the metadata method
supportsOpenCursorsAcrossCommit() which always returns false for
Derby, as it turns out, since we do not support holdability for XA. So
the test chooses to run without holdability.

When enabling holdability in the test cases in SpillHash.java, I see
the following:
 
 - hash join fails with the same error as reported in this issue,
 - scroll insensitive result sets work (to be expected, has other
   tests).
 - DISTINCT fails with an exception NoSuchElementException in
   DiskHashTable#nextElement.

My first patch fixes the hash join case, but not the DISTINCT case,
because this accesses the spilled rows in the hash tabledifferently;
it uses a heap scan which fails (the hash join case accesses the
spiiled rows via a btree lookup of the hash key) since the scan is
closed due to the commit, and not reopened. I am working on a patch
which fixes this as well.

Dag H. Wanvik added a comment - 22/Mar/07 02:21 AM
This patch, DERBY-2462-2.{stat,diff} supercedes DERBY-2462-1. It does
the following:

   - modifies the test lang/SpillHash.java to run the test cases both
     with and without holdability set for the three tests of the disk
     spill of the backing store hash table:

        * hash join result set
        * distinct result set
        * scollable insensitive result set

     The changed test will reveal the same problem as the repro
     provided for this issue (if applied before the rest of this
     patch), as well as a failure of the distinct result set test case.

   - modifies BackingStoreHashTableFromScan to use the holdable
     mode of the disk hash table if applicable to the query.
     This change was encessary to make the hash join test case
     of SpillHash.java work with holdability.
   
   - modifies DiskHashtable.java to enable its enumerator class
     ElementEnum to survives a commit. It uses a scan of the row
     conglomerate (heap). Under holdability, the scan gets closed at
     commit time, and ElementEnum#nextElement would fail. The patch
     reopens the scan if appropriate (under holdability). In addition
     to the change in BackingStoreHashTableFromScan, this change was
     necessary to make the distinct test case of SpillHash.java work
     with holdability.

     I would especially appreciate if someone familiar with this area
     of the code could take a look at this part of the patch; the
     method I use for salvaging the scan is to save the row location
     after each nextElement, and use that to reopen the scan at the
     right position after a commit. I did not find a way to check
     if the scan was closed before attempting a fetch; I just catch
     the exception and check for SQLState.AM_SCAN_NOT_POSITIONED, in
     which I reopen the scan (only under holdability). I seems a bit of a
     hack; there may be a better way.

The patch fixes the repro, I ran derbyall and suites.All with no
errors on JDK1.6 under Solaris 10/x86. It is ready for review.

A B added a comment - 20/Apr/07 10:05 PM
I am not familiar with this area of code so I cannot make any definitive comments on whether or not the changes are correct nor on if there may be a better way. I did look at the patch, though, and had the following two questions. These are both minor and somewhat theoretical, so I don't think either should block commit of the patch. Just my proverbial two cents...

1) The interaction of the various ScanController interfaces and implementations has me completely confused (which has nothing to do with your changes) but I did notice one interface called ScanManager, which extends ScanController, that defines a "closeForEndTransaction(boolean)" method:

  http://db.apache.org/derby/javadoc/engine/org/apache/derby/iapi/store/access/conglomerate/ScanManager.html

I also noticed that most of the Scan implementations apparently implement the ScanManager class. At least, that's what I gather from the above javadoc page; I tried to verify that by looking at the actual code but got lost in no time. In any event, I was wondering if that method could somehow be used to indicate that a ScanController has been closed as the result of a commit. Then instead of catching an exception in DiskHashtable, you could try something like:

     try
     {
        // DERBY-2462: if holdable and scan got closed due
        // to commit, we need to reopen where we left off
        if (scan.closedByCommit() && keepAfterCommit)
        {
            scan = openRowConglomerate(tc, rowConglomerateId);
            scan.positionAtRowLocation(rowloc);
        }
        scan.fetch(row);
        
where "scan.closedByCommit()" would return true if the scan was closed by a ScanManager.closeForEndTransaction() call.

As I said, that's just a theoretical. I don't know if it's actually possible to get something like that working, but since I noticed the method I thought I'd ask.

If that type of check is not possible then it seems like we should at least be able to add an "isClosed()" or "isPositioned()" method to the ScanController that could be used instead of catching the exception. Is there something in particular that would make such an aproach infeasible?

2) The patch includes the following diff in DiskHashtable:

 + // DERBY-2462: if holdable and scan got closed due
 + // to commit, we need to reopen where we left off
 + if (keepAfterCommit && (SQLState.AM_SCAN_NOT_POSITIONED.
 + substring(0,5).
 + equals(e.getSQLState()))) {
 + scan = openRowConglomerate(tc, rowConglomerateId);
 + scan.positionAtRowLocation(rowloc);

It looks like we first make a call to reopen the scan, then we call "positionAtRowLocation()". But the javadoc for the latter method (see ScanController.java) seems to indicate that the reopen happens automatically:

    /**
     * Positions the scan at row location and locks the row.
     * If the scan is not opened, it will be reopened if this is a holdable
     * scan and there has not been any operations which causes RowLocations
     * to be invalidated.

Is this javadoc correct? If so, then is it necessary to explicitly call "openRowConglomerate()" in the above diff, or could we get by without it?

Also, the javadoc mentions that "positionAtRowLocation()" locks the row. The javadoc for ScanController.fetch() doesn't mention anything about locking the row, but I'm assuming it happens anyway...is that correct? (excuse my ignorance here). Can you confirm that row locking occurs as expected with these changes? I have no reason to believe otherwise, just thought I'd ask.

It's worth repeating here that I am not familiar with this area of code and thus this feedback could be of little-to-no-value, so please keep that in mind. If any committer out there (including you) wishes to commit these changes as they are, I would not complain.

On a more concrete level, I verified that the new lang/SpillHash.java test runs cleanly with your changes and fails with error XSCH6 without them, as expected. So thanks for fixing that test up.

Dag H. Wanvik added a comment - 23/Apr/07 09:30 PM
Thanks for the review, Army!

I attach a new version of the patch, DERBY-2462-3.* which supercedes
earlier versions. I address your comments explicitly below. Relative
to the previous version (*-2.diff), this patch:

1) opens the heap scan in DiskHashtable#ElementEnum's constructor with
   hold=true if we have holdability. Required for item 3) below to
   work.

2) avoids having to catch the not positioned exception in
   DiskHashtable#nextElement, when the scan has been closed, by using
   the new method ScanController#isPositioned.

3) removes the call to openRowConglomerate, now relying on
   positionAtRowLocation to do the reopening when required under
   holdability.

4) checks the result of positionAtRowLocation to verify that it
   worked, else throw an exception "24000" NO_CURRENT_ROW. It can fail
   if the row location has been invalidated by compress. Pretty
   unlikely, but one never knows..

5) removes the exit() calls from the SpillHash.java test to make it
   runnable with -Duseprocess=false

6) some small cleanup in DiskHashtable.

Regression tests ran cleanly on Sun 1.4 Solaris 10/x86.


Answers to Army's review comments:

> 1) The interaction of the various ScanController interfaces and
> implementations has me completely confused (which has nothing to do

you are not alone... ;)

> If that type of check is not possible then it seems like we should
> at least be able to add an "isClosed()" or "isPositioned()" method
> to the ScanController that could be used instead of catching the
> exception. Is there something in particular that would make such an
> aproach infeasible?

Yes, I thought of something like this too. I finally added a method
ScanController#isPositioned, and implemented it for heap and btree
scans (only used by the former, though).

> It looks like we first make a call to reopen the scan, then we call
> "positionAtRowLocation()". But the javadoc for the latter method
> (see ScanController.java) seems to indicate that the reopen happens
> automatically:
>
> /**
> * Positions the scan at row location and locks the row.
> * If the scan is not opened, it will be reopened if this is a holdable
> * scan and there has not been any operations which causes RowLocations
> * to be invalidated.
>
> Is this javadoc correct? If so, then is it necessary to explicitly
> call "openRowConglomerate()" in the above diff, or could we get by
> without it?

Yes, no and yes. Much cleaner, thanks. See item 1) and 3) above.

> Also, the javadoc mentions that "positionAtRowLocation()" locks the
> row. The javadoc for ScanController.fetch() doesn't mention anything
> about locking the row, but I'm assuming it happens anyway...is that
> correct? (excuse my ignorance here). Can you confirm that row
> locking occurs as expected with these changes? I have no reason to
> believe otherwise, just thought I'd ask.

Yes, locking is performed as a consequence of calling
positionAtRowLocation, see HeapScan#positionAtRowLocation which calls
reopenScanByRecordHandleAndSetLocks.

> It's worth repeating here that I am not familiar with this area of
> code and thus this feedback could be of little-to-no-value, so
> please keep that in mind.

I very much appreciate your feedback; I am also unfamiliar with this
area of the code, and an extra set of eyes is always useful! The patch
is cleaner now, I think. Thanks again! :)

A B added a comment - 23/Apr/07 11:54 PM
Thank you for considering my feedback, Dag. The v3 patch is indeed cleaner now.

Two follow-up questions that occur to me:

1) The new "isPositioned()" method checks to see if the current scan_state is "SCAN_INPROGRESS". This works well enough for the case we're trying to address--i.e. the scan was closed due to a commit and is therefore no longer "in progress".

  My question is: are there any scenarios in which we could get to the new code when the scan was closed for some reason *other* than a commit? If this is possible, will attempting to re-open the scan in such a scenario cause problems?

  I don't know enough about the lifetime of a scan to give any examples of when or if this could happen, but given the generality of the isPositioned() method I thought I'd bring it up.

  I did notice that one of the possible scan states is (pulled from BTreeScan):

  * SCAN_HOLD_INPROGRESS -
  * The scan has been opened and held open across a commit,
  * at the last commit the state was in SCAN_INPROGRESS.
  * The transaction which opened the scan has committed,
  * but the scan was opened with the "hold" option true.
  * At commit the locks were released and the "current"
  * position is remembered. In this state only two calls
  * are valid, either next() or close(). When next() is
  * called the scan is reopened, the underlying container
  * is opened thus associating all new locks with the current
  * transaction, and the scan continues at the "next" row.

  To see what would happen I changed "isPositioned()" method to:

     public boolean isPositioned() throws StandardException
     {
         return scan_state == SCAN_HOLD_INPROGRESS;
     }

  and then dropped the negation from the check in DiskHashtable:

     if (keepAfterCommit && scan.isPositioned()) {
         // automatically reopens scan:
         if (!scan.positionAtRowLocation(rowloc)) {
             // Will not happen unless compress of this table
             // has invalidated the row location. Possible?
             throw StandardException.
                 newException(SQLState.NO_CURRENT_ROW);
         }
     }

  I then ran lang/SpillHash.java and the test still passed. The good thing about this approach is that we will only execute the "reopen" logic if we know for a fact that the "scan has been opened and held open across a commit"--which is exactly what we want. Also, if we rename "isPositioned()" to something more appropriate, the "if" statement in DiskHashtable becomes more intuitive:

     if (scan.heldAcrossCommit()) {
         // automatically reopens scan:
         if (!scan.positionAtRowLocation(rowloc)) {

  Note that you wouldn't need to include "keepAfterCommit" anymore because the scan state can only be "HOLD_INPROGRESS" if the scan was opened with "hold" set to true, which (I think?) can only happen if "keepAfterCommit" is true.

2) I noticed that the lang/SpillHash.java test checks to make sure that the scan is still available after a commit if holdOverCommit is true. This is good because that's the whole point of DERBY-2462. I was wondering, though, if it would be worth it to check that the correct *error* is thrown if a call to "next()" is made after a commit when holdOverCommit is *false*. Or is that already tested somewhere else?

Thanks for your patience with my ramblings...

Dag H. Wanvik added a comment - 24/Apr/07 11:14 PM
Thanks for your continued help with this one, Army!

> 1) ...
> My question is: are there any scenarios in which we could get to
> the new code when the scan was closed for some reason *other* than a
> commit? If this is possible, will attempting to re-open the scan in
> such a scenario cause problems?

I don't believe so, but let me try an analysis:

The states of the scan (heap, I omit btree here for simplicity
although i believe it is similar) is one of

    {SCAN_INIT,
     SCAN_INPROGRESS,
     SCAN_DONE,
     SCAN_HOLD_INIT,
     SCAN_HOLD_INPROGRESS}

 Btw, the comment at the top of GenericScanController (used for heap
 scan along with subclass HeapScan) omits the *_HOLD_* states, so it
 is out of date..

* The constructor of DiskHashTable#ElementEnum opens the scan. This
  moves the scan state to SCAN_INIT if successful. If not, it is
  silently swallowed (not good!), and the enumeration will have zero
  elements (hasMore==false).

* Next, the constructor performs a scan.next() which sets the boolean
  state variable "hasMore". When this is true, the scan state will be
  SCAN_INPROGRESS, when its is false, there are no more data and the
  scan state will be SCAN_DONE. So when the constructor of ElementEnum
  returns, the scan state (if no error occurred) will be one of
  {SCAN_INPROGRESS, SCAN_DONE}.

  If we have holdability: The state SCAN_HOLD_INIT can only happen if
  a commit happens *before* an initial next() is performed, so that
  state can not happen for this scan. Also, I believe there are no
  state transitions possible back to SCAN_INIT or SCAN_HOLD_INIT once
  a next() is performed and {SCAN_INPROGRESS, SCAN_DONE} is reached (I
  did some inspection).

* Now, when ElementEnum#nextElement() is attempted, there are four cases
  cases:
     n1) holdability and no commit has happened
     n2) holdability and a commit happened
     n3) no holdability and no commit has happened
     n4) no holdability and a commit happened
  
  n1) Scan state should be either
      SCAN_INPROGRESS (and hasMore == true) or
      SCAN_DONE (and hasMore == false).

      If hasMore == false, NoSuchElementException is thrown.
      If hasMore == true, state should be SCAN_INPROGRESS,
      isPositioned() returns true and the fetch will succeed. Also a
      new next() is performed which moves the state to one of
      {SCAN_INPROGRESS, SCAN_DONE}.

  n2) Scan state should be either
      SCAN_HOLD_INPROGRESS¹ (and hasMore == true) or
      SCAN_DONE (and hasMore == false).

      If hasMore == false, NoSuchElementException is thrown. This
      would not normally happen, since nextElement would not be
      called, cf, hasMoreElements().

      If hasMore == true, state should be SCAN_HOLD_INPROGRESS: Our
      predicate "(keepAfterCommit && !scan.isPositioned()" == true and
      we reopen scan before doing the fetch, which should succeed. A
      new next() is performed which moves the state to one of
      {SCAN_INPROGRESS, SCAN_DONE}.

      ¹see GenericScanController#closeForEndTransaction().

  n3) Similar to n1, except isPositioned() is not called.

  n4) Scan state should be SCAN_DONE,
      cf. GenericScanController#closeForEndTransaction.
  
      In this case, nextElement is never called; the fact that the
      result set is closed is caught at a higher level, e.g. in
      BasicNoPutResultSetImpl#getNextRow() for the DISTINCT case
      (SQLState.LANG_RESULT_SET_NOT_OPEN, ca line 463).

      Even if it were reached, no attempt to reopen it would be
      performed (not held), and scan.fetch would throw
      SQLState.AM_SCAN_NOT_POSITIONED, which is OK.

* There is a public method setScanState(state) in
  GenericScanController which could conceivably be used to effect
  other state transitions, but I checked, and it is only used by
  HeapScan#positionAtRowLocation to transition from
  SCAN_HOLD_INPROGRESS back to SCAN_INPROGRESS (and for a similar
  purpose by HeapCompressScan).

* Now, for your question, would it better (safer) to explicitly test
  for SCAN_HOLD_INPROGRESS with a method called, say,
  heldAcrossCommit? I believe my analysis above shows that when the
  call to isPositioned is performed, there are only two possible
  states the scan can be in: SCAN_HOLD_INPROGRESS or SCAN_INPROGRESS,
  so they are equally safe (as the code stands right now, anyway).

  Now, whether there is, or were to be, some *other* way of closing
  the scan (moving it to SCAN_DONE without also closing the result
  set), I don't know, but if it were, positionAtRowLocation will
  actually try to reopen the scan (not sure if this is good or not ;),
  however scan.fetch() would fail with
  SQLState.AM_SCAN_NOT_POSITIONED. If we use your approach, we would
  not attempt to reopen in such a case, which may be safer.

  I agree heldAcrossCommit has the benefit of avoiding the test for
  holdability as you indicate, which does make for easier reading of
  the logic. I think I went for is isPositioned from a vague feeling
  this was potentially more generally useful..

  Maybe a (even more) generally useful method could be:
  
    public boolean isHeldAfterCommit() throws StandardException
    {
        return (scan_state == SCAN_HOLD_INIT ||
                scan_state == SCAN_HOLD_INPROGRESS);
    }

  that is, if this returns true, the scan can always be reopened
  (although in the case at hand, only the second state tested for may
  occur as shown above).

  What do you think?


> 2) .... I was wondering, though, if it would be worth it to check
> that the correct *error* is thrown if a call to "next()" is made
> after a commit when holdOverCommit is *false*. Or is that already
> tested somewhere else?

As I mentioned above, the error is caught higher up in language layer
(open result set or not), so I belive this is (or should be ;-) tested
for elsewhere. I manually modified the test to verify that for all
three variants (join, distinct and cursor), this happened. If you this
it is still advisable, I can add those tests cases to SpillHash.

If you agree, I will make a version of the patch with the new
isHeldAfterCommit outlined above.

A B added a comment - 25/Apr/07 04:12 PM
Thank you _very_ much for the detailed analysis, Dag. I feel better about the "safety" of your v3 changes after reading your previous comment.

> Maybe a (even more) generally useful method could be:
>
> public boolean isHeldAfterCommit() throws StandardException
> {
> return (scan_state == SCAN_HOLD_INIT ||
> scan_state == SCAN_HOLD_INPROGRESS);
> }
>
> that is, if this returns true, the scan can always be reopened
> (although in the case at hand, only the second state tested for may
> occur as shown above).
>
> What do you think?

+1, I like this idea even better. Thank you for the suggestion and for your willingness to implement it.

> I manually modified the test to verify that for all three variants (join, distinct and cursor),
> this happened. If you this it is still advisable, I can add those tests cases to SpillHash.

If it's not too much work I think this would be good. I'm sure there are tests elsewhere to check the general concept of cursor holdability across commits, but it might be nice to have a test case for the specific scenario of a spilled DiskHashtable.

> If you agree, I will make a version of the patch with the new isHeldAfterCommit outlined above.

Sounds great. Thanks again for your continued work with this, and for your prompt consideration of my feedback.

Dag H. Wanvik added a comment - 26/Apr/07 06:05 PM
This patch (DERBY-2462-4.*) supercedes earlier versions.

As agreed,

- it replaces ScanController#isPositioned with with isHeldAfterCommit
  and uses that in DiskHashTable#ElementNum#NextElement.
- it adds negative tests to SpillHash.java for the non-holdable cases

Ran regression tests successfully again (Sun JDK 1.4 and 1.6), Solaris
10/x86 modulo upgrade tests which failed as mentioned on derby-dev
(unrelated I believe).

A B added a comment - 26/Apr/07 06:53 PM
Latest patch (version 4) incorporates all previous feedback and I have no other comments to make. I applied the patch and ran lang/SpillHash.java with no problems. I also ran the repro attached to this issue after applying the patch, and it ran cleanly.

+1 to commit.

Thank you for your persistence with this issue, Dag.

Dag H. Wanvik added a comment - 16/May/07 01:09 PM
Refreshed the patch and ran derbyall and suites.All ok again, modulo
one seemingly intermittent error in ReleaseCompileLocksTest. I don't
this is related to this patch. Committed as svn 538572.