Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 10.0.2.1, 10.1.3.1, 10.2.2.0, 10.3.3.0, 10.4.2.0, 10.5.1.1, 10.8.1.2
    • Fix Version/s: 10.8.3.0, 10.9.2.2, 10.10.1.1
    • Component/s: SQL
    • Environment:
      Windows Vista, OS X 10.5+
    • Urgency:
      Normal
    • Issue & fix info:
      Repro attached
    • Bug behavior facts:
      Seen in production

      Description

      Due to a design flaw in the statement cache, a deadlock can occur if a prepared statement becomes out-of-date.

      I will illustrate this with the following example:

      The application is using the embedded Derby driver. The application has two threads, and each thread uses its own connection.

      There is a table named MYTABLE with column MYCOLUMN.

      1. A thread prepares and executes the query SELECT MYCOLUMN FROM MYTABLE. The prepared statement is stored in the statement cache (see org.apache.derby.impl.sql.GenericStatement for this logic)
      2. After some time, the prepared statement becomes invalid or out-of-date for some reason (see org.apache.derby.impl.sql.GenericPreparedStatement)
      3. Thread 1 begins a transaction and executes LOCK TABLE MYTABLE IN EXCLUSIVE MODE
      4. Thread 2 begins a transaction and executes SELECT MYCOLUMN FROM MYTABLE. The statement is in the statement cache but it is out-of-date. The thread begins to recompile the statement. To compile the statement, the thread needs a shared lock on MYTABLE. Thread 1 already has an exclusive lock on MYTABLE. Thread 2 waits.
      5. Thread 1 executes SELECT MYCOLUMN FROM MYTABLE. The statement is in the statement cache but it is being compiled. Thread 1 waits on the statement's monitor.
      6. We have a deadlock. Derby eventually detects a lock timeout, but the error message is not descriptive. The stacks at the time of the deadlock are:

      This deadlock is unique because it can still occur in a properly designed database. You are only safe if all of your transactions are very simple and cannot be interleaved in a sequence that causes the deadlock, or if your particular statements do not require a table lock to compile. (For the sake of simplicity, I used LOCK TABLE in my example, but any UPDATE statement would fit.)

      1. Derby4279.java
        3 kB
        Jeff Stuckman
      2. stacktrace.txt
        36 kB
        Mamta A. Satoor
      3. patch4279.txt
        8 kB
        Brett Wooldridge
      4. client_stacktrace_activation_closed.txt
        2 kB
        Kristian Waagan
      5. patch4279_2.txt
        1 kB
        Brett Wooldridge
      6. no-lock-experiment.diff
        2 kB
        Kristian Waagan

        Issue Links

          Activity

          Hide
          Kristian Waagan added a comment -

          Back-ported fix to 10.8 with revision 1384037. I don't plan to go further back.

          Regarding a 10.9 maintenance release, I haven't seen anyone come forward to be release manager so I guess there are no plans right now.
          It appears a 10.8 maintenance release is being planned, although no information has been added to the wiki yet.

          Show
          Kristian Waagan added a comment - Back-ported fix to 10.8 with revision 1384037. I don't plan to go further back. Regarding a 10.9 maintenance release, I haven't seen anyone come forward to be release manager so I guess there are no plans right now. It appears a 10.8 maintenance release is being planned, although no information has been added to the wiki yet.
          Hide
          Kristian Waagan added a comment -

          Reopening for back-port to the 10.8 branch (with 10.8.3 as a potential release vehicle).

          Show
          Kristian Waagan added a comment - Reopening for back-port to the 10.8 branch (with 10.8.3 as a potential release vehicle).
          Hide
          Force Rs added a comment -

          I don't think this fix is in 10.9.1.0. If not, any idea when 10.9.1.1 will be released?

          Show
          Force Rs added a comment - I don't think this fix is in 10.9.1.0. If not, any idea when 10.9.1.1 will be released?
          Hide
          Kristian Waagan added a comment -

          Backported the fix to 10.9 with revision 1373749.
          The fix is pretty isolated, it may be possible to port it further back if anyone sees the need for that.

          Issue ready for verification/closing.

          Show
          Kristian Waagan added a comment - Backported the fix to 10.9 with revision 1373749. The fix is pretty isolated, it may be possible to port it further back if anyone sees the need for that. Issue ready for verification/closing.
          Hide
          Kristian Waagan added a comment -

          Think I just found another keyboard shortcut or something... Reassigning back to Brett.

          Show
          Kristian Waagan added a comment - Think I just found another keyboard shortcut or something... Reassigning back to Brett.
          Hide
          Kristian Waagan added a comment -

          Committed patch 'derby4279_2.txt' to trunk with revision 1365661.
          I plan to backport this patch in a while.

          Show
          Kristian Waagan added a comment - Committed patch 'derby4279_2.txt' to trunk with revision 1365661. I plan to backport this patch in a while.
          Hide
          Kristian Waagan added a comment -

          From what I can see, the lock, which is a "container intention read lock" (CIS), is released as soon as the controller is closed. I observed this by tracing the locks (derby.debug.true=LockTrace + a custom message to the log to navigate in derby.log). This is a very short time-span, and it happens independently of the isolation level of the transaction (separate sub-transaction for compilation?). Once my test-run has completed I will commit the patch.

          If we observe the deadlock also when the mode is DDL_MODE I don't see why we can't use NO_LOCK in all cases, since the code appears to be prepared to handle that the conglomerate is dropped under its feet. However, it seems safer to me to keep the change minimal for the time being.

          Show
          Kristian Waagan added a comment - From what I can see, the lock, which is a "container intention read lock" (CIS), is released as soon as the controller is closed. I observed this by tracing the locks (derby.debug.true=LockTrace + a custom message to the log to navigate in derby.log). This is a very short time-span, and it happens independently of the isolation level of the transaction (separate sub-transaction for compilation?). Once my test-run has completed I will commit the patch. If we observe the deadlock also when the mode is DDL_MODE I don't see why we can't use NO_LOCK in all cases, since the code appears to be prepared to handle that the conglomerate is dropped under its feet. However, it seems safer to me to keep the change minimal for the time being.
          Hide
          Kristian Waagan added a comment -

          I've just returned from vacation, and I'll to do that over the week end. I want to check a few last things before committing.

          Show
          Kristian Waagan added a comment - I've just returned from vacation, and I'll to do that over the week end. I want to check a few last things before committing.
          Hide
          Brett Wooldridge added a comment -

          Can somebody with checkin privileges commit this patch? It is complete as far as I am concerned.

          Show
          Brett Wooldridge added a comment - Can somebody with checkin privileges commit this patch? It is complete as far as I am concerned.
          Hide
          Brett Wooldridge added a comment -

          Mamta,

          Thanks for your feedback. From an efficiency point of view, I think your suggestion (and my initial attempt at a bug fix) is a good approach in general. As Bryan Pendleton commented earlier,

          "I think the approach of allowing each caller to prepare the statement independently is valid and useful. In this age of multi-core computing, I think doing possibly extra work like this in search of higher concurrency (and fewer deadlocks) is a good technique."

          I would however like to do that work under another bug (enhancement), as I think the most recent patch to fix this bug is both sufficient and correct to address this bug per se. When this change has been accepted, I'll open another bug under which to do work on concurrency.

          Show
          Brett Wooldridge added a comment - Mamta, Thanks for your feedback. From an efficiency point of view, I think your suggestion (and my initial attempt at a bug fix) is a good approach in general. As Bryan Pendleton commented earlier, "I think the approach of allowing each caller to prepare the statement independently is valid and useful. In this age of multi-core computing, I think doing possibly extra work like this in search of higher concurrency (and fewer deadlocks) is a good technique." I would however like to do that work under another bug (enhancement), as I think the most recent patch to fix this bug is both sufficient and correct to address this bug per se. When this change has been accepted, I'll open another bug under which to do work on concurrency.
          Hide
          Mamta A. Satoor added a comment -

          I am not familiar with the synchronization code during stale prepared statement re-compile. But I was looking at the first patch submitted by Brett and suggestion by Knut about statements using session schema and how they do not get saved in the statement cache.

          I worked on statements involving session schema many years back and recall implementing logic which will prevent such statements from going into statement cache.

          Will a similar logic for this jira help with all the synchronization issues we are dealing with? ie, if a thread finds that the prepared statement is already in cache but it is being compiled by another thread, then go ahead and create a new GenericPreparedStatement and compile that instead(same as if the statement never existed in the cahce), BUT do not save this new statement in the statement cache(just like a statement referencing session schema). So, this newly compiled statement will not be available to any other thread. But that should be fine because original preapred statement in the cache is already getting compiled and hence it will be available to other threads in future. Like Brett mentioned, "Unless a statement undergoes constant concurrent recompilation (defeating the statement cache anyway)", throwing away a compiled statement after use by a thread if that thread finds previously compiled statement in invalid state and getting compiled by another thread should not be a big overhead.

          I am not sure if creating a GenericPreparedStatement for use by just one thread will solve the synchronization problem in this jira but I wanted to put it out anyways in case if this approach helps.

          Show
          Mamta A. Satoor added a comment - I am not familiar with the synchronization code during stale prepared statement re-compile. But I was looking at the first patch submitted by Brett and suggestion by Knut about statements using session schema and how they do not get saved in the statement cache. I worked on statements involving session schema many years back and recall implementing logic which will prevent such statements from going into statement cache. Will a similar logic for this jira help with all the synchronization issues we are dealing with? ie, if a thread finds that the prepared statement is already in cache but it is being compiled by another thread, then go ahead and create a new GenericPreparedStatement and compile that instead(same as if the statement never existed in the cahce), BUT do not save this new statement in the statement cache(just like a statement referencing session schema). So, this newly compiled statement will not be available to any other thread. But that should be fine because original preapred statement in the cache is already getting compiled and hence it will be available to other threads in future. Like Brett mentioned, "Unless a statement undergoes constant concurrent recompilation (defeating the statement cache anyway)", throwing away a compiled statement after use by a thread if that thread finds previously compiled statement in invalid state and getting compiled by another thread should not be a big overhead. I am not sure if creating a GenericPreparedStatement for use by just one thread will solve the synchronization problem in this jira but I wanted to put it out anyways in case if this approach helps.
          Hide
          Brett Wooldridge added a comment -

          Kristian and Mike,

          Thanks for both of your inputs on this issue. It's been nagging me for 2 years now, I'll be glad to see it buried. Let me know if there is anything else I can do.

          Show
          Brett Wooldridge added a comment - Kristian and Mike, Thanks for both of your inputs on this issue. It's been nagging me for 2 years now, I'll be glad to see it buried. Let me know if there is anything else I can do.
          Hide
          Kristian Waagan added a comment -

          Thanks for sharing your knowledge on this, Mike.
          The way I see it, the demand for a new interface is not strong enough based on this particular case only. It would be if it turns out using the existing interface is too costly performance-wise, but we don't have any data/indications on that.

          For now I'd be content with using Brett's patch with ISOLATION_NOLOCK (assuming it does what it implies). It would be good to determine if we can use ISOLATION_NOLOCK in this particular case also when the dictionary is in DDL-mode (see comment below on the DD mode), but that can be done as a separate step.

          One more comment on the DD modes.
          If the DD is in compile only mode and a compilation is in progress, then no DDL will take place.
          If the DD is in DDL mode and a compilation starts, then the compilation may be performed concurrently with DDL.

          Show
          Kristian Waagan added a comment - Thanks for sharing your knowledge on this, Mike. The way I see it, the demand for a new interface is not strong enough based on this particular case only. It would be if it turns out using the existing interface is too costly performance-wise, but we don't have any data/indications on that. For now I'd be content with using Brett's patch with ISOLATION_NOLOCK (assuming it does what it implies). It would be good to determine if we can use ISOLATION_NOLOCK in this particular case also when the dictionary is in DDL-mode (see comment below on the DD mode), but that can be done as a separate step. One more comment on the DD modes. If the DD is in compile only mode and a compilation is in progress, then no DDL will take place. If the DD is in DDL mode and a compilation starts, then the compilation may be performed concurrently with DDL.
          Hide
          Mike Matrigali added a comment -

          commenting on RowLocation. The system is designed such that one could implement multiple types of conglomerates. Currently there are only 2 types, btree and heap - but a lot of structure to allow for more. So in keeping with this callers know they need a RowLocation but it is up to each implementation to provide the "hidden" actual implementation, in this case the HeapRowLocation. Getting
          this row location from an open conglomerate seemed a natural place as almost all work with a conglomerate already has an open conglomerate of some sort.

          If it helps to not have to open a conglomerate to get this RowLocation then it would be best to just add a new interface. I think code could be written to just return the right kind of RowLocation based
          on the conglomerate id. At least with the current conglomerate inplementations there is nothing ddl related that would change the RowLocation object returned. There might be future implementations
          that might require a lock, but I would just add comments about the current problems with getting locks in this operation and the JIRA issue and let future implementers think about that.

          New interface would look something like: TransactionController.newRowLocationTemplate(conglomerateId)

          If it looks like this is needed let me know, I would be happy to submit a patch with this new interface - if the testing shows this to be helpful.

          Show
          Mike Matrigali added a comment - commenting on RowLocation. The system is designed such that one could implement multiple types of conglomerates. Currently there are only 2 types, btree and heap - but a lot of structure to allow for more. So in keeping with this callers know they need a RowLocation but it is up to each implementation to provide the "hidden" actual implementation, in this case the HeapRowLocation. Getting this row location from an open conglomerate seemed a natural place as almost all work with a conglomerate already has an open conglomerate of some sort. If it helps to not have to open a conglomerate to get this RowLocation then it would be best to just add a new interface. I think code could be written to just return the right kind of RowLocation based on the conglomerate id. At least with the current conglomerate inplementations there is nothing ddl related that would change the RowLocation object returned. There might be future implementations that might require a lock, but I would just add comments about the current problems with getting locks in this operation and the JIRA issue and let future implementers think about that. New interface would look something like: TransactionController.newRowLocationTemplate(conglomerateId) If it looks like this is needed let me know, I would be happy to submit a patch with this new interface - if the testing shows this to be helpful.
          Hide
          Brett Wooldridge added a comment -

          Kristian,

          None of this is documented obviously, but here is my best guess why that code is the way that it is. Trying to open the conglomerate acts as a check that the schema has not been modified by another thread/transaction. There may be other locks in place, as you mentioned, that attempt to prevent DDL run running but my guess is there are still 'seams' where it can happen. Hence the opening of the conglomerate, and note also (especially) the comments in the try...catch in GenericStatement.prepare() where the StandardException is caught and explicitly checked for an error. The comment is:

          // There is a chance that we didn't see the invalidation
          // request from a DDL operation in another thread because
          // the statement wasn't registered as a dependent until
          // after the invalidation had been completed. Assume that's
          // what has happened if we see a conglomerate does not exist
          // error...

          Which implies to me that DDL might not always be blocked, and code was put in place to handle that case. Since DDL is rare in a running system, catching an exception might therefore be much cheaper than some kind of read/write lock to try to prevent it.

          The other possibility is that DDL and DML are completely protected/synchronized now (compared to when that code was written), and some of that code is simply unnecessary. I'm not in a position to make that judgement. Hence my attempt to retain as much as of the original behavior, be it inefficient or not, to minimize side-effects.

          I would be perfectly happy with the patch as submitted along with additional comments in the code basically asking 'Is the still necessary?' with a reference to this bug. Maybe at some point in the future someone digging through the code will unravel the logic completely and determine that 'yes, it is necessary' or 'no, it is not'.

          Show
          Brett Wooldridge added a comment - Kristian, None of this is documented obviously, but here is my best guess why that code is the way that it is. Trying to open the conglomerate acts as a check that the schema has not been modified by another thread/transaction. There may be other locks in place, as you mentioned, that attempt to prevent DDL run running but my guess is there are still 'seams' where it can happen. Hence the opening of the conglomerate, and note also (especially) the comments in the try...catch in GenericStatement.prepare() where the StandardException is caught and explicitly checked for an error. The comment is: // There is a chance that we didn't see the invalidation // request from a DDL operation in another thread because // the statement wasn't registered as a dependent until // after the invalidation had been completed. Assume that's // what has happened if we see a conglomerate does not exist // error... Which implies to me that DDL might not always be blocked, and code was put in place to handle that case. Since DDL is rare in a running system, catching an exception might therefore be much cheaper than some kind of read/write lock to try to prevent it. The other possibility is that DDL and DML are completely protected/synchronized now (compared to when that code was written), and some of that code is simply unnecessary. I'm not in a position to make that judgement. Hence my attempt to retain as much as of the original behavior, be it inefficient or not, to minimize side-effects. I would be perfectly happy with the patch as submitted along with additional comments in the code basically asking 'Is the still necessary?' with a reference to this bug. Maybe at some point in the future someone digging through the code will unravel the logic completely and determine that 'yes, it is necessary' or 'no, it is not'.
          Hide
          Kristian Waagan added a comment -

          As an experiment I ran the JUnit regression tests with the patch 'no-lock-experiment.diff'. It just hardcodes the use of a HeapRowLocation instead of opening the conglomerate and getting a lock.

          The tests completed without any errors.
          The problem is I don't really know if all aspects of the change has been tested, and since most of the tests are single-threaded having these tests pass doesn't really give me that much confidence about the change.

          Show
          Kristian Waagan added a comment - As an experiment I ran the JUnit regression tests with the patch 'no-lock-experiment.diff'. It just hardcodes the use of a HeapRowLocation instead of opening the conglomerate and getting a lock. The tests completed without any errors. The problem is I don't really know if all aspects of the change has been tested, and since most of the tests are single-threaded having these tests pass doesn't really give me that much confidence about the change.
          Hide
          Kristian Waagan added a comment -

          Brett,

          I think store.AutomaticIndexStatisticsMultiTest may test this scenario.
          My point was not to claim that there is something wrong with the patch, but rather to point out that I believe there is another mechanism in place that will block DDL if a compilation is taking place. Your patch doesn't change anything wrt that mechanism.

          Now, looking at that second call to openConglomerate it seems to me it's being done only to obtain a row location template. Only heaps can create row location objects. A heap row location template is nothing but a Java object wrapping a few primitive values that haven't been set. I don't immediately see why the conglomerate/heap even has to be open to obtain such an object, but maybe the restriction was put in place for a good reason (anyone?).

          I'm optimistic that the approach in the patch is sound. The question is if obtaining the lock can cause problems in DDL mode, and if we can skip taking it in that mode too. However, I haven't dug deep enough to understand the locking code. Can anyone with more experience in that area see if not taking the lock in RCL.generateHolderMethod has consequences the code isn't prepared to handle?

          Show
          Kristian Waagan added a comment - Brett, I think store.AutomaticIndexStatisticsMultiTest may test this scenario. My point was not to claim that there is something wrong with the patch, but rather to point out that I believe there is another mechanism in place that will block DDL if a compilation is taking place. Your patch doesn't change anything wrt that mechanism. Now, looking at that second call to openConglomerate it seems to me it's being done only to obtain a row location template. Only heaps can create row location objects. A heap row location template is nothing but a Java object wrapping a few primitive values that haven't been set. I don't immediately see why the conglomerate/heap even has to be open to obtain such an object, but maybe the restriction was put in place for a good reason (anyone?). I'm optimistic that the approach in the patch is sound. The question is if obtaining the lock can cause problems in DDL mode, and if we can skip taking it in that mode too. However, I haven't dug deep enough to understand the locking code. Can anyone with more experience in that area see if not taking the lock in RCL.generateHolderMethod has consequences the code isn't prepared to handle?
          Hide
          Brett Wooldridge added a comment -

          Kristian,

          You might be right, I could be mistaken about a SELECT not blocking DDL. Be that as it may, this defect here is only tangential to DDL, and as the testcase shows, the deadlock occurs without any DDL per se. Just simple table-locking. As to whether this change could adversely affect DDL that is run concurrently with queries – such as it used to block it but now doesn't, I am not sure. I don't have a test that runs DDL concurrently with queries, do you know if any of the standard tests or stress tests do that?

          Show
          Brett Wooldridge added a comment - Kristian, You might be right, I could be mistaken about a SELECT not blocking DDL. Be that as it may, this defect here is only tangential to DDL, and as the testcase shows, the deadlock occurs without any DDL per se. Just simple table-locking. As to whether this change could adversely affect DDL that is run concurrently with queries – such as it used to block it but now doesn't, I am not sure. I don't have a test that runs DDL concurrently with queries, do you know if any of the standard tests or stress tests do that?
          Hide
          Kristian Waagan added a comment -

          Thanks for your continued effort on this problem, Brett.

          Without having reviewed the approach used in the patch thoroughly, I can say that suites.All and derbyall passed with the patch applied in my environment.

          BW> So, even though the compile of a SELECT will no longer block DDL from occurring [...]

          I'm not convinced this is true. There's another mechanism in place in the data dictionary (DD) to stop that from happening. For a DDL operation to be allowed, there can't be any ongoing compilations. Note that the opposite isn't the case; compilations can occur even though a DDL is in progress. In the latter case the system catalogs are queried directly instead of going through the/a cache.

          Now, this mechanism relies on several state objects and is a bit complicated so I'm not 100% sure I'm correct. The methods startReading/doneReading and startWriting/transactionFinished in DataDictionaryImpl are good starting points to investigate this further. Note the use of the lock and the relevant counters.

          Show
          Kristian Waagan added a comment - Thanks for your continued effort on this problem, Brett. Without having reviewed the approach used in the patch thoroughly, I can say that suites.All and derbyall passed with the patch applied in my environment. BW> So, even though the compile of a SELECT will no longer block DDL from occurring [...] I'm not convinced this is true. There's another mechanism in place in the data dictionary (DD) to stop that from happening. For a DDL operation to be allowed, there can't be any ongoing compilations. Note that the opposite isn't the case; compilations can occur even though a DDL is in progress. In the latter case the system catalogs are queried directly instead of going through the/a cache. Now, this mechanism relies on several state objects and is a bit complicated so I'm not 100% sure I'm correct. The methods startReading/doneReading and startWriting/transactionFinished in DataDictionaryImpl are good starting points to investigate this further. Note the use of the lock and the relevant counters.
          Hide
          Brett Wooldridge added a comment -

          Okay, here is my second cut a fix...

          First of all, the original report surmised that any DML, not just SELECT, could cause this deadlock. I myself thought this as well, but it turns out not to be the case.

          There are two places in the statement compilation code where openConglomerate() is called. All compiled statement nodes ultimately inherit from StatementNode. In StatementNode.lockTableForCompilation() you will find this logic:

          if (dataDictionary.getCacheMode() == DataDictionary.DDL_MODE) <=== Important

          { ... heapCC = tc.openConglomerate(td.getHeapConglomerateId(), false, TransactionController.OPENMODE_FORUPDATE | TransactionController.OPENMODE_FOR_LOCK_ONLY, TransactionController.MODE_RECORD, TransactionController.ISOLATION_SERIALIZABLE); <=== Important ... }

          The method is preceded with this comment:

          /* We need to get some kind of table lock (IX here) at the beginning of

          • compilation of DMLModStatementNode and DDLStatementNode, to prevent the
          • interference of insert/update/delete/DDL compilation and DDL execution,
          • see beetle 3976, 4343, and $WS/language/SolutionsToConcurrencyIssues.txt
            */

          However, ultimately, according to the logic the lock is only obtained if the DataDictionary is in DDL_MODE. This is all fine.

          Now to the heart of the issue. The second place where openConglomerate() is called is in ResultColumnList.generateHolderMethod(). This is used for SELECT statements for columns that will ultimately appear in the ResultSet (or sub-SELECTs in nested queries). In ResultColumnList.generateHolderMethod() you will find this logic:

          cc = getLanguageConnectionContext().getTransactionCompile().openConglomerate(
          conglomerateId,
          false,
          0,
          TransactionController.MODE_RECORD,
          TransactionController.ISOLATION_READ_COMMITTED); <=== Important

          Notice no conditional logic. In other words, a lock on the conglomerate is unconditionally obtained. It is this lock that contends with a table lock that may have been obtained elsewhere (possibly DDL, possibly an explicit lock). The attached patch changes this code to the following:

          LanguageConnectionContext lcc = getLanguageConnectionContext();
          DataDictionary dd = lcc.getDataDictionary();
          int isolationLevel = (dd.getCacheMode() == DataDictionary.DDL_MODE) ?
          TransactionController.ISOLATION_READ_COMMITTED : TransactionController.ISOLATION_NOLOCK;
          cc = lcc.getTransactionCompile().openConglomerate(
          conglomerateId,
          false,
          0,
          TransactionController.MODE_RECORD,
          isolationLevel);

          Basically, it too checks the DataDictionary to see if it is in DDL_MODE. If it is, it maintains the same locking level as the existing code. However, if the DataDictionary is not being modified the code will not obtain a lock.

          What is the impact? The original code has the effect of blocking DDL from execution by locking the table(s) during this compile phase. And in turn, if DDL has locked the table(s), the compilation will be blocked. The first I take to be an unintended side-effect (as you'll see).

          Let's take the second case first. If DDL has occurred (and locked tables), the DataDictionary will be in DDL_MODE, and the new code behaves the same as the old code; the conglomerate is opened with READ_COMMITTED isolation and the compilation will be blocked just as before.

          Now the first case. If compilation of a statement (that involves ResultColumnList) has started and no DDL is in-progress at that instant, no locks are obtained (because the cache mode is not DDL_MODE). This is the same as other statement types [that extend from StatementNode]. If DDL occurs after compilation has started, it will not be blocked (but it was never blocked in all other cases). Because it was never blocked in other cases (due to the conditional logic), I take this to be a side-effect. However...

          If DDL does happen to alter the table(s) involved there is already code to handle it. All roads to ResultColumnList.generateHolderMethod() lead back to GenericStatement.prepare(). GenericStatement.prepare() has this logic fragment:

          try

          { return prepMinion(lcc, true, (Object[]) null, (SchemaDescriptor) null, forMetaData); }

          catch (StandardException se) {
          // There is a chance that we didn't see the invalidation
          // request from a DDL operation in another thread because
          // the statement wasn't registered as a dependent until
          // after the invalidation had been completed. Assume that's
          // what has happened if we see a conglomerate does not exist
          // error, and force a retry even if the statement hasn't been
          // invalidated.
          if (SQLState.STORE_CONGLOMERATE_DOES_NOT_EXIST.equals(se.getMessageId()))

          { ... recompile ... }

          }

          So, even though the compile of a SELECT will no longer block DDL from occurring (remember UPDATE, DELETE, and all other DML never blocked it anyway) there is code in-place to handle the possibility that openConglomerate() in StatementNode or ResultColumnList might fail. In that case, the compile is given one more shot before failing (recompile = true).

          It is debatable now that the logic in ResultColumnList is the same as StatementNode, whether the isolation in ResultColumnList should also be ISOLATION_SERIALIZABLE like StatementNode given that it is now also conditional based on DataDictionary.getCacheMode() == DDL_MODE. However, my approach was to be least impacting in terms of deviation from current known working code.

          On a final note, this patch obviously fixed the test case attached to this issue, however past attempts to fix this bug did the same but ultimately failed in some of the long running stress tests. It would be helpful if this patch could be applied and let run through the usual full barrage of tests. That said, the previous patch attempted to fix the issue at the statement cache level with all of it's hairy synchronization. This patch is considerably simpler both conceptually and implementation-wise.

          Show
          Brett Wooldridge added a comment - Okay, here is my second cut a fix... First of all, the original report surmised that any DML, not just SELECT, could cause this deadlock. I myself thought this as well, but it turns out not to be the case. There are two places in the statement compilation code where openConglomerate() is called. All compiled statement nodes ultimately inherit from StatementNode. In StatementNode.lockTableForCompilation() you will find this logic: if (dataDictionary.getCacheMode() == DataDictionary.DDL_MODE) <=== Important { ... heapCC = tc.openConglomerate(td.getHeapConglomerateId(), false, TransactionController.OPENMODE_FORUPDATE | TransactionController.OPENMODE_FOR_LOCK_ONLY, TransactionController.MODE_RECORD, TransactionController.ISOLATION_SERIALIZABLE); <=== Important ... } The method is preceded with this comment: /* We need to get some kind of table lock (IX here) at the beginning of compilation of DMLModStatementNode and DDLStatementNode, to prevent the interference of insert/update/delete/DDL compilation and DDL execution, see beetle 3976, 4343, and $WS/language/SolutionsToConcurrencyIssues.txt */ However, ultimately, according to the logic the lock is only obtained if the DataDictionary is in DDL_MODE. This is all fine. Now to the heart of the issue. The second place where openConglomerate() is called is in ResultColumnList.generateHolderMethod(). This is used for SELECT statements for columns that will ultimately appear in the ResultSet (or sub-SELECTs in nested queries). In ResultColumnList.generateHolderMethod() you will find this logic: cc = getLanguageConnectionContext().getTransactionCompile().openConglomerate( conglomerateId, false, 0, TransactionController.MODE_RECORD, TransactionController.ISOLATION_READ_COMMITTED); <=== Important Notice no conditional logic. In other words, a lock on the conglomerate is unconditionally obtained. It is this lock that contends with a table lock that may have been obtained elsewhere (possibly DDL, possibly an explicit lock). The attached patch changes this code to the following: LanguageConnectionContext lcc = getLanguageConnectionContext(); DataDictionary dd = lcc.getDataDictionary(); int isolationLevel = (dd.getCacheMode() == DataDictionary.DDL_MODE) ? TransactionController.ISOLATION_READ_COMMITTED : TransactionController.ISOLATION_NOLOCK; cc = lcc.getTransactionCompile().openConglomerate( conglomerateId, false, 0, TransactionController.MODE_RECORD, isolationLevel); Basically, it too checks the DataDictionary to see if it is in DDL_MODE. If it is, it maintains the same locking level as the existing code. However, if the DataDictionary is not being modified the code will not obtain a lock. What is the impact? The original code has the effect of blocking DDL from execution by locking the table(s) during this compile phase. And in turn, if DDL has locked the table(s), the compilation will be blocked. The first I take to be an unintended side-effect (as you'll see). Let's take the second case first. If DDL has occurred (and locked tables), the DataDictionary will be in DDL_MODE, and the new code behaves the same as the old code; the conglomerate is opened with READ_COMMITTED isolation and the compilation will be blocked just as before. Now the first case. If compilation of a statement (that involves ResultColumnList) has started and no DDL is in-progress at that instant, no locks are obtained (because the cache mode is not DDL_MODE). This is the same as other statement types [that extend from StatementNode] . If DDL occurs after compilation has started, it will not be blocked (but it was never blocked in all other cases). Because it was never blocked in other cases (due to the conditional logic), I take this to be a side-effect. However... If DDL does happen to alter the table(s) involved there is already code to handle it. All roads to ResultColumnList.generateHolderMethod() lead back to GenericStatement.prepare(). GenericStatement.prepare() has this logic fragment: try { return prepMinion(lcc, true, (Object[]) null, (SchemaDescriptor) null, forMetaData); } catch (StandardException se) { // There is a chance that we didn't see the invalidation // request from a DDL operation in another thread because // the statement wasn't registered as a dependent until // after the invalidation had been completed. Assume that's // what has happened if we see a conglomerate does not exist // error, and force a retry even if the statement hasn't been // invalidated. if (SQLState.STORE_CONGLOMERATE_DOES_NOT_EXIST.equals(se.getMessageId())) { ... recompile ... } } So, even though the compile of a SELECT will no longer block DDL from occurring (remember UPDATE, DELETE, and all other DML never blocked it anyway) there is code in-place to handle the possibility that openConglomerate() in StatementNode or ResultColumnList might fail. In that case, the compile is given one more shot before failing (recompile = true). It is debatable now that the logic in ResultColumnList is the same as StatementNode, whether the isolation in ResultColumnList should also be ISOLATION_SERIALIZABLE like StatementNode given that it is now also conditional based on DataDictionary.getCacheMode() == DDL_MODE. However, my approach was to be least impacting in terms of deviation from current known working code. On a final note, this patch obviously fixed the test case attached to this issue, however past attempts to fix this bug did the same but ultimately failed in some of the long running stress tests. It would be helpful if this patch could be applied and let run through the usual full barrage of tests. That said, the previous patch attempted to fix the issue at the statement cache level with all of it's hairy synchronization. This patch is considerably simpler both conceptually and implementation-wise.
          Hide
          Mike Matrigali added a comment -

          I have not been following this issue, so not sure of the details and am not an expert in this area of the code but here is what I think. I assume that Derby is getting a table level "INTENT" shared lock vs a table level "SHARED" lock. This
          is a very different thing, but for your purposes is causing the same issue. Intent locks conflict with other table level SHARED AND EXCLUSIVE locks but not with other
          intent locks.

          I think prepare gets these intent locks on the table to insure it gets a consistent view of all the ddl associated with the table that it is compiling. The main goal is to block other ddl
          from happening during the prepare, the assumption is that ddl and table level share and exclusive locking is rare (obviously this assumption is not working in the case of your
          application.) I assume more people are not seeing this because most applications do not require table level shared and exclusive access. Someone with more expertise
          is this area of the code should comment, but I wonder if we could either eliminate this lock or make it much shorter term if we guaranteed to check if ddl had happened
          during the prepare at the very end - a lot of this information is cached so I wonder if the locks are actually doing the work I describe above or if you just need the locks
          short term to consistently populate the caches.

          Because all the information for a single table is spread across multiple catalogs one may need to do multiple probes to get all the information for a single prepare.
          An example of the kind of bug that has happened in the past is that a prepare produces a plan that is not aware of a new index just added, and a subsequent insert using that
          plan does not update the index and thus results in a corrupt database.

          Sharing plans across connections was a big performance improvement added to derby based on many customer applications and benchmarks. Derby compile tiime is often very
          slow so anything that can be done to reduce/eliminate that compile time is important. There are a lot of applications out there that are getting performance boosts from the
          shared query cache without even knowing it, and yes they may be able to get similar results with application changes but instead now are gettting it automatically. So I would
          not support eliminating altogether, but Derby is built on modular concept. If you wanted to add a change that allowed derby to boot in a mode that did not have a shared
          cache (while still supporting default of a shared cache), that might be a reasonable approach. Similar to we default to derby working disk based, but allow it to booted with
          a different module that allows it to be memory based. I know at least the disk page cache is implemented in a module that was designed to be easily replaced, not sure about
          current state of query cache.

          Show
          Mike Matrigali added a comment - I have not been following this issue, so not sure of the details and am not an expert in this area of the code but here is what I think. I assume that Derby is getting a table level "INTENT" shared lock vs a table level "SHARED" lock. This is a very different thing, but for your purposes is causing the same issue. Intent locks conflict with other table level SHARED AND EXCLUSIVE locks but not with other intent locks. I think prepare gets these intent locks on the table to insure it gets a consistent view of all the ddl associated with the table that it is compiling. The main goal is to block other ddl from happening during the prepare, the assumption is that ddl and table level share and exclusive locking is rare (obviously this assumption is not working in the case of your application.) I assume more people are not seeing this because most applications do not require table level shared and exclusive access. Someone with more expertise is this area of the code should comment, but I wonder if we could either eliminate this lock or make it much shorter term if we guaranteed to check if ddl had happened during the prepare at the very end - a lot of this information is cached so I wonder if the locks are actually doing the work I describe above or if you just need the locks short term to consistently populate the caches. Because all the information for a single table is spread across multiple catalogs one may need to do multiple probes to get all the information for a single prepare. An example of the kind of bug that has happened in the past is that a prepare produces a plan that is not aware of a new index just added, and a subsequent insert using that plan does not update the index and thus results in a corrupt database. Sharing plans across connections was a big performance improvement added to derby based on many customer applications and benchmarks. Derby compile tiime is often very slow so anything that can be done to reduce/eliminate that compile time is important. There are a lot of applications out there that are getting performance boosts from the shared query cache without even knowing it, and yes they may be able to get similar results with application changes but instead now are gettting it automatically. So I would not support eliminating altogether, but Derby is built on modular concept. If you wanted to add a change that allowed derby to boot in a mode that did not have a shared cache (while still supporting default of a shared cache), that might be a reasonable approach. Similar to we default to derby working disk based, but allow it to booted with a different module that allows it to be memory based. I know at least the disk page cache is implemented in a module that was designed to be easily replaced, not sure about current state of query cache.
          Hide
          Brett Wooldridge added a comment -

          I've been looking at 4279 again today...

          ..and thinking of possible solutions, when a question arose. First and foremost, the deadlock is caused by the fact that preparing a statement requires a table lock (shared) in Derby. Why is this, technically? If the requirement that a table lock is needed to prepare a statement can be removed, this deadlock can be
          fixed.

          Alternatively, if the requirement that a table lock is needed cannot be removed, a possible resolution for 4279 is to remove the concept that prepared statements are shared across connections and instead make the statement cache per-connection. While this increases the memory overhead slightly – I have to believe that the artifacts of a prepared statement are in fact extremely small – it removes a lot of shared-cache synchronization code and probably increases concurrency in general. If you've been in that code, the synchronization is pretty hairy (as you can see from the comments in 4279 as well) and there are synchronization blocks in there but commented out for reasons no existing developers can explain.

          In fact, now that I think of it, it would be great if the requirement for a table lock could be removed when preparing a statement AND the cache made per-connection (to simplify the code to a point that humans can understand).

          I understand there is probably an edge case whereby performance would be degraded compared to existing code – that being a scenario in which connections are created and discarded frequently. But that is a scenario easily solved by connection re-use, either explicit or by use of a connection pool.

          Thoughts? I'm willing to put in some work if either of these approaches is acceptable. I already put in considerable time on 4279 over a year ago, but eventually abandoned it (as you can see in the comments) due to synchronization issues in the shared cache.

          Show
          Brett Wooldridge added a comment - I've been looking at 4279 again today... ..and thinking of possible solutions, when a question arose. First and foremost, the deadlock is caused by the fact that preparing a statement requires a table lock (shared) in Derby. Why is this, technically? If the requirement that a table lock is needed to prepare a statement can be removed, this deadlock can be fixed. Alternatively, if the requirement that a table lock is needed cannot be removed, a possible resolution for 4279 is to remove the concept that prepared statements are shared across connections and instead make the statement cache per-connection. While this increases the memory overhead slightly – I have to believe that the artifacts of a prepared statement are in fact extremely small – it removes a lot of shared-cache synchronization code and probably increases concurrency in general. If you've been in that code, the synchronization is pretty hairy (as you can see from the comments in 4279 as well) and there are synchronization blocks in there but commented out for reasons no existing developers can explain. In fact, now that I think of it, it would be great if the requirement for a table lock could be removed when preparing a statement AND the cache made per-connection (to simplify the code to a point that humans can understand). I understand there is probably an edge case whereby performance would be degraded compared to existing code – that being a scenario in which connections are created and discarded frequently. But that is a scenario easily solved by connection re-use, either explicit or by use of a connection pool. Thoughts? I'm willing to put in some work if either of these approaches is acceptable. I already put in considerable time on 4279 over a year ago, but eventually abandoned it (as you can see in the comments) due to synchronization issues in the shared cache.
          Hide
          Kim Haase added a comment -

          If/when this issue is fixed, a documentation JIRA should be filed: the statements made for DERBY-4280 about setting the statementCacheSize property to 0 should be removed.

          Show
          Kim Haase added a comment - If/when this issue is fixed, a documentation JIRA should be filed: the statements made for DERBY-4280 about setting the statementCacheSize property to 0 should be removed.
          Hide
          Brett Wooldridge added a comment -

          It is inappropriate to offer a "bounty" for this bug? I'm willing to pay to have this fixed. The amount of time I've spent trying to workaround this issue is ridiculous, and ultimately all such attempts have ended up distorting the readability of the code.

          I am actually stunned that more people are not encountering this bug. Basically if two (or more) threads are SELECTing and UPDATEing the same table, they WILL hit this issue under load.

          Show
          Brett Wooldridge added a comment - It is inappropriate to offer a "bounty" for this bug? I'm willing to pay to have this fixed. The amount of time I've spent trying to workaround this issue is ridiculous, and ultimately all such attempts have ended up distorting the readability of the code. I am actually stunned that more people are not encountering this bug. Basically if two (or more) threads are SELECTing and UPDATEing the same table, they WILL hit this issue under load.
          Hide
          Brett Wooldridge added a comment -

          Funny you mention 5367 and the workaround there. It was me who opened 5367, and mentioned my workaround. I guess I should amend my comment there, because just today I discovered that my "workaround" does not in fact work. It appeared to work, and seems somewhat less likely to trigger this deadlock, but ultimately under load it still encounters this issue.

          Show
          Brett Wooldridge added a comment - Funny you mention 5367 and the workaround there. It was me who opened 5367, and mentioned my workaround. I guess I should amend my comment there, because just today I discovered that my "workaround" does not in fact work. It appeared to work, and seems somewhat less likely to trigger this deadlock, but ultimately under load it still encounters this issue.
          Hide
          Kristian Waagan added a comment -

          Ticked "Seen in production". See DERBY-5367 (comment of 20/Sep/11 01:22) for an example of a workaround to avoid this bug. To me, that workaround seems like a significant burden for people implementing applications on top of Derby.

          Show
          Kristian Waagan added a comment - Ticked "Seen in production". See DERBY-5367 (comment of 20/Sep/11 01:22) for an example of a workaround to avoid this bug. To me, that workaround seems like a significant burden for people implementing applications on top of Derby.
          Hide
          Brett Wooldridge added a comment -

          I am changing this bug back to unassigned. I have been unable to work on this issue, and leaving it as assigned to me potentially prevents another developer from picking it up. This is still a major bug, and we still see it in our production server sporadically.

          I hope someone else can pick it up and run with it. I could suggest looking at the patch here only to gain some understand of the pieces involved – I think the general approach of the patch may be inherently incorrect.

          Show
          Brett Wooldridge added a comment - I am changing this bug back to unassigned. I have been unable to work on this issue, and leaving it as assigned to me potentially prevents another developer from picking it up. This is still a major bug, and we still see it in our production server sporadically. I hope someone else can pick it up and run with it. I could suggest looking at the patch here only to gain some understand of the pieces involved – I think the general approach of the patch may be inherently incorrect.
          Hide
          Myrna van Lunteren added a comment -

          linking, because when further work is done on this, re-occurrence of DERBY-4721 needs to be prevented.

          Show
          Myrna van Lunteren added a comment - linking, because when further work is done on this, re-occurrence of DERBY-4721 needs to be prevented.
          Hide
          Knut Anders Hatlen added a comment -

          I've backed out the fix from trunk. Committed revision 958529.

          Show
          Knut Anders Hatlen added a comment - I've backed out the fix from trunk. Committed revision 958529.
          Hide
          Kristian Waagan added a comment -

          Another error. I was trying to get a stack trace with line numbers for the getMetaData error, but this popped up instead.

          [Error/failure logged at Sat Jun 26 06:04:57 CEST 2010]
          junit.framework.AssertionFailedError: Caused by:
          java.sql.SQLException: Java exception: ': java.lang.NullPointerException'.
          at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(SQLExceptionFactory40.java:95)
          at org.apache.derby.impl.jdbc.Util.newEmbedSQLException(Util.java:142)
          at org.apache.derby.impl.jdbc.Util.javaException(Util.java:299)
          at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(TransactionResourceImpl.java:403)
          at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(TransactionResourceImpl.java:346)
          at org.apache.derby.impl.jdbc.EmbedConnection.handleException(EmbedConnection.java:2269)
          at org.apache.derby.impl.jdbc.ConnectionChild.handleException(ConnectionChild.java:81)
          at org.apache.derby.impl.jdbc.EmbedStatement.execute(EmbedStatement.java:614)
          at org.apache.derby.impl.jdbc.EmbedStatement.executeQuery(EmbedStatement.java:152)
          at org.apache.derbyTesting.functionTests.tests.multi.StressMultiTest$StressMultiRunnable.select(StressMultiTest.java:555)
          at org.apache.derbyTesting.functionTests.tests.multi.StressMultiTest$StressMultiRunnable.run(StressMultiTest.java:429)
          at java.lang.Thread.run(Thread.java:619)
          Caused by: java.sql.SQLException: Java exception: ': java.lang.NullPointerException'.
          at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(SQLExceptionFactory.java:45)
          at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(SQLExceptionFactory40.java:119)
          at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(SQLExceptionFactory40.java:70)
          ... 11 more
          Caused by: java.lang.NullPointerException
          at org.apache.derby.impl.sql.GenericActivationHolder.<init>(GenericActivationHolder.java:129)
          at org.apache.derby.impl.sql.GenericPreparedStatement.getActivation(GenericPreparedStatement.java:257)
          at org.apache.derby.impl.jdbc.EmbedStatement.execute(EmbedStatement.java:609)
          ... 4 more

          at org.apache.derbyTesting.functionTests.tests.multi.StressMultiTest.handleException(StressMultiTest.java:351)
          at org.apache.derbyTesting.functionTests.tests.multi.StressMultiTest.access$200(StressMultiTest.java:70)
          at org.apache.derbyTesting.functionTests.tests.multi.StressMultiTest$StressMultiRunnable.run(StressMultiTest.java:445)
          at java.lang.Thread.run(Thread.java:619)

          Show
          Kristian Waagan added a comment - Another error. I was trying to get a stack trace with line numbers for the getMetaData error, but this popped up instead. [Error/failure logged at Sat Jun 26 06:04:57 CEST 2010] junit.framework.AssertionFailedError: Caused by: java.sql.SQLException: Java exception: ': java.lang.NullPointerException'. at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(SQLExceptionFactory40.java:95) at org.apache.derby.impl.jdbc.Util.newEmbedSQLException(Util.java:142) at org.apache.derby.impl.jdbc.Util.javaException(Util.java:299) at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(TransactionResourceImpl.java:403) at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(TransactionResourceImpl.java:346) at org.apache.derby.impl.jdbc.EmbedConnection.handleException(EmbedConnection.java:2269) at org.apache.derby.impl.jdbc.ConnectionChild.handleException(ConnectionChild.java:81) at org.apache.derby.impl.jdbc.EmbedStatement.execute(EmbedStatement.java:614) at org.apache.derby.impl.jdbc.EmbedStatement.executeQuery(EmbedStatement.java:152) at org.apache.derbyTesting.functionTests.tests.multi.StressMultiTest$StressMultiRunnable.select(StressMultiTest.java:555) at org.apache.derbyTesting.functionTests.tests.multi.StressMultiTest$StressMultiRunnable.run(StressMultiTest.java:429) at java.lang.Thread.run(Thread.java:619) Caused by: java.sql.SQLException: Java exception: ': java.lang.NullPointerException'. at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(SQLExceptionFactory.java:45) at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(SQLExceptionFactory40.java:119) at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(SQLExceptionFactory40.java:70) ... 11 more Caused by: java.lang.NullPointerException at org.apache.derby.impl.sql.GenericActivationHolder.<init>(GenericActivationHolder.java:129) at org.apache.derby.impl.sql.GenericPreparedStatement.getActivation(GenericPreparedStatement.java:257) at org.apache.derby.impl.jdbc.EmbedStatement.execute(EmbedStatement.java:609) ... 4 more at org.apache.derbyTesting.functionTests.tests.multi.StressMultiTest.handleException(StressMultiTest.java:351) at org.apache.derbyTesting.functionTests.tests.multi.StressMultiTest.access$200(StressMultiTest.java:70) at org.apache.derbyTesting.functionTests.tests.multi.StressMultiTest$StressMultiRunnable.run(StressMultiTest.java:445) at java.lang.Thread.run(Thread.java:619)
          Hide
          Kristian Waagan added a comment -

          Attached 'client_stacktrace_activation_closed.txt', an error that happened a few times when running the test in a loop ("Activation closed, operation execute not permitted").
          Attaching it for reference, I'm not 100% sure it is valid since the test created a derby.log file of 108GB. This caused my disk to fill up and it was deleted when the process died.

          Show
          Kristian Waagan added a comment - Attached 'client_stacktrace_activation_closed.txt', an error that happened a few times when running the test in a loop ("Activation closed, operation execute not permitted"). Attaching it for reference, I'm not 100% sure it is valid since the test created a derby.log file of 108GB. This caused my disk to fill up and it was deleted when the process died.
          Hide
          Mike Matrigali added a comment -

          I remember when a lot of the original Statement Cache work was being done the original multi stress test was particularly good at showing up issues. There is a version that I don't think gets
          run nightly that is even more of a load - StressMulti50x59.java, it runs 50 users for 59 minutes.
          Even then just passing this test once did not really prove anything as the issues often are very
          subtle and timing dependent. Often we would run this test over and over again as machine resources were available. This week I will see if we can find a machine to do this but I don't think
          we have anything more than a couple of processors.

          Running this test on a machine with the most possible processors would be good to verify this
          fix.

          Does this fix mean that we never wait for an existing query plan? Does anyone understand what this means about performance for queries that take extremely long to compile. Unfortunately I have seen plans take many minutes to compile depending on how complex, and some customers often "pre-load" the plans at application startup, set the cache size such that the plan is expected to be in cache.

          Show
          Mike Matrigali added a comment - I remember when a lot of the original Statement Cache work was being done the original multi stress test was particularly good at showing up issues. There is a version that I don't think gets run nightly that is even more of a load - StressMulti50x59.java, it runs 50 users for 59 minutes. Even then just passing this test once did not really prove anything as the issues often are very subtle and timing dependent. Often we would run this test over and over again as machine resources were available. This week I will see if we can find a machine to do this but I don't think we have anything more than a couple of processors. Running this test on a machine with the most possible processors would be good to verify this fix. Does this fix mean that we never wait for an existing query plan? Does anyone understand what this means about performance for queries that take extremely long to compile. Unfortunately I have seen plans take many minutes to compile depending on how complex, and some customers often "pre-load" the plans at application startup, set the cache size such that the plan is expected to be in cache.
          Hide
          Brett Wooldridge added a comment -

          I recommend backing out this fix until synchronization/threading issues have been resolved.

          Show
          Brett Wooldridge added a comment - I recommend backing out this fix until synchronization/threading issues have been resolved.
          Hide
          Knut Anders Hatlen added a comment -

          A couple more suspected fallouts of this fix:

          • Mamta added a comment to DERBY-4167 about another NPE in StressMultiTest right after the fix was checked in.
          • Olav's nightly performance regression tests (http://home.online.no/~olmsan/derby/perf/) have not been able to complete after the check-in. It gets stuck in the multi-threaded delete tests. I'll investigate and try to isolate the problem.
          Show
          Knut Anders Hatlen added a comment - A couple more suspected fallouts of this fix: Mamta added a comment to DERBY-4167 about another NPE in StressMultiTest right after the fix was checked in. Olav's nightly performance regression tests ( http://home.online.no/~olmsan/derby/perf/ ) have not been able to complete after the check-in. It gets stuck in the multi-threaded delete tests. I'll investigate and try to isolate the problem.
          Hide
          Knut Anders Hatlen added a comment -

          I haven't studied the details of this code yet, but the synchronization on getConnectionSynchronization() is probably not enough to ensure safe access to the GenericPreparedStatement. getConnectionSynchronization() is used to ensure that a single connection cannot be executing in two threads at the same time, but one GenericPreparedStatement can be shared between multiple connections.

          Assuming this is the same issue as DERBY-3823/DERBY-1635 and it is getActivationClass() that returns null, it looks like getMetaData() must have been called when the statement was being recompiled by another thread, and that thread had already called preparedStatement.setActivationClass(null) in GenericPreparedStatement.prepMinion(). I'm not sure if more synchronization is what's needed to fix this bug, or if the code just needs to handle null returned from getActivationClass(), or perhaps both.

          Show
          Knut Anders Hatlen added a comment - I haven't studied the details of this code yet, but the synchronization on getConnectionSynchronization() is probably not enough to ensure safe access to the GenericPreparedStatement. getConnectionSynchronization() is used to ensure that a single connection cannot be executing in two threads at the same time, but one GenericPreparedStatement can be shared between multiple connections. Assuming this is the same issue as DERBY-3823 / DERBY-1635 and it is getActivationClass() that returns null, it looks like getMetaData() must have been called when the statement was being recompiled by another thread, and that thread had already called preparedStatement.setActivationClass(null) in GenericPreparedStatement.prepMinion(). I'm not sure if more synchronization is what's needed to fix this bug, or if the code just needs to handle null returned from getActivationClass(), or perhaps both.
          Hide
          Brett Wooldridge added a comment -

          Along the same lines, GenericPreparedStatement line 250 (in getActivation()) receives the value of rePrepare() into a local PreparedStatement (post-patch), but only for the purpose of asserting that it didn't change. This is also possibly incorrect ... but I'm not sure I grok all of the interactions between GenericPreparedStatement, GeneratedClass, GenericActivationHolder, LanguageConnectionContext, and EmbedPreparedStatement.

          I didn't encounter any errors in the stressmulti test, and I'm running on an 8-core box, so I am not sure how easily I can validate the robustness of this change. I may require some assistance, and certainly knowledgable input is welcome.

          Show
          Brett Wooldridge added a comment - Along the same lines, GenericPreparedStatement line 250 (in getActivation()) receives the value of rePrepare() into a local PreparedStatement (post-patch), but only for the purpose of asserting that it didn't change. This is also possibly incorrect ... but I'm not sure I grok all of the interactions between GenericPreparedStatement, GeneratedClass, GenericActivationHolder, LanguageConnectionContext, and EmbedPreparedStatement. I didn't encounter any errors in the stressmulti test, and I'm running on an 8-core box, so I am not sure how easily I can validate the robustness of this change. I may require some assistance, and certainly knowledgable input is welcome.
          Hide
          Brett Wooldridge added a comment -

          I need some input here, because I'm wading into unknown waters...

          Certainly, the return value from rePrepare() can be assigned back to preparedStatement on line 1080 ... along the lines of:

          preparedStatement = preparedStatement.rePrepare(lcc);

          But I am concerned about synchronization issues. While the getMetaData() method itself obtains a lock via:

          synchronized (getConnectionSynchronization())

          { ... }

          Many other places in EmbedPreparedStatement access the preparedStatement member variable without such synchronization. It is safe to assume these other accesses are synchronized by some higher-level lock?

          Show
          Brett Wooldridge added a comment - I need some input here, because I'm wading into unknown waters... Certainly, the return value from rePrepare() can be assigned back to preparedStatement on line 1080 ... along the lines of: preparedStatement = preparedStatement.rePrepare(lcc); But I am concerned about synchronization issues. While the getMetaData() method itself obtains a lock via: synchronized (getConnectionSynchronization()) { ... } Many other places in EmbedPreparedStatement access the preparedStatement member variable without such synchronization. It is safe to assume these other accesses are synchronized by some higher-level lock?
          Hide
          Kristian Waagan added a comment -

          The tinderbox test failed again with the following error in StressTestMulti:
          2010-06-21 16:02:44.542 GMT Thread[DRDAConnThread_1289,5,derby.daemons] (XID = 1929067), (SESSIONID = 61), (DATABASE = wombat), (DRDAID = NF000001.P530-869193450946392963

          {30}

          ), Failed Statement is: null
          java.lang.NullPointerException
          at org.apache.derby.impl.jdbc.EmbedPreparedStatement.getMetaData(Unknown Source)
          at org.apache.derby.impl.drda.DRDAConnThread.writeSQLDARD(Unknown Source)
          at org.apache.derby.impl.drda.DRDAConnThread.processCommands(Unknown Source)
          at org.apache.derby.impl.drda.DRDAConnThread.run(Unknown Source)

          It looks suspicious that after the change from returning void to returning a PreparedStatement, the return value is ignored in EmbedPreparedStatement.getMetaData. Is this correct?

          As for the error itself, it has already been reported (DERBY-3823).

          Show
          Kristian Waagan added a comment - The tinderbox test failed again with the following error in StressTestMulti: 2010-06-21 16:02:44.542 GMT Thread [DRDAConnThread_1289,5,derby.daemons] (XID = 1929067), (SESSIONID = 61), (DATABASE = wombat), (DRDAID = NF000001.P530-869193450946392963 {30} ), Failed Statement is: null java.lang.NullPointerException at org.apache.derby.impl.jdbc.EmbedPreparedStatement.getMetaData(Unknown Source) at org.apache.derby.impl.drda.DRDAConnThread.writeSQLDARD(Unknown Source) at org.apache.derby.impl.drda.DRDAConnThread.processCommands(Unknown Source) at org.apache.derby.impl.drda.DRDAConnThread.run(Unknown Source) It looks suspicious that after the change from returning void to returning a PreparedStatement, the return value is ignored in EmbedPreparedStatement.getMetaData. Is this correct? As for the error itself, it has already been reported ( DERBY-3823 ).
          Hide
          Knut Anders Hatlen added a comment -

          Perhaps we should try to convert the test case Jeff provided to JUnit and add it to the regression test suite before we close the issue?

          Show
          Knut Anders Hatlen added a comment - Perhaps we should try to convert the test case Jeff provided to JUnit and add it to the regression test suite before we close the issue?
          Hide
          Knut Anders Hatlen added a comment -

          All the regression tests ran cleanly in my environment. Committed revision 956504.

          Show
          Knut Anders Hatlen added a comment - All the regression tests ran cleanly in my environment. Committed revision 956504.
          Hide
          Bryan Pendleton added a comment -

          +1. The latest version of the patch looks good to me. Thanks Brett for working on this problem.

          Show
          Bryan Pendleton added a comment - +1. The latest version of the patch looks good to me. Thanks Brett for working on this problem.
          Hide
          Knut Anders Hatlen added a comment -

          Thanks Brett! The patch looks good to me. I'll run the regression tests and intend to commit it if they pass.

          Show
          Knut Anders Hatlen added a comment - Thanks Brett! The patch looks good to me. I'll run the regression tests and intend to commit it if they pass.
          Hide
          Brett Wooldridge added a comment -

          Thanks for the overview, it was helpful. I have reverted the synchronization to it's former state.

          The issue I was seeing with duplicate primary keys turned out to be a Hibernate bug, not related to this patch. So, I am comfortable going forward with this patch.

          Thanks all for the review.

          Show
          Brett Wooldridge added a comment - Thanks for the overview, it was helpful. I have reverted the synchronization to it's former state. The issue I was seeing with duplicate primary keys turned out to be a Hibernate bug, not related to this patch. So, I am comfortable going forward with this patch. Thanks all for the review.
          Hide
          Knut Anders Hatlen added a comment -

          > The original still had a synchronized block that synchronized on the
          > prepared statement. Was it unnecessary? I'm not so sure...

          The original synchronization block enclosed code that accessed the
          prepared statement, which may be shared between threads. So that
          synchronization should not be removed, I was only talking about the
          synchronization blocks that were added by the patch.

          > One concern is the statement above "GenericActivationHolder instances
          > are private to a transaction". Private to a transaction, and private
          > to a thread are two different things. We run in an XA environment, and
          > theoretically (as far as I understand XA), a transaction can span both
          > connections and threads.

          In this case, "private to a transaction" meant "private to a local
          Derby transaction", so XA won't come into the picture. Still, it's
          true that a single transaction can run in multiple threads. For
          example, I can prepare a statement in one thread, execute the
          statement in another thread, and commit the transaction in a third
          thread. However, there's synchronization at a higher level that
          prevents multiple threads from running in the same transaction
          concurrently. This is ensured at the entry points in EmbedConnection,
          EmbedStatement and EmbedResultSet. The engine code therefore safely
          assumes that "private to a transaction" implies "not accessed by
          multiple threads concurrently".

          > Another issue, is that, when running with the above patch I am seeing
          > intermittent failures in our server application. There are multiple
          > threads doing "DELETE FROM" and "INSERT INTO" the same table, and I am
          > seeing an occasional insert failure stating that the insert would
          > cause a primary key violation.
          >
          > At this point, I am not sure it is related to the patch or not because
          > our server code has been shifting about as well. However, a short run
          > using 10.6.1.0 without the patch did not see this particular
          > error. Because the issue is intermittent, and I unsure whether the
          > cause is the patch or application level code. Further testing tomorrow
          > should clarify.

          Thanks for investigating this further. It would be good to understand
          why you see this error before we proceed with getting the patch into
          the code tree.

          > Note that running the derby test suite did not result in any errors.

          Yes, the regression test suite doesn't have many multi-threaded tests
          and isn't good at catching such errors, unfortunately...

          Show
          Knut Anders Hatlen added a comment - > The original still had a synchronized block that synchronized on the > prepared statement. Was it unnecessary? I'm not so sure... The original synchronization block enclosed code that accessed the prepared statement, which may be shared between threads. So that synchronization should not be removed, I was only talking about the synchronization blocks that were added by the patch. > One concern is the statement above "GenericActivationHolder instances > are private to a transaction". Private to a transaction, and private > to a thread are two different things. We run in an XA environment, and > theoretically (as far as I understand XA), a transaction can span both > connections and threads. In this case, "private to a transaction" meant "private to a local Derby transaction", so XA won't come into the picture. Still, it's true that a single transaction can run in multiple threads. For example, I can prepare a statement in one thread, execute the statement in another thread, and commit the transaction in a third thread. However, there's synchronization at a higher level that prevents multiple threads from running in the same transaction concurrently. This is ensured at the entry points in EmbedConnection, EmbedStatement and EmbedResultSet. The engine code therefore safely assumes that "private to a transaction" implies "not accessed by multiple threads concurrently". > Another issue, is that, when running with the above patch I am seeing > intermittent failures in our server application. There are multiple > threads doing "DELETE FROM" and "INSERT INTO" the same table, and I am > seeing an occasional insert failure stating that the insert would > cause a primary key violation. > > At this point, I am not sure it is related to the patch or not because > our server code has been shifting about as well. However, a short run > using 10.6.1.0 without the patch did not see this particular > error. Because the issue is intermittent, and I unsure whether the > cause is the patch or application level code. Further testing tomorrow > should clarify. Thanks for investigating this further. It would be good to understand why you see this error before we proceed with getting the patch into the code tree. > Note that running the derby test suite did not result in any errors. Yes, the regression test suite doesn't have many multi-threaded tests and isn't good at catching such errors, unfortunately...
          Hide
          Brett Wooldridge added a comment -

          The original still had a synchronized block that synchronized on the prepared statement. Was it unnecessary? I'm not so sure...

          One concern is the statement above "GenericActivationHolder instances are private to a transaction". Private to a transaction, and private to a thread are two different things. We run in an XA environment, and theoretically (as far as I understand XA), a transaction can span both connections and threads.

          Another issue, is that, when running with the above patch I am seeing intermittent failures in our server application. There are multiple threads doing "DELETE FROM" and "INSERT INTO" the same table, and I am seeing an occasional insert failure stating that the insert would cause a primary key violation.

          At this point, I am not sure it is related to the patch or not because our server code has been shifting about as well. However, a short run using 10.6.1.0 without the patch did not see this particular error. Because the issue is intermittent, and I unsure whether the cause is the patch or application level code. Further testing tomorrow should clarify.

          Note that running the derby test suite did not result in any errors.

          Show
          Brett Wooldridge added a comment - The original still had a synchronized block that synchronized on the prepared statement. Was it unnecessary? I'm not so sure... One concern is the statement above "GenericActivationHolder instances are private to a transaction". Private to a transaction, and private to a thread are two different things. We run in an XA environment, and theoretically (as far as I understand XA), a transaction can span both connections and threads. Another issue, is that, when running with the above patch I am seeing intermittent failures in our server application. There are multiple threads doing "DELETE FROM" and "INSERT INTO" the same table, and I am seeing an occasional insert failure stating that the insert would cause a primary key violation. At this point, I am not sure it is related to the patch or not because our server code has been shifting about as well. However, a short run using 10.6.1.0 without the patch did not see this particular error. Because the issue is intermittent, and I unsure whether the cause is the patch or application level code. Further testing tomorrow should clarify. Note that running the derby test suite did not result in any errors.
          Hide
          Knut Anders Hatlen added a comment -

          The new synchronized blocks in execute() still only access fields local to GenericActivationHolder, so I'm not sure I understand the need to synchronize on the prepared statement there. I think it would be fine to use the second revision of the patch (the one that fixed the NPE) and change it so that it didn't uncomment the outer synchronization block.

          Show
          Knut Anders Hatlen added a comment - The new synchronized blocks in execute() still only access fields local to GenericActivationHolder, so I'm not sure I understand the need to synchronize on the prepared statement there. I think it would be fine to use the second revision of the patch (the one that fixed the NPE) and change it so that it didn't uncomment the outer synchronization block.
          Hide
          Brett Wooldridge added a comment -

          Patch with updated synchronization.

          Show
          Brett Wooldridge added a comment - Patch with updated synchronization.
          Hide
          Brett Wooldridge added a comment -

          Knut, thanks for the code review. I have updated the synchronization to synchronize on the ExecPreparedStatement (ps) object thoughout the execute() method. If you have time, please review the new uploaded patch.

          Show
          Brett Wooldridge added a comment - Knut, thanks for the code review. I have updated the synchronization to synchronize on the ExecPreparedStatement (ps) object thoughout the execute() method. If you have time, please review the new uploaded patch.
          Hide
          Knut Anders Hatlen added a comment -

          Thanks for digging up this information, Mamta.

          Brett: The way I understand this code, GenericActivationHolder instances are private to a transaction, and therefore only accessed by a single thread (enforced by synchronization in EmbedStatement.execute()). So I think the synchronization on "this" added in the latest patch won't have any effect.

          Show
          Knut Anders Hatlen added a comment - Thanks for digging up this information, Mamta. Brett: The way I understand this code, GenericActivationHolder instances are private to a transaction, and therefore only accessed by a single thread (enforced by synchronization in EmbedStatement.execute()). So I think the synchronization on "this" added in the latest patch won't have any effect.
          Hide
          Brett Wooldridge added a comment -

          Updated patch with synchronization changes in GenericActivationHolder.

          Show
          Brett Wooldridge added a comment - Updated patch with synchronization changes in GenericActivationHolder.
          Hide
          Brett Wooldridge added a comment -

          Attached is a new patch with narrowed synchronization in GenericActivationHolder. Still, the synchronization is not the same as previously because I think that synchronization was wrong.

          The previous code had this logic OUTSIDE of synchronized blocks:

          284 newGC = gc;
          296 BaseActivation newAC = (BaseActivation) newGC.newInstance(lcc);
          298 DataTypeDescriptor[] newParamTypes = ps.getParameterTypes();

          then later:
          333 ac = newAC;
          334 gc = newGC;
          335 paramTypes = newParamTypes;
          ...
          352 return ac.execute(); <-- Particularly dangerous

          These unprotected accesses (both read and write) to member variables shared by multiple threads is inherently dangerous.

          The new patch for GenericActivationHolder takes the approach of copying the member variables into local variables within a synchronized block, and then using those locals throughout. At the end of the code path where a statement is re-prepared (and a new generated class, activation class, and parameter types created), the member variables are then updated within a synchronized block.

          I believe this approach should eliminate all synchronization issues within the execute() method while minimizing the synchronization windows.

          Show
          Brett Wooldridge added a comment - Attached is a new patch with narrowed synchronization in GenericActivationHolder. Still, the synchronization is not the same as previously because I think that synchronization was wrong. The previous code had this logic OUTSIDE of synchronized blocks: 284 newGC = gc; 296 BaseActivation newAC = (BaseActivation) newGC.newInstance(lcc); 298 DataTypeDescriptor[] newParamTypes = ps.getParameterTypes(); then later: 333 ac = newAC; 334 gc = newGC; 335 paramTypes = newParamTypes; ... 352 return ac.execute(); <-- Particularly dangerous These unprotected accesses (both read and write) to member variables shared by multiple threads is inherently dangerous. The new patch for GenericActivationHolder takes the approach of copying the member variables into local variables within a synchronized block, and then using those locals throughout. At the end of the code path where a statement is re-prepared (and a new generated class, activation class, and parameter types created), the member variables are then updated within a synchronized block. I believe this approach should eliminate all synchronization issues within the execute() method while minimizing the synchronization windows.
          Hide
          Mamta A. Satoor added a comment -

          Attaching the stack trace for the monitor deadlock from a stress test which was seen with Cloudscape release and as a result, the synchornization code was commented.

          Show
          Mamta A. Satoor added a comment - Attaching the stack trace for the monitor deadlock from a stress test which was seen with Cloudscape release and as a result, the synchornization code was commented.
          Hide
          Mamta A. Satoor added a comment -

          The synchronization was commented out before the code was open sourced. The change was part of a fix for a java monitor deadlock while running a stress test. Unfortunately the stress test is no longer available and will not run against the derby codeline.

          I will attach the original stack trace, but the code has changed quite a bit so the routine names, locations, will have changed. About the only thing relevant will be the monitor deadlock info.

          Based on this, I do not think we should just go ahead and put the synchronization back into GenericActivationHolder.execute().

          Does the new fix add any additional need for synchonization?

          Show
          Mamta A. Satoor added a comment - The synchronization was commented out before the code was open sourced. The change was part of a fix for a java monitor deadlock while running a stress test. Unfortunately the stress test is no longer available and will not run against the derby codeline. I will attach the original stack trace, but the code has changed quite a bit so the routine names, locations, will have changed. About the only thing relevant will be the monitor deadlock info. Based on this, I do not think we should just go ahead and put the synchronization back into GenericActivationHolder.execute(). Does the new fix add any additional need for synchonization?
          Hide
          Knut Anders Hatlen added a comment -

          There's a smaller synchronized block within the block where the synchronization is commented out (both synchronized on ps). The inner synchronization was added to address DERBY-3260. The question was raised at that time too, why the synchronization had been commented out in the first place. Since we didn't know why, we went for the minimum needed to fix that bug. Unless there's some change in the patch that makes it necessary to synchronize on the entire block, I'd be more comfortable with leaving the synchronization as it is for now. It's not that I'm 100% convinced the current approach is correct, but if we find problems there, we could address them in a separate issue.

          As to multi-threaded access to GenericActivationHolders, I don't think that will happen. I believe GenericActivationHolder instances are private to a transaction, and concurrent access will be prevented by synchronization at a higher level. The instance stored in the ps field can however be shared by multiple GenericActivationHolders, and the synchronization is there to give a consistent view of that object's state.

          Another thing that makes me think we should be careful with adding synchronization here, is that it adds to the problem reported in DERBY-3024. I did a few runs on a 32-core machine with the test client attached there, and saw a 10-30% performance degradation for various multi-threaded configurations when the patch was applied.

          Show
          Knut Anders Hatlen added a comment - There's a smaller synchronized block within the block where the synchronization is commented out (both synchronized on ps). The inner synchronization was added to address DERBY-3260 . The question was raised at that time too, why the synchronization had been commented out in the first place. Since we didn't know why, we went for the minimum needed to fix that bug. Unless there's some change in the patch that makes it necessary to synchronize on the entire block, I'd be more comfortable with leaving the synchronization as it is for now. It's not that I'm 100% convinced the current approach is correct, but if we find problems there, we could address them in a separate issue. As to multi-threaded access to GenericActivationHolders, I don't think that will happen. I believe GenericActivationHolder instances are private to a transaction, and concurrent access will be prevented by synchronization at a higher level. The instance stored in the ps field can however be shared by multiple GenericActivationHolders, and the synchronization is there to give a consistent view of that object's state. Another thing that makes me think we should be careful with adding synchronization here, is that it adds to the problem reported in DERBY-3024 . I did a few runs on a 32-core machine with the test client attached there, and saw a 10-30% performance degradation for various multi-threaded configurations when the patch was applied.
          Hide
          Bryan Pendleton added a comment -

          +1 to the strategy of the patch. I think the approach of allowing each caller to prepare
          the statement independently is valid and useful. In this age of multi-core computing,
          I think doing possibly extra work like this in search of higher concurrency (and fewer
          deadlocks) is a good technique.

          I, too, was worried about the patch's modifications to the synchronized() blocks, so
          I'd like to see that discussion resolved prior to commit.

          Show
          Bryan Pendleton added a comment - +1 to the strategy of the patch. I think the approach of allowing each caller to prepare the statement independently is valid and useful. In this age of multi-core computing, I think doing possibly extra work like this in search of higher concurrency (and fewer deadlocks) is a good technique. I, too, was worried about the patch's modifications to the synchronized() blocks, so I'd like to see that discussion resolved prior to commit.
          Hide
          Brett Wooldridge added a comment -

          Hi Knut, thanks for taking a look.

          1) The synchronized block in GenericActivationHolder.execute() was actually completely commented out in the original – but the comment regarding the synchronized block remained. I'm actually unsure if there is indeed multi-threaded access to GenericActivationHolder – but I believe there can be. Because that method replaces several member variables, it seems unsafe to do so outside of the context of a synchronized block. So, basically I just restored the synchronized block. If it can be shown that there is no multi-threaded access – for example because of a lock higher up the call chain, we can certainly remove it. But being unfamiliar with the Derby codebase in general, I decided to err on the side of caution.

          2) My thinking on removing the statement from the cache was that the new statement is an equal candidate to be cached, and because it was just recompiled the one that does exist in the cache is stale in a sense.

          3) I thought that too. I think it would indeed be safe to remove to calls to notifyAll(), at least in the case of GenericStatement.

          Show
          Brett Wooldridge added a comment - Hi Knut, thanks for taking a look. 1) The synchronized block in GenericActivationHolder.execute() was actually completely commented out in the original – but the comment regarding the synchronized block remained. I'm actually unsure if there is indeed multi-threaded access to GenericActivationHolder – but I believe there can be. Because that method replaces several member variables, it seems unsafe to do so outside of the context of a synchronized block. So, basically I just restored the synchronized block. If it can be shown that there is no multi-threaded access – for example because of a lock higher up the call chain, we can certainly remove it. But being unfamiliar with the Derby codebase in general, I decided to err on the side of caution. 2) My thinking on removing the statement from the cache was that the new statement is an equal candidate to be cached, and because it was just recompiled the one that does exist in the cache is stale in a sense. 3) I thought that too. I think it would indeed be safe to remove to calls to notifyAll(), at least in the case of GenericStatement.
          Hide
          Knut Anders Hatlen added a comment -

          Hi Brett,

          Thanks for contributing the patch. I think creating a new statement instead of waiting sounds like a reasonable approach to solve this bug.

          I have a couple of questions about the patch:

          1) The synchronized block in GenericActivationHolder.execute() is expanded. What's the reasoning behind this change?

          2) In GenericStatement, I noticed that the old code that generates a new statement if the session schema is references, sets foundInCache to false to prevent the call to removeStatement() in catch blocks further down. Should we do the same in the new code that checks the value of compilingStatement? I'm not sure if calling removeStatement() on a statement that's not in the cache does any harm, but I thought I'd mention it in any case.

          3) Since the patch removes the call to wait() on the prepared statement (the only call to wait() as far as I can tell), do you think it would be safe to also remove the calls to notifyAll() in GenericStatement and GenericPreparedStatement?

          Show
          Knut Anders Hatlen added a comment - Hi Brett, Thanks for contributing the patch. I think creating a new statement instead of waiting sounds like a reasonable approach to solve this bug. I have a couple of questions about the patch: 1) The synchronized block in GenericActivationHolder.execute() is expanded. What's the reasoning behind this change? 2) In GenericStatement, I noticed that the old code that generates a new statement if the session schema is references, sets foundInCache to false to prevent the call to removeStatement() in catch blocks further down. Should we do the same in the new code that checks the value of compilingStatement? I'm not sure if calling removeStatement() on a statement that's not in the cache does any harm, but I thought I'd mention it in any case. 3) Since the patch removes the call to wait() on the prepared statement (the only call to wait() as far as I can tell), do you think it would be safe to also remove the calls to notifyAll() in GenericStatement and GenericPreparedStatement?
          Hide
          Brett Wooldridge added a comment -

          Updated patch file (10.6 base)

          Show
          Brett Wooldridge added a comment - Updated patch file (10.6 base)
          Hide
          Brett Wooldridge added a comment -

          Thanks for the feedback. Typo in my patch. In GenericActivationHolder.java I had:

          newPS = (ExecPreparedStatement) ps.rePrepare...
          newGC = ps.getActivationClass();

          When I meant:

          newPS = (ExecPreparedStatement) ps.rePrepare...
          newGC = newPS.getActivationClass();

          I've run the multi stress suite, and it now passes without error. I have updated the patch.

          Show
          Brett Wooldridge added a comment - Thanks for the feedback. Typo in my patch. In GenericActivationHolder.java I had: newPS = (ExecPreparedStatement) ps.rePrepare... newGC = ps.getActivationClass(); When I meant: newPS = (ExecPreparedStatement) ps.rePrepare... newGC = newPS.getActivationClass(); I've run the multi stress suite, and it now passes without error. I have updated the patch.
          Hide
          Kristian Waagan added a comment -

          Hi Brett,

          I had a quick look at your patch, applied it and ran it through suites.All. I see the following error(s):

          testStressMulti(org.apache.derbyTesting.functionTests.tests.multi.StressMultiTest) FAILURE:
          junit.framework.AssertionFailedError: Caused by:
          java.sql.SQLException: Java exception: ': java.lang.NullPointerException'.
          at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(SQLExceptionFactory40.java:95)
          at org.apache.derby.impl.jdbc.Util.newEmbedSQLException(Util.java:142)
          at org.apache.derby.impl.jdbc.Util.javaException(Util.java:299)
          at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(TransactionResourceImpl.java:403)
          at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(TransactionResourceImpl.java:346)
          at org.apache.derby.impl.jdbc.EmbedConnection.handleException(EmbedConnection.java:2269)
          at org.apache.derby.impl.jdbc.ConnectionChild.handleException(ConnectionChild.java:81)
          at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(EmbedStatement.java:1321)
          at org.apache.derby.impl.jdbc.EmbedStatement.execute(EmbedStatement.java:625)
          at org.apache.derby.impl.jdbc.EmbedStatement.executeQuery(EmbedStatement.java:152)
          at org.apache.derbyTesting.functionTests.tests.multi.StressMultiTest$StressMultiRunnable.select(StressMultiTest.java:555)
          at org.apache.derbyTesting.functionTests.tests.multi.StressMultiTest$StressMultiRunnable.run(StressMultiTest.java:429)
          at java.lang.Thread.run(Thread.java:619)
          Caused by: java.sql.SQLException: Java exception: ': java.lang.NullPointerException'.
          at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(SQLExceptionFactory.java:45)
          at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(SQLExceptionFactory40.java:119)
          at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(SQLExceptionFactory40.java:70)
          ... 12 more
          Caused by: java.lang.NullPointerException
          at org.apache.derby.impl.sql.GenericActivationHolder.execute(GenericActivationHolder.java:298)
          at org.apache.derby.impl.sql.GenericPreparedStatement.executeStmt(GenericPreparedStatement.java:437)
          at org.apache.derby.impl.sql.GenericPreparedStatement.execute(GenericPreparedStatement.java:320)
          at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(EmbedStatement.java:1232)
          ... 5 more

          at org.apache.derbyTesting.functionTests.tests.multi.StressMultiTest.handleException(StressMultiTest.java:351)
          at org.apache.derbyTesting.functionTests.tests.multi.StressMultiTest.access$200(StressMultiTest.java:70)
          at org.apache.derbyTesting.functionTests.tests.multi.StressMultiTest$StressMultiRunnable.run(StressMultiTest.java:445)
          at java.lang.Thread.run(Thread.java:619)

          testStressMulti(org.apache.derbyTesting.functionTests.tests.multi.StressMultiTest) FAILURE:
          junit.framework.AssertionFailedError: Caused by:
          java.sql.SQLException: Java exception: ': java.lang.NullPointerException'.
          at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(SQLExceptionFactory40.java:95)
          at org.apache.derby.impl.jdbc.Util.newEmbedSQLException(Util.java:142)
          at org.apache.derby.impl.jdbc.Util.javaException(Util.java:299)
          at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(TransactionResourceImpl.java:403)
          at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(TransactionResourceImpl.java:346)
          at org.apache.derby.impl.jdbc.EmbedConnection.handleException(EmbedConnection.java:2269)
          at org.apache.derby.impl.jdbc.ConnectionChild.handleException(ConnectionChild.java:81)
          at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(EmbedStatement.java:1321)
          at org.apache.derby.impl.jdbc.EmbedStatement.execute(EmbedStatement.java:625)
          at org.apache.derby.impl.jdbc.EmbedStatement.executeQuery(EmbedStatement.java:152)
          at org.apache.derbyTesting.functionTests.tests.multi.StressMultiTest$StressMultiRunnable.select(StressMultiTest.java:555)
          at org.apache.derbyTesting.functionTests.tests.multi.StressMultiTest$StressMultiRunnable.run(StressMultiTest.java:429)
          at java.lang.Thread.run(Thread.java:619)
          Caused by: java.sql.SQLException: Java exception: ': java.lang.NullPointerException'.
          at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(SQLExceptionFactory.java:45)
          at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(SQLExceptionFactory40.java:119)
          at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(SQLExceptionFactory40.java:70)
          ... 12 more
          Caused by: java.lang.NullPointerException
          at org.apache.derby.impl.sql.GenericActivationHolder.execute(GenericActivationHolder.java:298)
          at org.apache.derby.impl.sql.GenericPreparedStatement.executeStmt(GenericPreparedStatement.java:437)
          at org.apache.derby.impl.sql.GenericPreparedStatement.execute(GenericPreparedStatement.java:320)
          at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(EmbedStatement.java:1232)
          ... 5 more

          at org.apache.derbyTesting.functionTests.tests.multi.StressMultiTest.handleException(StressMultiTest.java:351)
          at org.apache.derbyTesting.functionTests.tests.multi.StressMultiTest.access$200(StressMultiTest.java:70)
          at org.apache.derbyTesting.functionTests.tests.multi.StressMultiTest$StressMultiRunnable.run(StressMultiTest.java:445)
          at java.lang.Thread.run(Thread.java:619)

          I haven't had the time to investigate this further, others should definitely do so if they think they can contribute
          A first step would be to verify this error (running the multi.StressMultiTest).

          Show
          Kristian Waagan added a comment - Hi Brett, I had a quick look at your patch, applied it and ran it through suites.All. I see the following error(s): testStressMulti(org.apache.derbyTesting.functionTests.tests.multi.StressMultiTest) FAILURE: junit.framework.AssertionFailedError: Caused by: java.sql.SQLException: Java exception: ': java.lang.NullPointerException'. at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(SQLExceptionFactory40.java:95) at org.apache.derby.impl.jdbc.Util.newEmbedSQLException(Util.java:142) at org.apache.derby.impl.jdbc.Util.javaException(Util.java:299) at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(TransactionResourceImpl.java:403) at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(TransactionResourceImpl.java:346) at org.apache.derby.impl.jdbc.EmbedConnection.handleException(EmbedConnection.java:2269) at org.apache.derby.impl.jdbc.ConnectionChild.handleException(ConnectionChild.java:81) at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(EmbedStatement.java:1321) at org.apache.derby.impl.jdbc.EmbedStatement.execute(EmbedStatement.java:625) at org.apache.derby.impl.jdbc.EmbedStatement.executeQuery(EmbedStatement.java:152) at org.apache.derbyTesting.functionTests.tests.multi.StressMultiTest$StressMultiRunnable.select(StressMultiTest.java:555) at org.apache.derbyTesting.functionTests.tests.multi.StressMultiTest$StressMultiRunnable.run(StressMultiTest.java:429) at java.lang.Thread.run(Thread.java:619) Caused by: java.sql.SQLException: Java exception: ': java.lang.NullPointerException'. at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(SQLExceptionFactory.java:45) at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(SQLExceptionFactory40.java:119) at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(SQLExceptionFactory40.java:70) ... 12 more Caused by: java.lang.NullPointerException at org.apache.derby.impl.sql.GenericActivationHolder.execute(GenericActivationHolder.java:298) at org.apache.derby.impl.sql.GenericPreparedStatement.executeStmt(GenericPreparedStatement.java:437) at org.apache.derby.impl.sql.GenericPreparedStatement.execute(GenericPreparedStatement.java:320) at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(EmbedStatement.java:1232) ... 5 more at org.apache.derbyTesting.functionTests.tests.multi.StressMultiTest.handleException(StressMultiTest.java:351) at org.apache.derbyTesting.functionTests.tests.multi.StressMultiTest.access$200(StressMultiTest.java:70) at org.apache.derbyTesting.functionTests.tests.multi.StressMultiTest$StressMultiRunnable.run(StressMultiTest.java:445) at java.lang.Thread.run(Thread.java:619) testStressMulti(org.apache.derbyTesting.functionTests.tests.multi.StressMultiTest) FAILURE: junit.framework.AssertionFailedError: Caused by: java.sql.SQLException: Java exception: ': java.lang.NullPointerException'. at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(SQLExceptionFactory40.java:95) at org.apache.derby.impl.jdbc.Util.newEmbedSQLException(Util.java:142) at org.apache.derby.impl.jdbc.Util.javaException(Util.java:299) at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(TransactionResourceImpl.java:403) at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(TransactionResourceImpl.java:346) at org.apache.derby.impl.jdbc.EmbedConnection.handleException(EmbedConnection.java:2269) at org.apache.derby.impl.jdbc.ConnectionChild.handleException(ConnectionChild.java:81) at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(EmbedStatement.java:1321) at org.apache.derby.impl.jdbc.EmbedStatement.execute(EmbedStatement.java:625) at org.apache.derby.impl.jdbc.EmbedStatement.executeQuery(EmbedStatement.java:152) at org.apache.derbyTesting.functionTests.tests.multi.StressMultiTest$StressMultiRunnable.select(StressMultiTest.java:555) at org.apache.derbyTesting.functionTests.tests.multi.StressMultiTest$StressMultiRunnable.run(StressMultiTest.java:429) at java.lang.Thread.run(Thread.java:619) Caused by: java.sql.SQLException: Java exception: ': java.lang.NullPointerException'. at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(SQLExceptionFactory.java:45) at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(SQLExceptionFactory40.java:119) at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(SQLExceptionFactory40.java:70) ... 12 more Caused by: java.lang.NullPointerException at org.apache.derby.impl.sql.GenericActivationHolder.execute(GenericActivationHolder.java:298) at org.apache.derby.impl.sql.GenericPreparedStatement.executeStmt(GenericPreparedStatement.java:437) at org.apache.derby.impl.sql.GenericPreparedStatement.execute(GenericPreparedStatement.java:320) at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(EmbedStatement.java:1232) ... 5 more at org.apache.derbyTesting.functionTests.tests.multi.StressMultiTest.handleException(StressMultiTest.java:351) at org.apache.derbyTesting.functionTests.tests.multi.StressMultiTest.access$200(StressMultiTest.java:70) at org.apache.derbyTesting.functionTests.tests.multi.StressMultiTest$StressMultiRunnable.run(StressMultiTest.java:445) at java.lang.Thread.run(Thread.java:619) I haven't had the time to investigate this further, others should definitely do so if they think they can contribute A first step would be to verify this error (running the multi.StressMultiTest).
          Hide
          Brett Wooldridge added a comment -

          Just a further comment. We see this in a client/server setup. The original bug mentions the Embedded driver, but because this code in common between the network server and the embedded driver, the issue is the same.

          Show
          Brett Wooldridge added a comment - Just a further comment. We see this in a client/server setup. The original bug mentions the Embedded driver, but because this code in common between the network server and the embedded driver, the issue is the same.
          Hide
          Brett Wooldridge added a comment -

          Proposed patch for 4279.

          Show
          Brett Wooldridge added a comment - Proposed patch for 4279.
          Hide
          Brett Wooldridge added a comment -

          This defect is a major issue for my company and product. I am submitting a patch to address this defect. The attached patch is against the 10.6 branch, but I would expect it to equally apply to the trunk. The description in the initial report by Jeff Stuckman is accurate, please read it before studying the patch.

          Let me explain the patch, and the rationale behind it. Some familiarity with the code in question is assumed. Four files where touched:

          org.apache.derby.iapi.sql.PreparedStatement
          org.apache.derby.impl.sql.GenericStatement
          org.apache.derby.impl.sql.GenericActivationHolder
          org.apache.derby.impl.sql.GenericPreparedStatement

          Basically the previous code did this:

          Block Thread B if it tries to recompile a statement that is being recompiled by Thread A

          This would result in deadlock if Thread A needs the exclusive (DB) lock held by thread B.

          Basically the new code does this:

          When Thread B tries to recompile a statement AND that statement is currently being recompiled (by another thread), don't block, simply create a new GenericPreparedStatement and compile that instead. In essence, this is treated like a "cache miss".

          At the GenericStatement level, there is no way to know what Storage level locks are being held. Therefore, blocking Thread B (via a Java lock) while awaiting notification by a thread (Thread A) that might itself be waiting on a Storage level lock held by Thread B is bound to lead to deadlocks in this, and possibly other, scenarios.

          This patch foregoes some [extremely] minor cache efficiencies by treating concurrent recompilation of a statement as a cache miss. Recompilation by a single thread is not treated as a cache miss, and the code flow is unchanged in that case. Unless a statement undergoes constant concurrent recompilation (defeating the statement cache anyway), I would expect this change to have an almost immeasurable performance impact on applications. Certainly compared to the workaround of disabling the statement cache, well, there is no comparison.

          When studying this patch, I recommend reading in this order for clarity:

          1. PreparedStatement
          2. GenericPreparedStatement
          3. GenericActivationHolder
          4. GenericStatement

          PreparedStatement:
          The interface was changed so that rePrepare() might return a new statement.

          GenericPreparedStatement:
          In the "rePrepare()" implementation, account for the fact that a new statement may be returned.

          GenericActivationHolder:
          In execute(), when a PreparedStatement is recognized as invalid and therefore rePrepare()'d, accept the re-prepared statement and use it. 99% of the time, the returned statement will be the same statement as the original. But in the case of a concurrent re-preparation, a new statement will be returned.

          GenericStatement:
          in prepMinion(), when we realize were in a "concurrent re-compile" situation, rather than block waiting for the "other thread" to complete recompiling the cached statement, simply create a new GenericPreparedStatement() and compile it instead. This is where the rubber meets the road, and the deadlock is avoided.

          I ran the 'derbyall' test suite, and while I did receive two failures, they seem unrelated to this fix and have more to do with my environment. I would appreciate someone reviewing the patch, applying it to their system (with a known passing test suite), and running it through a regression test pass.

          Show
          Brett Wooldridge added a comment - This defect is a major issue for my company and product. I am submitting a patch to address this defect. The attached patch is against the 10.6 branch, but I would expect it to equally apply to the trunk. The description in the initial report by Jeff Stuckman is accurate, please read it before studying the patch. Let me explain the patch, and the rationale behind it. Some familiarity with the code in question is assumed. Four files where touched: org.apache.derby.iapi.sql.PreparedStatement org.apache.derby.impl.sql.GenericStatement org.apache.derby.impl.sql.GenericActivationHolder org.apache.derby.impl.sql.GenericPreparedStatement Basically the previous code did this: Block Thread B if it tries to recompile a statement that is being recompiled by Thread A This would result in deadlock if Thread A needs the exclusive (DB) lock held by thread B. Basically the new code does this: When Thread B tries to recompile a statement AND that statement is currently being recompiled (by another thread), don't block, simply create a new GenericPreparedStatement and compile that instead. In essence, this is treated like a "cache miss". At the GenericStatement level, there is no way to know what Storage level locks are being held. Therefore, blocking Thread B (via a Java lock) while awaiting notification by a thread (Thread A) that might itself be waiting on a Storage level lock held by Thread B is bound to lead to deadlocks in this, and possibly other, scenarios. This patch foregoes some [extremely] minor cache efficiencies by treating concurrent recompilation of a statement as a cache miss. Recompilation by a single thread is not treated as a cache miss, and the code flow is unchanged in that case. Unless a statement undergoes constant concurrent recompilation (defeating the statement cache anyway), I would expect this change to have an almost immeasurable performance impact on applications. Certainly compared to the workaround of disabling the statement cache, well, there is no comparison. When studying this patch, I recommend reading in this order for clarity: 1. PreparedStatement 2. GenericPreparedStatement 3. GenericActivationHolder 4. GenericStatement PreparedStatement: The interface was changed so that rePrepare() might return a new statement. GenericPreparedStatement: In the "rePrepare()" implementation, account for the fact that a new statement may be returned. GenericActivationHolder: In execute(), when a PreparedStatement is recognized as invalid and therefore rePrepare()'d, accept the re-prepared statement and use it. 99% of the time, the returned statement will be the same statement as the original. But in the case of a concurrent re-preparation, a new statement will be returned. GenericStatement: in prepMinion(), when we realize were in a "concurrent re-compile" situation, rather than block waiting for the "other thread" to complete recompiling the cached statement, simply create a new GenericPreparedStatement() and compile it instead. This is where the rubber meets the road, and the deadlock is avoided. I ran the 'derbyall' test suite, and while I did receive two failures, they seem unrelated to this fix and have more to do with my environment. I would appreciate someone reviewing the patch, applying it to their system (with a known passing test suite), and running it through a regression test pass.
          Hide
          Dag H. Wanvik added a comment -

          Seen back to 10.0.

          Show
          Dag H. Wanvik added a comment - Seen back to 10.0.
          Hide
          Dag H. Wanvik added a comment -

          Triaged for 10.5.2, checking "repro attached", setting "normal" urgency.

          Show
          Dag H. Wanvik added a comment - Triaged for 10.5.2, checking "repro attached", setting "normal" urgency.
          Hide
          Jeff Stuckman added a comment -

          One workaround is to set derby.language.statementCacheSize=0. However, the impact of setting this property may be unclear to users because it is not documented. (DERBY-4280)

          Show
          Jeff Stuckman added a comment - One workaround is to set derby.language.statementCacheSize=0. However, the impact of setting this property may be unclear to users because it is not documented. ( DERBY-4280 )
          Hide
          Jeff Stuckman added a comment -

          Test case

          Show
          Jeff Stuckman added a comment - Test case
          Hide
          Jeff Stuckman added a comment -

          Output from test case:

          Invalidating the prepared statements...
          Table locked.
          Executing query 1. Query should block because other thread has lock.
          Executing query 2. Query should not block because this thread already has the necessary locks.
          java.sql.SQLTransactionRollbackException: A lock could not be obtained within the time requested
          at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
          at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
          at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
          at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source)
          at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
          at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source)
          at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown Source)
          at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(Unknown Source)
          at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeQuery(Unknown Source)
          at Derby4279$Thread2.run(Derby4279.java:81)
          at java.lang.Thread.run(Thread.java:619)
          Caused by: java.sql.SQLException: A lock could not be obtained within the time requested
          at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
          at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source)
          ... 11 more
          Caused by: ERROR 40XL1: A lock could not be obtained within the time requested
          at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
          at org.apache.derby.impl.services.locks.ConcurrentLockSet.lockObject(Unknown Source)
          at org.apache.derby.impl.services.locks.AbstractPool.lockObject(Unknown Source)
          at org.apache.derby.impl.services.locks.ConcurrentPool.lockObject(Unknown Source)
          at org.apache.derby.impl.store.raw.xact.RowLocking2.lockContainer(Unknown Source)
          at org.apache.derby.impl.store.raw.data.BaseContainerHandle.useContainer(Unknown Source)
          at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.openContainer(Unknown Source)
          at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.openContainer(Unknown Source)
          at org.apache.derby.impl.store.raw.xact.Xact.openContainer(Unknown Source)
          at org.apache.derby.impl.store.access.conglomerate.OpenConglomerate.init(Unknown Source)
          at org.apache.derby.impl.store.access.heap.Heap.open(Unknown Source)
          at org.apache.derby.impl.store.access.RAMTransaction.openConglomerate(Unknown Source)
          at org.apache.derby.impl.store.access.RAMTransaction.openConglomerate(Unknown Source)
          at org.apache.derby.impl.sql.compile.ResultColumnList.generateHolderMethod(Unknown Source)
          at org.apache.derby.impl.sql.compile.FromBaseTable.getScanArguments(Unknown Source)
          at org.apache.derby.impl.sql.compile.FromBaseTable.generateResultSet(Unknown Source)
          at org.apache.derby.impl.sql.compile.FromBaseTable.generate(Unknown Source)
          at org.apache.derby.impl.sql.compile.IndexToBaseRowNode.generate(Unknown Source)
          at org.apache.derby.impl.sql.compile.ProjectRestrictNode.generateMinion(Unknown Source)
          at org.apache.derby.impl.sql.compile.ProjectRestrictNode.generate(Unknown Source)
          at org.apache.derby.impl.sql.compile.ProjectRestrictNode.generateMinion(Unknown Source)
          at org.apache.derby.impl.sql.compile.ProjectRestrictNode.generate(Unknown Source)
          at org.apache.derby.impl.sql.compile.ScrollInsensitiveResultSetNode.generate(Unknown Source)
          at org.apache.derby.impl.sql.compile.CursorNode.generate(Unknown Source)
          at org.apache.derby.impl.sql.compile.StatementNode.generate(Unknown Source)
          at org.apache.derby.impl.sql.GenericStatement.prepMinion(Unknown Source)
          at org.apache.derby.impl.sql.GenericStatement.prepare(Unknown Source)
          at org.apache.derby.impl.sql.GenericPreparedStatement.rePrepare(Unknown Source)
          ... 5 more
          Query 2 finished.

          Show
          Jeff Stuckman added a comment - Output from test case: Invalidating the prepared statements... Table locked. Executing query 1. Query should block because other thread has lock. Executing query 2. Query should not block because this thread already has the necessary locks. java.sql.SQLTransactionRollbackException: A lock could not be obtained within the time requested at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source) at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source) at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source) at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source) at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source) at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source) at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown Source) at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(Unknown Source) at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeQuery(Unknown Source) at Derby4279$Thread2.run(Derby4279.java:81) at java.lang.Thread.run(Thread.java:619) Caused by: java.sql.SQLException: A lock could not be obtained within the time requested at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source) at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source) ... 11 more Caused by: ERROR 40XL1: A lock could not be obtained within the time requested at org.apache.derby.iapi.error.StandardException.newException(Unknown Source) at org.apache.derby.impl.services.locks.ConcurrentLockSet.lockObject(Unknown Source) at org.apache.derby.impl.services.locks.AbstractPool.lockObject(Unknown Source) at org.apache.derby.impl.services.locks.ConcurrentPool.lockObject(Unknown Source) at org.apache.derby.impl.store.raw.xact.RowLocking2.lockContainer(Unknown Source) at org.apache.derby.impl.store.raw.data.BaseContainerHandle.useContainer(Unknown Source) at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.openContainer(Unknown Source) at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.openContainer(Unknown Source) at org.apache.derby.impl.store.raw.xact.Xact.openContainer(Unknown Source) at org.apache.derby.impl.store.access.conglomerate.OpenConglomerate.init(Unknown Source) at org.apache.derby.impl.store.access.heap.Heap.open(Unknown Source) at org.apache.derby.impl.store.access.RAMTransaction.openConglomerate(Unknown Source) at org.apache.derby.impl.store.access.RAMTransaction.openConglomerate(Unknown Source) at org.apache.derby.impl.sql.compile.ResultColumnList.generateHolderMethod(Unknown Source) at org.apache.derby.impl.sql.compile.FromBaseTable.getScanArguments(Unknown Source) at org.apache.derby.impl.sql.compile.FromBaseTable.generateResultSet(Unknown Source) at org.apache.derby.impl.sql.compile.FromBaseTable.generate(Unknown Source) at org.apache.derby.impl.sql.compile.IndexToBaseRowNode.generate(Unknown Source) at org.apache.derby.impl.sql.compile.ProjectRestrictNode.generateMinion(Unknown Source) at org.apache.derby.impl.sql.compile.ProjectRestrictNode.generate(Unknown Source) at org.apache.derby.impl.sql.compile.ProjectRestrictNode.generateMinion(Unknown Source) at org.apache.derby.impl.sql.compile.ProjectRestrictNode.generate(Unknown Source) at org.apache.derby.impl.sql.compile.ScrollInsensitiveResultSetNode.generate(Unknown Source) at org.apache.derby.impl.sql.compile.CursorNode.generate(Unknown Source) at org.apache.derby.impl.sql.compile.StatementNode.generate(Unknown Source) at org.apache.derby.impl.sql.GenericStatement.prepMinion(Unknown Source) at org.apache.derby.impl.sql.GenericStatement.prepare(Unknown Source) at org.apache.derby.impl.sql.GenericPreparedStatement.rePrepare(Unknown Source) ... 5 more Query 2 finished.

            People

            • Assignee:
              Brett Wooldridge
              Reporter:
              Jeff Stuckman
            • Votes:
              7 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development