Derby / DERBY-5624

System can run out of stack space while processing DropOnCommit requests.

    Details

    • Urgency:
      Urgent
    • Issue & fix info:
      High Value Fix, Patch Available, Repro attached

      Description

      The system currently calls xact.notifyObservers() recursively from DropOnCommit.update(). It does this because new
      observers can sometimes be added while the observer list is being processed; before that change those late additions were
      missed, causing assertion failures in the tests and possibly files that were not properly dropped on commit.
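
      The pattern at issue looks roughly like this (a minimal illustrative sketch using java.util.Observable/Observer,
      not Derby's actual source; Txn is a made-up stand-in for RawTransaction, which extends Observable in Derby):

      import java.util.Observable;
      import java.util.Observer;

      // Stand-in for Derby's RawTransaction, which extends java.util.Observable.
      class Txn extends Observable {
          void fire(Object event) {
              setChanged();
              notifyObservers(event);
          }
      }

      // Stand-in for DropOnCommit: drops a file when notified of a commit.
      class DropOnCommitLike implements Observer {
          @Override
          public void update(Observable o, Object event) {
              Txn xact = (Txn) o;
              xact.deleteObserver(this);
              // ... drop the container here; doing so can register NEW observers
              // on xact, which the in-flight notification pass (it iterates over
              // a snapshot of the observer list) will not see ...

              // The workaround described above: re-notify from inside update() so
              // late additions are handled. Each still-pending observer adds
              // another update() -> notifyObservers() pair of stack frames, so
              // with enough observers the stack overflows.
              xact.fire(event);
          }
      }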

      Multiple users on the Derby user list have had failures running SYSCS_UTIL.SYSCS_COMPRESS_TABLE(), running out of stack
      space with a heavily recursive stack trace of the form below (see more detail from these reports in subsequent comments):
      Caused by: java.lang.StackOverflowError
      at java.lang.ThreadLocal.get(ThreadLocal.java:125)
      at java.lang.StringCoding.deref(StringCoding.java:46)
      at java.lang.StringCoding.encode(StringCoding.java:258)
      at java.lang.String.getBytes(String.java:946)
      at java.io.UnixFileSystem.getBooleanAttributes0(Native Method)
      at java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:228)
      at java.io.File.exists(File.java:733)
      at org.apache.derby.impl.store.raw.data.StreamFileContainer.run(Unknown Source)
      at java.security.AccessController.doPrivileged(Native Method)
      at org.apache.derby.impl.store.raw.data.StreamFileContainer.privExists(Unknown Source)
      at org.apache.derby.impl.store.raw.data.StreamFileContainer.open(Unknown Source)
      at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.openStreamContainer(Unknown Source)
      at org.apache.derby.impl.store.raw.xact.Xact.openStreamContainer(Unknown Source)
      at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.dropStreamContainer(Unknown Source)
      at org.apache.derby.impl.store.raw.xact.Xact.dropStreamContainer(Unknown Source)
      at org.apache.derby.impl.store.raw.data.DropOnCommit.update(Unknown Source)
      at java.util.Observable.notifyObservers(Observable.java:142)
      at org.apache.derby.iapi.store.raw.xact.RawTransaction.notifyObservers(Unknown Source)
      at org.apache.derby.impl.store.raw.data.DropOnCommit.update(Unknown Source)
      at java.util.Observable.notifyObservers(Observable.java:142)
      at org.apache.derby.iapi.store.raw.xact.RawTransaction.notifyObservers(Unknown Source)
      at org.apache.derby.impl.store.raw.data.DropOnCommit.update(Unknown Source)
      at java.util.Observable.notifyObservers(Observable.java:142)
      at org.apache.derby.iapi.store.raw.xact.RawTransaction.notifyObservers(Unknown Source)
      at org.apache.derby.impl.store.raw.data.DropOnCommit.update(Unknown Source)
      at java.util.Observable.notifyObservers(Observable.java:142)

      1. DERBY-5624.patch
        2 kB
        Mike Matrigali
      2. error-stacktrace_mod.out
        10 kB
        Myrna van Lunteren
      3. largedatasuite_mod.out
        24 kB
        Myrna van Lunteren

        Activity

        Mike Matrigali added a comment -

        Initial proof-of-concept patch; no extensive tests run yet. It passes the store test for the original 3993 error. The included test
        failed before the fix and passes after the fix. I will probably alter the test to be less of a resource hog, or look at putting it into a different suite. It currently creates a table with 1000 columns and an index on each of those columns, and then runs compress table.
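
        For reference, that repro is straightforward to sketch in JDBC (a hypothetical reconstruction, not the
        committed test; the table and index names here are made up):

        import java.sql.Connection;
        import java.sql.DriverManager;
        import java.sql.Statement;

        public class Derby5624Repro {
            public static void main(String[] args) throws Exception {
                try (Connection conn = DriverManager.getConnection(
                         "jdbc:derby:d5624db;create=true");
                     Statement s = conn.createStatement()) {
                    // 1000-column table with an index on every column.
                    StringBuilder ddl = new StringBuilder("CREATE TABLE T5624 (");
                    for (int i = 0; i < 1000; i++) {
                        ddl.append(i == 0 ? "" : ", ").append("C").append(i).append(" INT");
                    }
                    s.executeUpdate(ddl.append(")").toString());
                    for (int i = 0; i < 1000; i++) {
                        s.executeUpdate("CREATE INDEX IX" + i + " ON T5624(C" + i + ")");
                    }
                    // Compress drops and recreates the conglomerates; before the
                    // fix the commit-time cleanup recursed once per dropped file
                    // and threw StackOverflowError.
                    s.execute("CALL SYSCS_UTIL.SYSCS_COMPRESS_TABLE('APP', 'T5624', 0)");
                }
            }
        }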

        Mike Matrigali added a comment -

        I have repro'd what I think is the problem using a very unrealistic test case (a table with 1000 columns and an index
        on every column). It would be nice to get a more real-world test case that just uses a big table and whatever DDL on that table is causing the issue.

        The 2 users running into this both seem to be using relatively big tables, but I have not seen their DDL yet. I am wondering if the problem they are seeing is related to either on-disk sort files or blobs.

        Matthew McCawley added a comment -

        This seems to have fixed the problem for me. It also decreases the runtime, since it doesn't have to create such a large stack. However, I was running 10.8 and I built the patch on main (10.9 beta), so I need to rebuild on 10.8 to make sure nothing else is different, as well as test on a larger database.

        Matthew McCawley added a comment -

        Built on 10.8 and ran with a large table (over 45 million rows). I'll update if something changes, but it looks like this patch fixed the issue.

        Mike Matrigali added a comment -

        Checked the fix in to trunk. I will let it sit there, make sure it passes tests across the nightly platforms, and look at backporting it next week.

        DERBY-5624 System can run out of stack space while processing DropOnCommit requests.

        Taking care of cleanup after a commit is handled by notifying all "Observers"
        that an event has taken place that they may want to act on and clean up
        after. In the added test case this is triggered by an offline compress,
        which effectively drops and recreates the base table and all of its
        indexes after loading the data into them.

        Sometimes these Observers execute work that adds new Observers to the
        queue, and the first pass through the queue can "miss" those late additions.

        A previous fix for this problem added a recursive call to notifyObservers
        at the point that could cause this addition of observers. The recursive
        call caused stack problems when the number of Observers became large. For
        the checked-in test case that meant 1000 indexes on 1000 columns of the
        table; for other users I believe the cause was a by-product of large
        disk-based sorts for multi-gigabyte tables and indexes. Two users reported
        similar failing stacks from compressing large tables, and one was able to
        take this fix to their environment and then run the compress successfully.

        The fix was to remove the recursion and instead loop at the outermost point
        until there were no Observers.

        Adding the test to the largedata suite as it takes over 10 minutes to run
        on my machine.
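
        The shape of that fix, as described, is roughly the following (a hedged sketch reusing the hypothetical
        Txn stand-in from the sketch under Description, not the committed Derby code):

        class CommitCleanup {
            static void notifyUntilDrained(Txn xact, Object event) {
                // With this fix, update() no longer calls notifyObservers itself;
                // this loop replaces the recursion. Each pass notifies the
                // observers registered so far; observers added DURING a pass are
                // simply picked up on the next iteration, so stack depth stays
                // constant no matter how many observers queue up. This relies on
                // each observer deregistering itself once handled (as
                // DropOnCommit does), otherwise the loop would never drain.
                while (xact.countObservers() > 0) {
                    xact.fire(event);
                }
            }
        }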

        Myrna van Lunteren added a comment -

        I'm reopening, because I saw a failure in my nightly run of the largeDataTests suite; it looks like this test now pushes past the number of open files allowed for my account:

        (emb)largedata.Derby5624Test.testDERBY_5624 java.io.FileNotFoundException: /local1/cloudtst/dev/src/NightlyBuildResults.2012-02-21/largeDataTests/fail/Embedded_40/Derby5624Test/testDERBY_5624/derby.log (Too many open files)
        at java.io.FileOutputStream.open(Native Method)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:190)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:142)
        at org.apache.derbyTesting.functionTests.util.PrivilegedFileOpsForTests.copySingleFile(PrivilegedFileOpsForTests.java:305)
        at org.apache.derbyTesting.functionTests.util.PrivilegedFileOpsForTests.recursiveCopy(PrivilegedFileOpsForTests.java:261)
        at org.apache.derbyTesting.functionTests.util.PrivilegedFileOpsForTests.access$000(PrivilegedFileOpsForTests.java:49)
        at org.apache.derbyTesting.functionTests.util.PrivilegedFileOpsForTests$8.run(PrivilegedFileOpsForTests.java:235)
        at java.security.AccessController.doPrivileged(AccessController.java:251)
        at org.apache.derbyTesting.functionTests.util.PrivilegedFileOpsForTests.copy(PrivilegedFileOpsForTests.java:233)
        at org.apache.derbyTesting.junit.BaseTestCase.runBare(BaseTestCase.java:132)

        I'll attach the full error stack trace, and the console output of the suite run.

        Myrna van Lunteren added a comment -

        Uploading the output of the largedata test run which failed, and the error-stacktrace from the fail directory.
        I've modified the location of the test dir a bit.

        Mike Matrigali added a comment -

        I will look at the Linux error. I had run the test on Windows and it was fine there. The first thing I will look at is the
        compress call: it currently uses the default, which builds all indexes in parallel and may require all the files to be open
        at the same time. I will see whether the sequential option still reproduces the bug before the fix. If I can't get a fix soon, I will disable the test on non-Windows platforms.
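
        For reference, the sequential option mentioned above is the third (SMALLINT) argument of
        SYSCS_UTIL.SYSCS_COMPRESS_TABLE; a non-zero value rebuilds the indexes one at a time. A minimal sketch,
        borrowing the hypothetical table name from the repro sketch earlier:

        import java.sql.CallableStatement;
        import java.sql.Connection;
        import java.sql.DriverManager;

        public class SequentialCompress {
            public static void main(String[] args) throws Exception {
                try (Connection conn =
                         DriverManager.getConnection("jdbc:derby:d5624db");
                     CallableStatement cs = conn.prepareCall(
                         "CALL SYSCS_UTIL.SYSCS_COMPRESS_TABLE(?, ?, ?)")) {
                    cs.setString(1, "APP");
                    cs.setString(2, "T5624");
                    // Non-zero => rebuild indexes sequentially rather than in
                    // parallel, keeping fewer files open at once.
                    cs.setShort(3, (short) 1);
                    cs.execute();
                }
            }
        }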

        Myrna van Lunteren added a comment -

        Oddly enough, after the latest change, I saw another failure with the largeDataTest suite:

        1) LobLimitsTestjava.sql.SQLException: Table/View 'BLOBTBL' already exists in Schema 'APP'.
        at org.apache.derby.client.am.SQLExceptionFactory40.getSQLException(Unknown Source)
        at org.apache.derby.client.am.SqlException.getSQLException(Unknown Source)
        at org.apache.derby.client.am.Statement.execute(Unknown Source)
        at org.apache.derbyTesting.functionTests.tests.largedata.LobLimitsTest.setupTables(LobLimitsTest.java:107)
        at org.apache.derbyTesting.functionTests.tests.largedata.LobLimitsTest$1.decorateSQL(LobLimitsTest.java:141)
        at org.apache.derbyTesting.junit.CleanDatabaseTestSetup.setUp(CleanDatabaseTestSetup.java:112)
        at junit.extensions.TestSetup$1.protect(TestSetup.java:20)
        at junit.extensions.TestSetup.run(TestSetup.java:25)
        at org.apache.derbyTesting.junit.BaseTestSetup.run(BaseTestSetup.java:57)
        at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24)
        at junit.extensions.TestSetup$1.protect(TestSetup.java:21)
        at junit.extensions.TestSetup.run(TestSetup.java:25)
        at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24)
        at junit.extensions.TestSetup$1.protect(TestSetup.java:21)
        at junit.extensions.TestSetup.run(TestSetup.java:25)
        at org.apache.derbyTesting.junit.BaseTestSetup.run(BaseTestSetup.java:57)
        at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24)
        at junit.extensions.TestSetup$1.protect(TestSetup.java:21)
        at junit.extensions.TestSetup.run(TestSetup.java:25)
        Caused by: org.apache.derby.client.am.SqlException: Table/View 'BLOBTBL' already exists in Schema 'APP'.
        at org.apache.derby.client.am.Statement.completeSqlca(Unknown Source)
        at org.apache.derby.client.am.Statement.completeExecuteImmediate(Unknown Source)
        at org.apache.derby.client.net.NetStatementReply.parseEXCSQLIMMreply(Unknown Source)
        at org.apache.derby.client.net.NetStatementReply.readExecuteImmediate(Unknown Source)
        at org.apache.derby.client.net.StatementReply.readExecuteImmediate(Unknown Source)
        at org.apache.derby.client.net.NetStatement.readExecuteImmediate_(Unknown Source)
        at org.apache.derby.client.am.Statement.readExecuteImmediate(Unknown Source)
        at org.apache.derby.client.am.Statement.flowExecute(Unknown Source)
        at org.apache.derby.client.am.Statement.executeX(Unknown Source)
        ... 26 more

        Matthew McCawley added a comment -

        I ran the patch on my databases in Ubuntu and did not have the same problem.

        Myrna van Lunteren added a comment -

        I've seen this failure again, this time on Windows.
        It happened in the very last test:
        (emb)largedata.LobLimitsTest.test_05_ClobNegative used 707031 ms E
        The Derby5624Test passed on this Windows run.

        It appears to me that even without the new test, the largeDataTest suite runs slower on Linux than before.
        Perhaps this is related to recent changes to the test harness?
        Some changes went in just today that may improve this - I'll rerun and report back.

        Unfortunately, when the suite passes on my Linux machines it takes about 11 hours or longer, and at times fills the disk up to 99%.
        I use a new test directory each night, so when there's a failure, things get progressively worse in subsequent runs because the left-over files fill up the disk. It's a balancing act between having time to look at a failure and filling up the disk.

        Kristian Waagan added a comment -

        A change introduced by DERBY-5614 may have caused the slowdown, but I don't see how it's related to the extra disk usage. The bug in SpawnedProcess would cause the VM to stay alive, idling, until the TimerTask got purged (even when cancelled), but the order of the tests matters: it would only be relevant if a test using SpawnedProcess ran less than 45 minutes before the last test in the run finished.

        When I ran suites.All in parallel just now (4 runners), it took 1 hour and 6 minutes (including org.apache.derby.PackagePrivateTestSuite and org.apache.derbyTesting.functionTests.tests.memory._Suite). This was on Solaris 11 with JDK 7. Assuming all 4 runners are busy at all times, that's a worst case of around 4.5 hours; 11 hours definitely sounds like way too much (unless that machine is running a lot of other stuff at the same time).

        Myrna van Lunteren added a comment -

        Hi Kristian, thanks for your response.
        I didn't mean to imply that the disk usage is related to the changes for DERBY-5614 or any other recent change; apologies for the confusion.

        In my comments on this bug I was referring to the full largedata._Suite (suites.All only runs largedata.LobLimitsLiteTest).

        Mike Matrigali added a comment -

        I'd appreciate it if someone good with JUnit could review my new test; most critical is whether it has the appropriate
        decorators (see the sketch below). I could move the new test to the end if we think it is affecting the other tests, or alter the test to drop the
        table sooner (I was assuming the table would get dropped automatically). The new test really should not be much of a drain
        on disk space compared to the rest of the tests, as it mostly adds 1000 small files.

        I can also back the test out on Windows if we think it is causing problems there.
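
        For reviewers, the wiring in question looks roughly like this (a hypothetical sketch, not the actual test
        source; CleanDatabaseTestSetup and TestConfiguration are the framework classes visible in the stack traces
        above, and the class name here is made up):

        import junit.framework.Test;
        import org.apache.derbyTesting.junit.BaseJDBCTestCase;
        import org.apache.derbyTesting.junit.CleanDatabaseTestSetup;
        import org.apache.derbyTesting.junit.TestConfiguration;

        public class Derby5624TestSketch extends BaseJDBCTestCase {
            public Derby5624TestSketch(String name) { super(name); }

            public static Test suite() {
                Test suite = TestConfiguration.embeddedSuite(Derby5624TestSketch.class);
                // CleanDatabaseTestSetup drops the schema objects created by the
                // fixture when it tears down, which would address the concern
                // about the big table lingering into later tests.
                return new CleanDatabaseTestSetup(suite);
            }
        }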

        Myrna van Lunteren added a comment -

        I think the fact that I saw the 'already exists' failure both on a Windows run and on a Linux run, while we have the test skipped on Linux, proves that your test does not cause any harm. Apologies for not making this clear.

        Myrna van Lunteren added a comment -

        I changed the maximum open file setting for one linux machine from 1024 to 2048, and then the Derby5624Test passed.

        Myrna van Lunteren added a comment -

        I reran the largedata._Suite on Windows with IBM 1.6 (SR9 FP1), build synced to revision 1294957, and this time I did not see the 'table/view already exists' failure; all tests passed. So that really was unrelated. Also, my run on Linux with IBM 1.6 SR9 FP1 (skipping Derby5624Test) passed cleanly again. So the only remaining concern is running out of file descriptors on a default-configured Linux system (with the maximum number of open files set to 1024).

        Dag H. Wanvik added a comment -

        I tested Derby5624Test on Solaris 11 (standalone) and it ran OK there, so it could be included in the suite for that platform, I think. It took 317s on my box.

        Mike Matrigali added a comment -

        Fixed in trunk and backported to 10.8, 10.7, 10.6, and 10.5.

        DERBY-5624 System can run out of stack space while processing DropOnCommit requests.

        Taking care of cleanup after a commit is handled by notifying all "Observers"
        that an event has taken place that they may want to act on and clean up
        after. In the added test case this is triggered by an offline compress,
        which effectively drops and recreates the base table and all of its
        indexes after loading the data into them.

        Sometimes these Observers execute work that adds new Observers to the
        queue, and the first pass through the queue can "miss" those late additions.

        A previous fix for this problem added a recursive call to notifyObservers
        at the point that could cause this addition of observers. The recursive
        call caused stack problems when the number of Observers became large. For
        the checked-in test case that meant 1000 indexes on 1000 columns of the
        table; for other users I believe the cause was a by-product of large
        disk-based sorts for multi-gigabyte tables and indexes. Two users reported
        similar failing stacks from compressing large tables, and one was able to
        take this fix to their environment and then run the compress successfully.

        The fix was to remove the recursion and instead loop at the outermost point
        until there were no Observers.

        Adding the test to the largedata suite as it takes over 10 minutes to run
        on my machine.


          People

          • Assignee:
            Mike Matrigali
            Reporter:
            Mike Matrigali
          • Votes:
            3
            Watchers:
            3
