Derby
  1. Derby
  2. DERBY-5447

Deadlock in AutomaticIndexStatisticsTest.testShutdownWhileScanningThenDelete (BasePage.releaseExclusive and Observable.deleteObserver (BaseContainerHandle))

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: 10.9.1.0
    • Fix Version/s: 10.8.2.2, 10.9.1.0
    • Component/s: Services
    • Labels:
      None

      Description

      Java deadlock involving BasePage.releaseExclusive and Observable.deleteObserver, the observable being a BaseContainerHandle instance, seen when running AutomaticIndexStatisticsTest.testShutdownWhileScanningThenDelete.
      The activities involved are a scan of a conglomerate and the index statistics daemon being stopped as part of the database shutdown.

      Here are the relevant parts of the stack trace:
      "index-stat-thread" daemon prio=10 tid=0x00007f4e34244000 nid=0x380b waiting for monitor entry [0x00007f4e30aef000]
      java.lang.Thread.State: BLOCKED (on object monitor)
      at java.util.Observable.deleteObserver(Observable.java:95)

      • waiting to lock <0x00000000f51132d0> (a org.apache.derby.impl.store.raw.data.BaseContainerHandle)
        at org.apache.derby.impl.store.raw.data.BasePage.releaseExclusive(BasePage.java:1819)
      • locked <0x00000000f6d5a280> (a org.apache.derby.impl.store.raw.data.StoredPage)
        at org.apache.derby.impl.store.raw.data.CachedPage.releaseExclusive(CachedPage.java:531)
        at org.apache.derby.impl.store.raw.data.StoredPage.releaseExclusive(StoredPage.java:1066)
        at org.apache.derby.impl.store.raw.data.BasePage.unlatch(BasePage.java:1371)
        at org.apache.derby.impl.store.access.btree.ControlRow.release(ControlRow.java:926)
        at org.apache.derby.impl.store.access.btree.BTreeScan.savePositionAndReleasePage(BTreeScan.java:2146)
        at org.apache.derby.impl.store.access.btree.BTreeForwardScan.fetchRows(BTreeForwardScan.java:442)
        at org.apache.derby.impl.store.access.btree.BTreeScan.fetchNextGroup(BTreeScan.java:1681)
        at org.apache.derby.impl.services.daemon.IndexStatisticsDaemonImpl$KeyComparator.fetchRows(IndexStatisticsDaemonImpl.java:1221)
        at org.apache.derby.impl.services.daemon.IndexStatisticsDaemonImpl.updateIndexStatsMinion(IndexStatisticsDaemonImpl.java:463)
        at org.apache.derby.impl.services.daemon.IndexStatisticsDaemonImpl.generateStatistics(IndexStatisticsDaemonImpl.java:323)
        at org.apache.derby.impl.services.daemon.IndexStatisticsDaemonImpl.processingLoop(IndexStatisticsDaemonImpl.java:794)
        at org.apache.derby.impl.services.daemon.IndexStatisticsDaemonImpl.run(IndexStatisticsDaemonImpl.java:710)
        at java.lang.Thread.run(Thread.java:679)

      "main":
      at org.apache.derby.impl.store.raw.data.BasePage.isLatched(BasePage.java:1383)

      • waiting to lock <0x00000000f6d5a280> (a org.apache.derby.impl.store.raw.data.StoredPage)
        at org.apache.derby.impl.store.raw.data.BasePage.update(BasePage.java:1621)
        at java.util.Observable.notifyObservers(Observable.java:159)
        at java.util.Observable.notifyObservers(Observable.java:115)
        at org.apache.derby.impl.store.raw.data.BaseContainerHandle.informObservers(BaseContainerHandle.java:1008)
        at org.apache.derby.impl.store.raw.data.BaseContainerHandle.close(BaseContainerHandle.java:414)
      • locked <0x00000000f51132d0> (a org.apache.derby.impl.store.raw.data.BaseContainerHandle)
        at org.apache.derby.impl.store.access.btree.OpenBTree.close(OpenBTree.java:490)
        at org.apache.derby.impl.store.access.btree.BTreeScan.closeForEndTransaction(BTreeScan.java:2021)
        at org.apache.derby.impl.store.access.btree.index.B2IForwardScan.closeForEndTransaction(B2IForwardScan.java:107)
        at org.apache.derby.impl.store.access.RAMTransaction.closeControllers(RAMTransaction.java:245)
        at org.apache.derby.impl.store.access.RAMTransactionContext.cleanupOnError(RAMTransactionContext.java:97)
        at org.apache.derby.iapi.services.context.ContextManager.cleanupOnError(ContextManager.java:343)
        at org.apache.derby.impl.services.daemon.IndexStatisticsDaemonImpl.stop(IndexStatisticsDaemonImpl.java:919)
      • locked <0x00000000f4a5a070> (a java.util.ArrayList)
        at org.apache.derby.impl.sql.catalog.DataDictionaryImpl.disableIndexStatsRefresher(DataDictionaryImpl.java:13891)
        at org.apache.derby.impl.db.DatabaseContextImpl.cleanupOnError(DatabaseContextImpl.java:69)
        at org.apache.derby.iapi.services.context.ContextManager.cleanupOnError(ContextManager.java:343)
        at org.apache.derby.impl.jdbc.TransactionResourceImpl.cleanupOnError(TransactionResourceImpl.java:437)
        at org.apache.derby.impl.jdbc.EmbedConnection.<init>(EmbedConnection.java:633)
        at org.apache.derby.impl.jdbc.EmbedConnection30.<init>(EmbedConnection30.java:73)
        at org.apache.derby.impl.jdbc.EmbedConnection40.<init>(EmbedConnection40.java:51)
        at org.apache.derby.jdbc.Driver40.getNewEmbedConnection(Driver40.java:70)
        at org.apache.derby.jdbc.InternalDriver.connect(InternalDriver.java:248)
        at org.apache.derby.jdbc.EmbeddedDataSource.getConnection(EmbeddedDataSource.java:480)
        at org.apache.derby.jdbc.EmbeddedDataSource.getConnection(EmbeddedDataSource.java:424)
        at org.apache.derbyTesting.junit.JDBCDataSource.shutdownDatabase(JDBCDataSource.java:266)
        at org.apache.derbyTesting.functionTests.tests.store.AutomaticIndexStatisticsTest.testShutdownWhileScanningThenDelete(AutomaticIndexStatisticsTest.java:187)

      I have access to a machine on which this can be reproduced pretty simple. It's an Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz running Fedora 15.
      Java is:
      OpenJDK Runtime Environment (IcedTea6 1.10.2) (fedora-58.1.10.2.fc15-x86_64)
      OpenJDK 64-Bit Server VM (build 20.0-b11, mixed mode)

      In a way this bug is critical, since it causes two threads to hang forever. However, for it to happen the database must be shut down while the index statistics daemon is doing work and the timing must be right. There may be other ways to trigger the bug where Observable.update and BasePage.releaseExclusive are involved.

        Issue Links

          Activity

          Hide
          Kristian Waagan added a comment -

          The stack for the problem with an insane build:
          "index-stat-thread":
          at java.util.Observable.deleteObserver(Observable.java:95)

          • waiting to lock <0x00000000e83304a8> (a org.apache.derby.impl.store.raw.data.BaseContainerHandle)
            at org.apache.derby.impl.store.raw.data.BasePage.releaseExclusive(BasePage.java:1819)
          • locked <0x00000000e8c9f980> (a org.apache.derby.impl.store.raw.data.StoredPage)
            at org.apache.derby.impl.store.raw.data.CachedPage.releaseExclusive(CachedPage.java:531)
            at org.apache.derby.impl.store.raw.data.StoredPage.releaseExclusive(StoredPage.java:1066)
            at org.apache.derby.impl.store.raw.data.BasePage.unlatch(BasePage.java:1371)
            at org.apache.derby.impl.store.access.btree.ControlRow.release(ControlRow.java:926)
            at org.apache.derby.impl.store.access.btree.BTreeScan.positionAtNextPage(BTreeScan.java:505)
            at org.apache.derby.impl.store.access.btree.BTreeForwardScan.fetchRows(BTreeForwardScan.java:464)
            at org.apache.derby.impl.store.access.btree.BTreeScan.fetchNextGroup(BTreeScan.java:1681)
            at org.apache.derby.impl.services.daemon.IndexStatisticsDaemonImpl$KeyComparator.fetchRows(IndexStatisticsDaemonImpl.java:1221)
            at org.apache.derby.impl.services.daemon.IndexStatisticsDaemonImpl.updateIndexStatsMinion(IndexStatisticsDaemonImpl.java:463)
            at org.apache.derby.impl.services.daemon.IndexStatisticsDaemonImpl.generateStatistics(IndexStatisticsDaemonImpl.java:323)
            at org.apache.derby.impl.services.daemon.IndexStatisticsDaemonImpl.processingLoop(IndexStatisticsDaemonImpl.java:794)
            at org.apache.derby.impl.services.daemon.IndexStatisticsDaemonImpl.run(IndexStatisticsDaemonImpl.java:710)
            at java.lang.Thread.run(Thread.java:679)
            "main":
            at org.apache.derby.impl.store.raw.data.BasePage.releaseExclusive(BasePage.java:1814)
          • waiting to lock <0x00000000e8c9f980> (a org.apache.derby.impl.store.raw.data.StoredPage)
            at org.apache.derby.impl.store.raw.data.CachedPage.releaseExclusive(CachedPage.java:531)
            at org.apache.derby.impl.store.raw.data.StoredPage.releaseExclusive(StoredPage.java:1066)
            at org.apache.derby.impl.store.raw.data.BasePage.update(BasePage.java:1625)
            at java.util.Observable.notifyObservers(Observable.java:159)
            at java.util.Observable.notifyObservers(Observable.java:115)
            at org.apache.derby.impl.store.raw.data.BaseContainerHandle.informObservers(BaseContainerHandle.java:1008)
            at org.apache.derby.impl.store.raw.data.BaseContainerHandle.close(BaseContainerHandle.java:414)
          • locked <0x00000000e83304a8> (a org.apache.derby.impl.store.raw.data.BaseContainerHandle)
            at org.apache.derby.impl.store.access.btree.OpenBTree.close(OpenBTree.java:490)
            at org.apache.derby.impl.store.access.btree.BTreeScan.closeForEndTransaction(BTreeScan.java:2021)
            at org.apache.derby.impl.store.access.btree.index.B2IForwardScan.closeForEndTransaction(B2IForwardScan.java:107)
            at org.apache.derby.impl.store.access.RAMTransaction.closeControllers(RAMTransaction.java:245)
            at org.apache.derby.impl.store.access.RAMTransactionContext.cleanupOnError(RAMTransactionContext.java:97)
            at org.apache.derby.iapi.services.context.ContextManager.cleanupOnError(ContextManager.java:343)
            at org.apache.derby.impl.services.daemon.IndexStatisticsDaemonImpl.stop(IndexStatisticsDaemonImpl.java:919)
          • locked <0x00000000e78b35b0> (a java.util.ArrayList)
            at org.apache.derby.impl.sql.catalog.DataDictionaryImpl.disableIndexStatsRefresher(DataDictionaryImpl.java:13891)
            at org.apache.derby.impl.db.DatabaseContextImpl.cleanupOnError(DatabaseContextImpl.java:69)
            at org.apache.derby.iapi.services.context.ContextManager.cleanupOnError(ContextManager.java:343)
            at org.apache.derby.impl.jdbc.TransactionResourceImpl.cleanupOnError(TransactionResourceImpl.java:437)
            at org.apache.derby.impl.jdbc.EmbedConnection.<init>(EmbedConnection.java:633)
            at org.apache.derby.impl.jdbc.EmbedConnection30.<init>(EmbedConnection30.java:73)
            at org.apache.derby.impl.jdbc.EmbedConnection40.<init>(EmbedConnection40.java:51)
            at org.apache.derby.jdbc.Driver40.getNewEmbedConnection(Driver40.java:70)
            at org.apache.derby.jdbc.InternalDriver.connect(InternalDriver.java:248)
            at org.apache.derby.jdbc.EmbeddedDataSource.getConnection(EmbeddedDataSource.java:480)
            at org.apache.derby.jdbc.EmbeddedDataSource.getConnection(EmbeddedDataSource.java:424)
            at org.apache.derbyTesting.junit.JDBCDataSource.shutdownDatabase(JDBCDataSource.java:266)
            at org.apache.derbyTesting.functionTests.tests.store.AutomaticIndexStatisticsTest.testShutdownWhileScanningThenDelete(AutomaticIndexStatisticsTest.java:187)
          Show
          Kristian Waagan added a comment - The stack for the problem with an insane build: "index-stat-thread": at java.util.Observable.deleteObserver(Observable.java:95) waiting to lock <0x00000000e83304a8> (a org.apache.derby.impl.store.raw.data.BaseContainerHandle) at org.apache.derby.impl.store.raw.data.BasePage.releaseExclusive(BasePage.java:1819) locked <0x00000000e8c9f980> (a org.apache.derby.impl.store.raw.data.StoredPage) at org.apache.derby.impl.store.raw.data.CachedPage.releaseExclusive(CachedPage.java:531) at org.apache.derby.impl.store.raw.data.StoredPage.releaseExclusive(StoredPage.java:1066) at org.apache.derby.impl.store.raw.data.BasePage.unlatch(BasePage.java:1371) at org.apache.derby.impl.store.access.btree.ControlRow.release(ControlRow.java:926) at org.apache.derby.impl.store.access.btree.BTreeScan.positionAtNextPage(BTreeScan.java:505) at org.apache.derby.impl.store.access.btree.BTreeForwardScan.fetchRows(BTreeForwardScan.java:464) at org.apache.derby.impl.store.access.btree.BTreeScan.fetchNextGroup(BTreeScan.java:1681) at org.apache.derby.impl.services.daemon.IndexStatisticsDaemonImpl$KeyComparator.fetchRows(IndexStatisticsDaemonImpl.java:1221) at org.apache.derby.impl.services.daemon.IndexStatisticsDaemonImpl.updateIndexStatsMinion(IndexStatisticsDaemonImpl.java:463) at org.apache.derby.impl.services.daemon.IndexStatisticsDaemonImpl.generateStatistics(IndexStatisticsDaemonImpl.java:323) at org.apache.derby.impl.services.daemon.IndexStatisticsDaemonImpl.processingLoop(IndexStatisticsDaemonImpl.java:794) at org.apache.derby.impl.services.daemon.IndexStatisticsDaemonImpl.run(IndexStatisticsDaemonImpl.java:710) at java.lang.Thread.run(Thread.java:679) "main": at org.apache.derby.impl.store.raw.data.BasePage.releaseExclusive(BasePage.java:1814) waiting to lock <0x00000000e8c9f980> (a org.apache.derby.impl.store.raw.data.StoredPage) at org.apache.derby.impl.store.raw.data.CachedPage.releaseExclusive(CachedPage.java:531) at org.apache.derby.impl.store.raw.data.StoredPage.releaseExclusive(StoredPage.java:1066) at org.apache.derby.impl.store.raw.data.BasePage.update(BasePage.java:1625) at java.util.Observable.notifyObservers(Observable.java:159) at java.util.Observable.notifyObservers(Observable.java:115) at org.apache.derby.impl.store.raw.data.BaseContainerHandle.informObservers(BaseContainerHandle.java:1008) at org.apache.derby.impl.store.raw.data.BaseContainerHandle.close(BaseContainerHandle.java:414) locked <0x00000000e83304a8> (a org.apache.derby.impl.store.raw.data.BaseContainerHandle) at org.apache.derby.impl.store.access.btree.OpenBTree.close(OpenBTree.java:490) at org.apache.derby.impl.store.access.btree.BTreeScan.closeForEndTransaction(BTreeScan.java:2021) at org.apache.derby.impl.store.access.btree.index.B2IForwardScan.closeForEndTransaction(B2IForwardScan.java:107) at org.apache.derby.impl.store.access.RAMTransaction.closeControllers(RAMTransaction.java:245) at org.apache.derby.impl.store.access.RAMTransactionContext.cleanupOnError(RAMTransactionContext.java:97) at org.apache.derby.iapi.services.context.ContextManager.cleanupOnError(ContextManager.java:343) at org.apache.derby.impl.services.daemon.IndexStatisticsDaemonImpl.stop(IndexStatisticsDaemonImpl.java:919) locked <0x00000000e78b35b0> (a java.util.ArrayList) at org.apache.derby.impl.sql.catalog.DataDictionaryImpl.disableIndexStatsRefresher(DataDictionaryImpl.java:13891) at org.apache.derby.impl.db.DatabaseContextImpl.cleanupOnError(DatabaseContextImpl.java:69) at org.apache.derby.iapi.services.context.ContextManager.cleanupOnError(ContextManager.java:343) at org.apache.derby.impl.jdbc.TransactionResourceImpl.cleanupOnError(TransactionResourceImpl.java:437) at org.apache.derby.impl.jdbc.EmbedConnection.<init>(EmbedConnection.java:633) at org.apache.derby.impl.jdbc.EmbedConnection30.<init>(EmbedConnection30.java:73) at org.apache.derby.impl.jdbc.EmbedConnection40.<init>(EmbedConnection40.java:51) at org.apache.derby.jdbc.Driver40.getNewEmbedConnection(Driver40.java:70) at org.apache.derby.jdbc.InternalDriver.connect(InternalDriver.java:248) at org.apache.derby.jdbc.EmbeddedDataSource.getConnection(EmbeddedDataSource.java:480) at org.apache.derby.jdbc.EmbeddedDataSource.getConnection(EmbeddedDataSource.java:424) at org.apache.derbyTesting.junit.JDBCDataSource.shutdownDatabase(JDBCDataSource.java:266) at org.apache.derbyTesting.functionTests.tests.store.AutomaticIndexStatisticsTest.testShutdownWhileScanningThenDelete(AutomaticIndexStatisticsTest.java:187)
          Hide
          Kristian Waagan added a comment -

          Attaching patches 1a and 2a, both independently fixing the bug.
          Patch 1a makes changes in a core method in BasePage, it introduces a "custom" lock pattern, and it cannot guarantee that the monitor obtained last isn't already obtained elsewhere outside of this method (which would nullify the fix).

          Patch 2a makes changes to IndexStatisticsDaemonImpl.stop. It addresses this particular instance of the bug, but not the root cause, by making sure the worker thread has been shut down before starting the context cleanup operation.

          Since no other instances of the bug has been reported so far, I plan to commit patch 2a. It is simpler, and has far less risk associated with it. I further hope to get it into the upcoming 10.8.2. release.

          I will commit 2a as soon as the last regression run has completed. On Monday I hope to backport the fix to the 10.8 branch.

          I'm also rerunning the AutomaticIndexStatisticsTest due to a change in the patch. The previous patch ran the test more than 2200 times without hitting the deadlock. Without the fix, the deadlock is usually occurring in less than 50 iterations of the test (on a specific machine).

          Patch ready for review.

          Show
          Kristian Waagan added a comment - Attaching patches 1a and 2a, both independently fixing the bug. Patch 1a makes changes in a core method in BasePage, it introduces a "custom" lock pattern, and it cannot guarantee that the monitor obtained last isn't already obtained elsewhere outside of this method (which would nullify the fix). Patch 2a makes changes to IndexStatisticsDaemonImpl.stop. It addresses this particular instance of the bug, but not the root cause, by making sure the worker thread has been shut down before starting the context cleanup operation. Since no other instances of the bug has been reported so far, I plan to commit patch 2a. It is simpler, and has far less risk associated with it. I further hope to get it into the upcoming 10.8.2. release. I will commit 2a as soon as the last regression run has completed. On Monday I hope to backport the fix to the 10.8 branch. I'm also rerunning the AutomaticIndexStatisticsTest due to a change in the patch. The previous patch ran the test more than 2200 times without hitting the deadlock. Without the fix, the deadlock is usually occurring in less than 50 iterations of the test (on a specific machine). Patch ready for review.
          Hide
          Dag H. Wanvik added a comment -

          Waiting with the context manager cleanup till the daemon thread is done (if any), seems a safe approach to me. +1

          I am not quite sure I understand you concern about patch 1a: if 2a doesn't fix the root cause, does the root cause deserve a new JIRA (as I understand it you are not sure if we have/do not have similar locking patterns elsewhere..)

          Show
          Dag H. Wanvik added a comment - Waiting with the context manager cleanup till the daemon thread is done (if any), seems a safe approach to me. +1 I am not quite sure I understand you concern about patch 1a: if 2a doesn't fix the root cause, does the root cause deserve a new JIRA (as I understand it you are not sure if we have/do not have similar locking patterns elsewhere..)
          Hide
          Kristian Waagan added a comment -

          Committed patch 2a to trunk with revision 1180790. Doing a final run on 10.8 before backporting.
          Patch 2a ran AutomaticIndexStatisticsTest nearly 1700 times without errors.

          Dag, thanks for looking at the patch.
          My concern with patch 1a is that it places a burden on the caller of releaseExclusive: the caller must ensure that the monitor on 'this' isn't already acquired. This can easily be broken, for instance by making BasePage.unlatch synchronized, or adding a new synchronized method calling releaseExclusive. This may have already been broken, as it's hard to verify. Since BasePage is a very mature piece of code, this would probably not be a problem in practice. On the other hand, I'm sure there are better ways to address the root cause.

          Now that this issue is fixed, I don't know how to provoke the problem. I think it must involve two threads using the Observable API, and an Observer.update-method that acquires the monitor of both the object being observed and the observer and calls back into the object being observed (i.e. typically removeObserver). If my IDE isn't fooling me, there are five other update-methods. A quick check didn't reveal the synchronization issue in those five implementations.

          Show
          Kristian Waagan added a comment - Committed patch 2a to trunk with revision 1180790. Doing a final run on 10.8 before backporting. Patch 2a ran AutomaticIndexStatisticsTest nearly 1700 times without errors. Dag, thanks for looking at the patch. My concern with patch 1a is that it places a burden on the caller of releaseExclusive: the caller must ensure that the monitor on 'this' isn't already acquired. This can easily be broken, for instance by making BasePage.unlatch synchronized, or adding a new synchronized method calling releaseExclusive. This may have already been broken, as it's hard to verify. Since BasePage is a very mature piece of code, this would probably not be a problem in practice. On the other hand, I'm sure there are better ways to address the root cause. Now that this issue is fixed, I don't know how to provoke the problem. I think it must involve two threads using the Observable API, and an Observer.update-method that acquires the monitor of both the object being observed and the observer and calls back into the object being observed (i.e. typically removeObserver). If my IDE isn't fooling me, there are five other update-methods. A quick check didn't reveal the synchronization issue in those five implementations.
          Hide
          Kristian Waagan added a comment -

          Backported to 10.8 with revision 1180811.

          Show
          Kristian Waagan added a comment - Backported to 10.8 with revision 1180811.
          Hide
          Kristian Waagan added a comment -

          Closing issue.

          Show
          Kristian Waagan added a comment - Closing issue.

            People

            • Assignee:
              Kristian Waagan
              Reporter:
              Kristian Waagan
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development