Derby
DERBY-5153

Intermittent ASSERT FAILED Internal Error-- statistics not found in selectivityForConglomerate when running InterruptResilienceTest

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 10.8.1.2
    • Fix Version/s: 10.8.1.2
    • Component/s: SQL
    • Labels:
      None

      Description

      Cf the enclosed derby.log:

      While executing this statement: "select * from mtTab where i=?", we see this stacktrace:

      org.apache.derby.shared.common.sanity.AssertFailure: ASSERT FAILED Internal Error-- statistics not found in selectivityForConglomerate.
      cd = ConglomerateDescriptor: conglomerateNumber = 1249 name = SQL110325154339720 uuid = f04340b7-012e-ed78-50c3-00005e21fe7a indexable = true
      numKeys = 1
      at org.apache.derby.shared.common.sanity.SanityManager.THROWASSERT(SanityManager.java:162)
      at org.apache.derby.shared.common.sanity.SanityManager.THROWASSERT(SanityManager.java:147)
      at org.apache.derby.iapi.sql.dictionary.TableDescriptor.selectivityForConglomerate(TableDescriptor.java:1443)
      at org.apache.derby.impl.sql.compile.PredicateList.selectivity(PredicateList.java:3903)
      at org.apache.derby.impl.sql.compile.FromBaseTable.estimateCost(FromBaseTable.java:1295)
      at org.apache.derby.impl.sql.compile.OptimizerImpl.estimateTotalCost(OptimizerImpl.java:2626)
      at org.apache.derby.impl.sql.compile.OptimizerImpl.costBasedCostOptimizable(OptimizerImpl.java:2172)
      at org.apache.derby.impl.sql.compile.OptimizerImpl.costOptimizable(OptimizerImpl.java:1985)
      at org.apache.derby.impl.sql.compile.FromBaseTable.optimizeIt(FromBaseTable.java:526)
      at org.apache.derby.impl.sql.compile.ProjectRestrictNode.optimizeIt(ProjectRestrictNode.java:316)
      at org.apache.derby.impl.sql.compile.OptimizerImpl.costPermutation(OptimizerImpl.java:1939)
      at org.apache.derby.impl.sql.compile.SelectNode.optimize(SelectNode.java:1916)
      at org.apache.derby.impl.sql.compile.DMLStatementNode.optimizeStatement(DMLStatementNode.java:315)
      at org.apache.derby.impl.sql.compile.CursorNode.optimizeStatement(CursorNode.java:587)
      at org.apache.derby.impl.sql.GenericStatement.prepMinion(GenericStatement.java:384)
      at org.apache.derby.impl.sql.GenericStatement.prepare(GenericStatement.java:85)
      at org.apache.derby.impl.sql.GenericPreparedStatement.rePrepare(GenericPreparedStatement.java:229)
      at org.apache.derby.impl.sql.GenericPreparedStatement.executeStmt(GenericPreparedStatement.java:409)
      at org.apache.derby.impl.sql.GenericPreparedStatement.execute(GenericPreparedStatement.java:317)
      at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(EmbedStatement.java:1242)
      at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(EmbedPreparedStatement.java:1686)
      at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeQuery(EmbedPreparedStatement.java:284)
      at org.apache.derbyTesting.functionTests.tests.store.InterruptResilienceTest$WorkerThread.run(InterruptResilienceTest.java:414)

      I saw this twice with the enclosed patch to InterruptResilienceTest (it adds a test case in preparation for DERBY-5152), but the error occurs before that fixture executes, so I think the patch is irrelevant (a third and fourth run did not show the issue). I am posting it here in case somebody can guess what could be wrong; I'll run more experiments to see if I can reproduce it. Could it be related to our new index statistics daemon?

      1. derby.log
        74 kB
        Dag H. Wanvik
      2. D5153.java
        2 kB
        Knut Anders Hatlen
      3. test.diff
        4 kB
        Knut Anders Hatlen
      4. remove-asserts.diff
        3 kB
        Knut Anders Hatlen

        Issue Links

          Activity

          Dag H. Wanvik added a comment -

          The explanation of the bug and the heuristics sounds convincing to me. +1

          Knut Anders Hatlen added a comment -

          Committed revision 1088495.

          Knut Anders Hatlen added a comment -

          I'm attaching a patch that removes the asserts and enables the test case.

          Before the patch, if the statistics couldn't be found by non-debug builds, different heuristics for the selectivity would be used depending on where it detected that statistics were missing.

          If it was detected near the start of selectivityForConglomerate(), the selectivity would be estimated as 0.1^(numKeys+1). But if it was detected later in the method, 0.1 would be returned.

          I think the first estimate is the better one, as it seems reasonable that the selectivity is reduced (fewer rows match) the more columns we specify in the predicates. However, I believe it was a typo in the original code that made it estimate the selectivity as 0.1^(numKeys+1) rather than 0.1^numKeys. I've therefore removed both of the original heuristics and replaced them with a single heuristic, 0.1^numKeys.

          The reason I think the original calculation contains a typo is that elsewhere in the code we usually calculate selectivity in a way similar to this:

          double selectivity = 1.0;
          for (int i = 0; i < predicates; i++) {
              selectivity *= <selectivity for predicate i>;
          }

          In selectivityForConglomerate, the selectivity is initialized to 0.1 instead of 1.0, apparently for no good reason. Apart from being inconsistent with the other calculations, it has some strange effects at the boundaries: the selectivity for numKeys=0, that is, the selectivity for TRUE, becomes 0.1, whereas it clearly should be 1.0.

          In any case, the accuracy of the heuristic is probably not very important in this case. This will only happen if the compilation of a query happens to come to this code in the short window between dropping of the old statistics and inserting the new statistics. And recreation of statistics invalidates the query plan, so the query is likely to be recompiled with fresh statistics very soon afterwards.
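          The single fallback heuristic described above (selectivity = 0.1^numKeys, computed as a product so that numKeys=0 yields 1.0) can be sketched as follows. This is an illustration of the idea only, with an invented class and method name, not the actual Derby code:

          ```java
          public class SelectivityHeuristic {
              /**
               * Fallback selectivity when index statistics are missing:
               * assume each key column's predicate matches 10% of the rows,
               * so zero predicates (i.e. the predicate TRUE) yields 1.0.
               */
              static double missingStatsSelectivity(int numKeys) {
                  double selectivity = 1.0;
                  for (int i = 0; i < numKeys; i++) {
                      selectivity *= 0.1;
                  }
                  return selectivity;
              }

              public static void main(String[] args) {
                  System.out.println(missingStatsSelectivity(0)); // 1.0
                  System.out.println(missingStatsSelectivity(1)); // 0.1
              }
          }
          ```

          Starting the product at 1.0 rather than 0.1 is exactly what makes the numKeys=0 boundary come out right.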

          Knut Anders Hatlen added a comment -

          I think the problem here is that code on a higher level checks that the statistics exist before calling TableDescriptor.selectivityForConglomerate() and assumes that the statistics still exist inside that method. However, there are no locks or other mechanisms that prevent the statistics from being dropped between the check on the higher level and the call to selectivityForConglomerate(). Since selectivityForConglomerate() knows how to handle missing statistics in a reasonable way (which it does in production builds), I think we should just remove the asserts here.
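          The check-then-act race described here can be illustrated with a small standalone sketch (hypothetical names and a plain map standing in for the data dictionary, not Derby's actual classes): the lookup tolerates a concurrently dropped entry by falling back to a heuristic instead of asserting.

          ```java
          import java.util.Map;
          import java.util.concurrent.ConcurrentHashMap;

          public class StatsRaceSketch {
              // Stand-in for the data dictionary's per-index statistics.
              static final Map<String, Double> stats = new ConcurrentHashMap<>();

              /**
               * Even if a caller has checked that statistics exist, another
               * thread may drop them before this method runs, so it must
               * handle a miss itself rather than assert the entry is present.
               */
              static double selectivityFor(String index, int numKeys) {
                  Double s = stats.get(index);
                  if (s != null) {
                      return s;
                  }
                  return Math.pow(0.1, numKeys); // fallback for missing stats
              }

              public static void main(String[] args) {
                  stats.put("IDX", 0.25);
                  System.out.println(selectivityFor("IDX", 2));  // 0.25, stats present
                  stats.remove("IDX");                           // simulates the drop window
                  System.out.println(selectivityFor("IDX", 2));  // falls back, no assert
              }
          }
          ```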

          Knut Anders Hatlen added a comment -

          Uploading a patch that adds the repro as a JUnit test case in UpdateStatisticsTest. The test case fails fairly consistently in my environment. It has failed in every run so far, either in embedded mode or in client mode, but sometimes not in both modes.

          The test case is disabled for now. To try it, remove the "disabled_" prefix from the name of the test method.

          Committed revision 1087636.

          Knut Anders Hatlen added a comment -

          Attaching a standalone repro for this bug. If you run the D5153 class with a sane (debug) build of Derby, you should see the assert failure fairly quickly.

          The repro starts two threads. One that repeatedly performs a join using a table with a multi-column index, and one that repeatedly updates the index statistics.

          Knut Anders Hatlen added a comment -

          A similar error was logged as DERBY-5169. That one was seen on 10.7.1.1, so the problem does not seem to be related to, or at least not limited to, the new index statistics daemon.


            People

            • Assignee:
              Knut Anders Hatlen
              Reporter:
              Dag H. Wanvik
            • Votes:
              0
            • Watchers:
              1
