Derby
  1. Derby
  2. DERBY-4881

Deadlock accessing SYS.SYSSTATISTICS

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 10.3.3.0, 10.4.2.0, 10.5.3.0, 10.6.2.1, 10.7.1.1
    • Fix Version/s: 10.5.3.2, 10.6.2.4, 10.7.1.1
    • Component/s: SQL
    • Labels:
      None

      Description

      Transactions accessing index statistics can deadlock if one of them inserts new entries and the other selects from the system table. Inserts happens for instance when update of index statistics are perform manually, or when a table is compressed (given that the table has indexes and contains some rows). This issue may be more problematic when automatic update of index statistics is implemented.
      Issue discovered when writing a regression tests for DERBY-4849, see discussion there. The bug is timing dependent, but has been observed on a variety of JVMs and platform architectures.

      To sum up:
      o using NO_WAIT + retry was suggested, but turned out to be an infeasible solution
      o current approach is to allow using read uncommitted isolation level when accessing statistics in the system table (take no locks)

      1. derby-4881-1b-deadlock_fix.diff
        6 kB
        Kristian Waagan
      2. derby-4881-1a-deadlock_fix.diff
        7 kB
        Kristian Waagan
      3. derby-4881_10_5_diff.txt
        6 kB
        Kathey Marsden

        Issue Links

          Activity

          Hide
          Kristian Waagan added a comment -

          Attached patch 1a.

            • FromBaseTable
              Minor change, avoid second call to TableDescriptor.statisticsExist if unnecessary.
            • DataDictionaryImpl
              o Removed getDescriptorViaIndex-overload not taking isolation level as parameter (only used in once place).
              o getStatisticsDescriptors:
          • new JavaDoc
          • use READ_UNCOMMITTED instead of REPEATABLE_READ
            o @@ -8455,16 +8466,16 @@: formatting change due to long line
            o getDescriptorViaIndexMinion
          • removed unused variable indexTemplateRow
          • removed assert allowing only a single result to be returned when using isolation level read uncommitted
          • modified if to ignore match if tuple descriptor is null (index row existed, but not base row) <=== Behavior change
            o computeSequenceRowLocation (cleanup only)
          • removed unused variable row
          • use getDescriptorViaIndex instead of getDescriptorViaIndexMinion

          Regression tests passed: derbyall (104 tests) and suites.All (13075 tests).
          Patch ready for review.

          Show
          Kristian Waagan added a comment - Attached patch 1a. FromBaseTable Minor change, avoid second call to TableDescriptor.statisticsExist if unnecessary. DataDictionaryImpl o Removed getDescriptorViaIndex-overload not taking isolation level as parameter (only used in once place). o getStatisticsDescriptors: new JavaDoc use READ_UNCOMMITTED instead of REPEATABLE_READ o @@ -8455,16 +8466,16 @@: formatting change due to long line o getDescriptorViaIndexMinion removed unused variable indexTemplateRow removed assert allowing only a single result to be returned when using isolation level read uncommitted modified if to ignore match if tuple descriptor is null (index row existed, but not base row) <=== Behavior change o computeSequenceRowLocation (cleanup only) removed unused variable row use getDescriptorViaIndex instead of getDescriptorViaIndexMinion Regression tests passed: derbyall (104 tests) and suites.All (13075 tests). Patch ready for review.
          Hide
          Knut Anders Hatlen added a comment -

          The patch looks good to me. And it removes more code than it adds! It looks like Dag handled most of the edges for the read uncommitted case in DERBY-3678, and with your change to ignore td==null, I think it should be safe in the list!=null case too.

          I was able to reproduce the deadlock in XplainStatisticsTest on nearly every run with an insane build in my environment. The patch seems to have fixed that, and now I'm not able to reproduce it.

          +1 to commit.

          Show
          Knut Anders Hatlen added a comment - The patch looks good to me. And it removes more code than it adds! It looks like Dag handled most of the edges for the read uncommitted case in DERBY-3678 , and with your change to ignore td==null, I think it should be safe in the list!=null case too. I was able to reproduce the deadlock in XplainStatisticsTest on nearly every run with an insane build in my environment. The patch seems to have fixed that, and now I'm not able to reproduce it. +1 to commit.
          Hide
          Kristian Waagan added a comment -

          Thanks for the quick review, Knut Anders

          Patch 1b is identical to 1a, except that I ditched the changes in FromBaseTable. Avoiding the second call to statisticsExists is moot because of caching and short-circuiting.

          Committed patch 1b to trunk with revision 1030043.

          Show
          Kristian Waagan added a comment - Thanks for the quick review, Knut Anders Patch 1b is identical to 1a, except that I ditched the changes in FromBaseTable. Avoiding the second call to statisticsExists is moot because of caching and short-circuiting. Committed patch 1b to trunk with revision 1030043.
          Hide
          Kristian Waagan added a comment -

          I will backport this fix once the nightlies show that it hasn't introduced any new problems, and that the regression test for DERBY-4849 passes.

          Show
          Kristian Waagan added a comment - I will backport this fix once the nightlies show that it hasn't introduced any new problems, and that the regression test for DERBY-4849 passes.
          Hide
          Kristian Waagan added a comment -

          The fix only merges cleanly with 10.6, and it seems it would be required to either rewrite the patch or pull in additional changes to backport it to 10.5.
          Given that we haven't gotten any user reports of this issue, I plan to backport it to 10.6. only.

          Running tests.

          Show
          Kristian Waagan added a comment - The fix only merges cleanly with 10.6, and it seems it would be required to either rewrite the patch or pull in additional changes to backport it to 10.5. Given that we haven't gotten any user reports of this issue, I plan to backport it to 10.6. only. Running tests.
          Hide
          Kristian Waagan added a comment -

          Tests passed.

          Backported to the 10.6 branch with revision 1031623.

          Show
          Kristian Waagan added a comment - Tests passed. Backported to the 10.6 branch with revision 1031623.
          Hide
          Kristian Waagan added a comment -

          Closing issue.

          Show
          Kristian Waagan added a comment - Closing issue.
          Hide
          Kathey Marsden added a comment -

          Reopen for backport.

          Show
          Kathey Marsden added a comment - Reopen for backport.
          Hide
          Kathey Marsden added a comment -

          Assign to myself for backport to 10.5

          Show
          Kathey Marsden added a comment - Assign to myself for backport to 10.5
          Hide
          Kathey Marsden added a comment -

          Some manual merge was required for 10.5. Here is the patch. Running tests.

          Show
          Kathey Marsden added a comment - Some manual merge was required for 10.5. Here is the patch. Running tests.
          Hide
          Kathey Marsden added a comment -

          closing after merge 10.5

          Show
          Kathey Marsden added a comment - closing after merge 10.5

            People

            • Assignee:
              Kristian Waagan
              Reporter:
              Kristian Waagan
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development