Derby
  1. Derby
  2. DERBY-4057

Space is not reclaimed if transaction is rolled back

    Details

    • Urgency:
      Normal

      Description

      If I repeatedly insert into a clob table and rollback the the transaction the space is not reclaimed and the number of allocated pages continues to grow. I will add a test case to ClobReclamationTest and reference this bug.

      DERBY-4056 may be a special case of this bug, but I thought I would file a bug for the general issue.

      1. DERBY-4057_initial_prototype_patch.diff
        62 kB
        Mike Matrigali
      2. DERBY-4057_patch_2.diff
        78 kB
        Mike Matrigali

        Issue Links

          Activity

          Hide
          Kathey Marsden added a comment -

          Added test store.ClobReclamationTest.xtestReclamationOnRollback to demonstrate the issue.

          Show
          Kathey Marsden added a comment - Added test store.ClobReclamationTest.xtestReclamationOnRollback to demonstrate the issue.
          Hide
          Knut Anders Hatlen added a comment -

          Triaged for 10.5.2.

          Show
          Knut Anders Hatlen added a comment - Triaged for 10.5.2.
          Hide
          Mike Matrigali added a comment -

          Triaged for 10.9, no changes.

          Show
          Mike Matrigali added a comment - Triaged for 10.9, no changes.
          Hide
          Mike Matrigali added a comment -

          For a first pass at this will look at solving the runtime rollback case. Redo recovery after system crash
          is harder as the recovery happens with only the raw store running, so the normal post commit machinery
          is not available.

          To solve this the raw store needs to "call back" into the higher level Access module to do post commit processing.
          This is similar to what the raw store does during logical recovery, calling back into btree code to do search for a
          row that may have moved pages because of splits or joins.

          I think simplest will be adding an interface to register a class with raw store that it can call when it does an undo of
          an insert. Then building an access level class that can take the information raw store provides, funnel it to the correct
          conglomerate implementation, and do the appropriate work of queueing a post commit action using similar logic
          to what those conglomerates do today for delete.

          Show
          Mike Matrigali added a comment - For a first pass at this will look at solving the runtime rollback case. Redo recovery after system crash is harder as the recovery happens with only the raw store running, so the normal post commit machinery is not available. To solve this the raw store needs to "call back" into the higher level Access module to do post commit processing. This is similar to what the raw store does during logical recovery, calling back into btree code to do search for a row that may have moved pages because of splits or joins. I think simplest will be adding an interface to register a class with raw store that it can call when it does an undo of an insert. Then building an access level class that can take the information raw store provides, funnel it to the correct conglomerate implementation, and do the appropriate work of queueing a post commit action using similar logic to what those conglomerates do today for delete.
          Hide
          Mike Matrigali added a comment -

          Here is patch with initial prototype. Still need to write new tests and pass existing. Any comments on overall approach welcome. Will be submitting another patch with more comments, tests, cleanup - so better to wait for that
          for detail review.

          Show
          Mike Matrigali added a comment - Here is patch with initial prototype. Still need to write new tests and pass existing. Any comments on overall approach welcome. Will be submitting another patch with more comments, tests, cleanup - so better to wait for that for detail review.
          Hide
          Bryan Pendleton added a comment -

          Am I understanding the basic approach correctly: after your change, we will now reclaim
          the space, but it will be slightly delayed; we will defer space reclamation processing
          until a separate phase after the user-level transaction has committed.

          The space reclamation "system" transaction can succeed, or fail, but it will do so
          independently of the user's transaction.

          Show
          Bryan Pendleton added a comment - Am I understanding the basic approach correctly: after your change, we will now reclaim the space, but it will be slightly delayed; we will defer space reclamation processing until a separate phase after the user-level transaction has committed. The space reclamation "system" transaction can succeed, or fail, but it will do so independently of the user's transaction.
          Hide
          Mike Matrigali added a comment - - edited

          I think your understanding is correct. The intent is to make the space reclaiming process of
          an aborted insert to match as closely as possible the existing behavior of space reclamation
          of a committed delete. The current system does nothing for the aborted insert case so that
          space is often not reclaimed unless the user runs one of the manual compress calls.

          In both cases it is necessary to delay any reclaiming of the space on a page until after the
          transaction has successfully committed or aborted. This insures that redo recovery of the
          committed transaction and aborted transactions will always succeed. If we removed the
          space of a delete before commit some other transaction could use that space before we
          committed and then the abort of the delete could find there was not enough space to put
          the data back on the page.

          The work gets queued after commit or abort. Usually this work is then done async to the
          user thread by the raw store daemon. This is often an issue with some intermittent test results across platforms that
          "expect" some amount of space reclamation, but sometimes get different results.

          Show
          Mike Matrigali added a comment - - edited I think your understanding is correct. The intent is to make the space reclaiming process of an aborted insert to match as closely as possible the existing behavior of space reclamation of a committed delete. The current system does nothing for the aborted insert case so that space is often not reclaimed unless the user runs one of the manual compress calls. In both cases it is necessary to delay any reclaiming of the space on a page until after the transaction has successfully committed or aborted. This insures that redo recovery of the committed transaction and aborted transactions will always succeed. If we removed the space of a delete before commit some other transaction could use that space before we committed and then the abort of the delete could find there was not enough space to put the data back on the page. The work gets queued after commit or abort. Usually this work is then done async to the user thread by the raw store daemon. This is often an issue with some intermittent test results across platforms that "expect" some amount of space reclamation, but sometimes get different results.
          Hide
          Mike Matrigali added a comment -

          This patch is an improvement over the previous initial prototype. It
          passes all tests on my machine, ibm17, windows except for testBlobLinkedListReclamationOnRollback. And that test just needs to
          get changed to expect the improved reclamation, after the patch table
          is left with 1 allocated page rather than 1000.

          I expect to post one more patch with comments cleaned up and making sure debugging code is removed.

          The major change in the approach to get a number of tests to pass was
          to change the "in transaction" queueing code that creates the HeapPostCommit to only take a PageKey rather than a Heap. This meant
          the code did not interact with the conglomerate cache and possibly have
          to do things like read data from the container during abort. This is the same info that the raw store ReclaimSpace work item does. This solved
          problems with alter table tests, interrupt tests, and AccessFactory unit tests.

          only test error, which just needs to be changed to accept new behavior:
          1) testBlobLinkedListReclamationOnRollback(org.apache.derbyTesting.functionTests
          .tests.store.ClobReclamationTest)junit.framework.AssertionFailedError: Column va
          lue mismatch @ column 'NUMALLOCATEDPAGES', row 1:
          Expected: >1001<
          Found: >1<

          NUMALLOCATEDPAGES
          -----------------
          [1]

          Show
          Mike Matrigali added a comment - This patch is an improvement over the previous initial prototype. It passes all tests on my machine, ibm17, windows except for testBlobLinkedListReclamationOnRollback. And that test just needs to get changed to expect the improved reclamation, after the patch table is left with 1 allocated page rather than 1000. I expect to post one more patch with comments cleaned up and making sure debugging code is removed. The major change in the approach to get a number of tests to pass was to change the "in transaction" queueing code that creates the HeapPostCommit to only take a PageKey rather than a Heap. This meant the code did not interact with the conglomerate cache and possibly have to do things like read data from the container during abort. This is the same info that the raw store ReclaimSpace work item does. This solved problems with alter table tests, interrupt tests, and AccessFactory unit tests. only test error, which just needs to be changed to accept new behavior: 1) testBlobLinkedListReclamationOnRollback(org.apache.derbyTesting.functionTests .tests.store.ClobReclamationTest)junit.framework.AssertionFailedError: Column va lue mismatch @ column 'NUMALLOCATEDPAGES', row 1: Expected: >1001< Found: >1< NUMALLOCATEDPAGES ----------------- [1]
          Hide
          Bryan Pendleton added a comment -

          Glad to hear the testing is going well. I read through your latest patch, and
          it seems clear and well-documented. I have no improvements to suggest.

          Show
          Bryan Pendleton added a comment - Glad to hear the testing is going well. I read through your latest patch, and it seems clear and well-documented. I have no improvements to suggest.
          Hide
          Kathey Marsden added a comment -

          Mike wrote:
          >NUMALLOCATEDPAGES', row 1:
          >Expected: >1001<
          > Found: >1<

          I just wanted to say Hooray and Thank You Mike! In my time with Derby, I found lack of space reclamation to be the root cause of many support issues. Although an off line compress is usually just a bump in the road, I am excited to see zero admin become more of a reality. DERBY-5473 next!

          Show
          Kathey Marsden added a comment - Mike wrote: >NUMALLOCATEDPAGES', row 1: >Expected: >1001< > Found: >1< I just wanted to say Hooray and Thank You Mike! In my time with Derby, I found lack of space reclamation to be the root cause of many support issues. Although an off line compress is usually just a bump in the road, I am excited to see zero admin become more of a reality. DERBY-5473 next!
          Hide
          Mike Matrigali added a comment -

          The level of changes for this change are more feature like than bug fix, so do not think this should be backported.

          Show
          Mike Matrigali added a comment - The level of changes for this change are more feature like than bug fix, so do not think this should be backported.
          Hide
          Mike Matrigali added a comment -

          Setting issue to improvement. System was not originally designed to reclaim space on aborted inserts, so new functionality had to be added to do so.

          Show
          Mike Matrigali added a comment - Setting issue to improvement. System was not originally designed to reclaim space on aborted inserts, so new functionality had to be added to do so.
          Hide
          ASF subversion and git services added a comment -

          Commit 1641418 from mikem@apache.org in branch 'code/trunk'
          [ https://svn.apache.org/r1641418 ]

          DERBY-4057 Space is not reclaimed if transaction is rolled back

          Added infrastructure called at insert abort time to queue post abort work.
          After the abort work is queued in the case of aborted inserts to reclaim
          space and if possible mark pages free, which then in turn allows them to
          be used by subsequent work on the table. This work queues this work
          on heap tables when the aborted insert is the last row on a page, or
          if the aborted insert is a row that contains a long column (a row that
          is bigger than a page - usually a blob or clob), or a long row (a row
          that spans multiple pages).

          Show
          ASF subversion and git services added a comment - Commit 1641418 from mikem@apache.org in branch 'code/trunk' [ https://svn.apache.org/r1641418 ] DERBY-4057 Space is not reclaimed if transaction is rolled back Added infrastructure called at insert abort time to queue post abort work. After the abort work is queued in the case of aborted inserts to reclaim space and if possible mark pages free, which then in turn allows them to be used by subsequent work on the table. This work queues this work on heap tables when the aborted insert is the last row on a page, or if the aborted insert is a row that contains a long column (a row that is bigger than a page - usually a blob or clob), or a long row (a row that spans multiple pages).
          Hide
          Myrna van Lunteren added a comment -

          It seems this check-in had a few javadoc warnings as per the 'Derby-trunk' Jenkins target:
          [javadoc] /home/jenkins/jenkins-slave/workspace/Derby-trunk/trunk/java/engine/org/apache/derby/impl/store/raw/RawStore.java:429: warning - @param argument "undo_handle" is not a parameter name.
          [javadoc] /home/jenkins/jenkins-slave/workspace/Derby-trunk/trunk/java/engine/org/apache/derby/iapi/store/raw/data/DataFactory.java:432: warning - @param argument "undo_handler" is not a parameter name.
          [javadoc] /home/jenkins/jenkins-slave/workspace/Derby-trunk/trunk/java/engine/org/apache/derby/impl/store/raw/data/BasePage.java:357: warning - Tag @see: can't find getPageIdentity in org.apache.derby.iapi.store.raw.Page
          [javadoc] /home/jenkins/jenkins-slave/workspace/Derby-trunk/trunk/java/engine/org/apache/derby/impl/store/raw/data/BaseDataFileFactory.java:1394: warning - @param argument "undo_handle" is not a parameter name.

          See: https://builds.apache.org/view/A-D/view/Derby/job/Derby-trunk/lastBuild/consoleFull

          Show
          Myrna van Lunteren added a comment - It seems this check-in had a few javadoc warnings as per the 'Derby-trunk' Jenkins target: [javadoc] /home/jenkins/jenkins-slave/workspace/Derby-trunk/trunk/java/engine/org/apache/derby/impl/store/raw/RawStore.java:429: warning - @param argument "undo_handle" is not a parameter name. [javadoc] /home/jenkins/jenkins-slave/workspace/Derby-trunk/trunk/java/engine/org/apache/derby/iapi/store/raw/data/DataFactory.java:432: warning - @param argument "undo_handler" is not a parameter name. [javadoc] /home/jenkins/jenkins-slave/workspace/Derby-trunk/trunk/java/engine/org/apache/derby/impl/store/raw/data/BasePage.java:357: warning - Tag @see: can't find getPageIdentity in org.apache.derby.iapi.store.raw.Page [javadoc] /home/jenkins/jenkins-slave/workspace/Derby-trunk/trunk/java/engine/org/apache/derby/impl/store/raw/data/BaseDataFileFactory.java:1394: warning - @param argument "undo_handle" is not a parameter name. See: https://builds.apache.org/view/A-D/view/Derby/job/Derby-trunk/lastBuild/consoleFull
          Hide
          ASF subversion and git services added a comment -

          Commit 1641521 from mikem@apache.org in branch 'code/trunk'
          [ https://svn.apache.org/r1641521 ]

          DERBY-4057 Space is not reclaimed if transaction is rolled back

          javadoc fixes for previous checkin.

          Show
          ASF subversion and git services added a comment - Commit 1641521 from mikem@apache.org in branch 'code/trunk' [ https://svn.apache.org/r1641521 ] DERBY-4057 Space is not reclaimed if transaction is rolled back javadoc fixes for previous checkin.
          Hide
          ASF subversion and git services added a comment -

          Commit 1641526 from mikem@apache.org in branch 'code/trunk'
          [ https://svn.apache.org/r1641526 ]

          DERBY-4057 Space is not reclaimed if transaction is rolled back

          a few changes to make the ClobReclamationTest better behaved across a variety
          of platforms. The test tries to measure background thread behavior, and needs
          to allow for machines/jvms that might be single threaded and slow.

          In one case added a sleep after all aborts to allow the background thread
          to catch up and mark all pages but head one free.

          In other case added more to the fudge factor of how much immediate insert
          might progress before background thread can catch up and make free page
          available. If this does not work may just need to add sleep after every
          insert.

          Show
          ASF subversion and git services added a comment - Commit 1641526 from mikem@apache.org in branch 'code/trunk' [ https://svn.apache.org/r1641526 ] DERBY-4057 Space is not reclaimed if transaction is rolled back a few changes to make the ClobReclamationTest better behaved across a variety of platforms. The test tries to measure background thread behavior, and needs to allow for machines/jvms that might be single threaded and slow. In one case added a sleep after all aborts to allow the background thread to catch up and mark all pages but head one free. In other case added more to the fudge factor of how much immediate insert might progress before background thread can catch up and make free page available. If this does not work may just need to add sleep after every insert.
          Hide
          ASF subversion and git services added a comment -

          Commit 1641731 from mikem@apache.org in branch 'code/trunk'
          [ https://svn.apache.org/r1641731 ]

          DERBY-4057 Space is not reclaimed if transaction is rolled back

          more tweaking to get tests to run successfully across all platforms.

          Show
          ASF subversion and git services added a comment - Commit 1641731 from mikem@apache.org in branch 'code/trunk' [ https://svn.apache.org/r1641731 ] DERBY-4057 Space is not reclaimed if transaction is rolled back more tweaking to get tests to run successfully across all platforms.
          Hide
          ASF subversion and git services added a comment -

          Commit 1641753 from mikem@apache.org in branch 'code/trunk'
          [ https://svn.apache.org/r1641753 ]

          DERBY-6774 org.apache.derbyTesting.functionTests.tests.lang.AlterTableTest.testAddIdentityColumn failed with assert in nightlys

          Temporarily removing assert that is failing, looks like code should handle
          the condition anyway. Will renable after figuring out what is going on,
          hoping this will allow for clean runs from others while I work on this issue.
          My current theory is that there is a long time problem with alter table
          and the conglomerate cache that has been uncovered by this relatively new
          test and the new background work introduced by DERBY-4057.

          Show
          ASF subversion and git services added a comment - Commit 1641753 from mikem@apache.org in branch 'code/trunk' [ https://svn.apache.org/r1641753 ] DERBY-6774 org.apache.derbyTesting.functionTests.tests.lang.AlterTableTest.testAddIdentityColumn failed with assert in nightlys Temporarily removing assert that is failing, looks like code should handle the condition anyway. Will renable after figuring out what is going on, hoping this will allow for clean runs from others while I work on this issue. My current theory is that there is a long time problem with alter table and the conglomerate cache that has been uncovered by this relatively new test and the new background work introduced by DERBY-4057 .
          Hide
          ASF subversion and git services added a comment -

          Commit 1643463 from mikem@apache.org in branch 'code/trunk'
          [ https://svn.apache.org/r1643463 ]

          DERBY-4057

          Removing one check in test that is failing across platforms. Separate
          work under DERBY-6775 will improve the test in this area.

          Show
          ASF subversion and git services added a comment - Commit 1643463 from mikem@apache.org in branch 'code/trunk' [ https://svn.apache.org/r1643463 ] DERBY-4057 Removing one check in test that is failing across platforms. Separate work under DERBY-6775 will improve the test in this area.
          Hide
          ASF subversion and git services added a comment -

          Commit 1644145 from mikem@apache.org in branch 'code/trunk'
          [ https://svn.apache.org/r1644145 ]

          DERBY-4057 Space is not reclaimed if transaction is rolled back

          Another change to try and get test to behave well across all platforms.
          Still encountering some issues that I believe are because background work
          piles up on slow machines.

          Trying to address errors encountered in nightly tests:
          http://people.apache.org/~myrnavl/derby_test_results/main/windows/testlog/ibm16/1643981-suites.All_diff.txt
          http://download.java.net/javadesktop/derby/request_5600553/javadb-task-3984530.html

          Show
          ASF subversion and git services added a comment - Commit 1644145 from mikem@apache.org in branch 'code/trunk' [ https://svn.apache.org/r1644145 ] DERBY-4057 Space is not reclaimed if transaction is rolled back Another change to try and get test to behave well across all platforms. Still encountering some issues that I believe are because background work piles up on slow machines. Trying to address errors encountered in nightly tests: http://people.apache.org/~myrnavl/derby_test_results/main/windows/testlog/ibm16/1643981-suites.All_diff.txt http://download.java.net/javadesktop/derby/request_5600553/javadb-task-3984530.html

            People

            • Assignee:
              Mike Matrigali
              Reporter:
              Kathey Marsden
            • Votes:
              1 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development