Derby
  1. Derby
  2. DERBY-3009

Out of memory error when creating a very large table

    Details

    • Urgency:
      Normal
    • Issue & fix info:
      Repro attached
    • Bug behavior facts:
      Crash

      Description

      When creating an extremely large table (c.50 indexes, c.50 FK constraints), IJ crashes with an out of memory error. The table can be created successfully if it is done in stages, each one in a different IJ session.

      From Kristian Waagan:

      "With default settings on my machine, I also get the OOME.
      A brief investigation revealed a few things:

      1) The OOME occurs during constraint additions (with ALTER TABLE ...
      ADD CONSTRAINT). I could observe this by monitoring the heap usage.

      2) The complete script can be run by increasing the heap size. I tried with 256 MB, but the monitoring showed usage peaked at around 150 MB.

      3) The stack traces produced when the OOME occurs varies (as could be expected).

      4) It is the Derby engine that "produce" the OOME, not ij (i.e. when I ran with the network server, the server failed).

      I have not had time to examine the heap content, but I do believe there is a bug in Derby. It seems some resource is not freed after use."

      1. derby-3009_10_5_diff.txt
        2 kB
        Kathey Marsden
      2. DERBY-3009.zip
        64 kB
        Nick Williamson
      3. derby-3009-1a.diff
        2 kB
        Knut Anders Hatlen
      4. derby-3009-1b.diff
        6 kB
        Knut Anders Hatlen

        Issue Links

          Activity

          Hide
          Nick Williamson added a comment -

          If you create a new database and run derby-3009.sql in an IJ session running on default values, you will experience the out-of-memory error.

          Show
          Nick Williamson added a comment - If you create a new database and run derby-3009.sql in an IJ session running on default values, you will experience the out-of-memory error.
          Hide
          Daniel John Debrunner added a comment -

          May not be related but in addressing DERBY-3008 it seemed that even when a table is being created CreateIndexConstantAction scans the table and sets up a sorter to populate the index. Of course when creating a table there will be no rows so it's wasted work. In this case with up to 100 indexes maybe that's a factor.
          I didn't fully check that CreatetableConstantAction does scan the table & sort when creating the table, it was just a quick glance at the code.

          Show
          Daniel John Debrunner added a comment - May not be related but in addressing DERBY-3008 it seemed that even when a table is being created CreateIndexConstantAction scans the table and sets up a sorter to populate the index. Of course when creating a table there will be no rows so it's wasted work. In this case with up to 100 indexes maybe that's a factor. I didn't fully check that CreatetableConstantAction does scan the table & sort when creating the table, it was just a quick glance at the code.
          Hide
          Andrew Brown added a comment -

          I have run accross this issue also and I narrowed it down to index building. I have a few tables with 10-30 million records and when building indexes on them I can watch the memory used grow until it crashes. The only way around this for me has been to restart Derby after each index is built (not really a good thing is a production environment). This happens both in IJ and through a java application. We changed the java code to commit and close the connection after each index build and that seemed to help, but the problem would still manifest itself.

          I played around with some of the memory settings and by setting derby.storage.pageSize to a bigger size than the default size, this just caused the crash to happen faster. I am not a Java developer but it seems that once an index is built, the buffer still has a lock on the memory and it isn't being freed.

          Show
          Andrew Brown added a comment - I have run accross this issue also and I narrowed it down to index building. I have a few tables with 10-30 million records and when building indexes on them I can watch the memory used grow until it crashes. The only way around this for me has been to restart Derby after each index is built (not really a good thing is a production environment). This happens both in IJ and through a java application. We changed the java code to commit and close the connection after each index build and that seemed to help, but the problem would still manifest itself. I played around with some of the memory settings and by setting derby.storage.pageSize to a bigger size than the default size, this just caused the crash to happen faster. I am not a Java developer but it seems that once an index is built, the buffer still has a lock on the memory and it isn't being freed.
          Hide
          Nick Williamson added a comment -

          Thanks for that, Andrew. It must be a different thing in my case, as my
          tables are empty; it's a big (500+ tables) schema with one particularly
          large and complex table, and it seems to be too much for Derby to handle
          in one go. I guess there's some generic weakness in Derby that holds
          onto resources when processing DDL, and any number of things can trigger
          it...

          Regards,
          Nick

          Show
          Nick Williamson added a comment - Thanks for that, Andrew. It must be a different thing in my case, as my tables are empty; it's a big (500+ tables) schema with one particularly large and complex table, and it seems to be too much for Derby to handle in one go. I guess there's some generic weakness in Derby that holds onto resources when processing DDL, and any number of things can trigger it... Regards, Nick
          Hide
          Gerald Khin added a comment -

          I just came across the same effect as Andrew Brown mentioned in his comment: A couple of ALTER TABLE ADD CONSTRAINT FOREIGN KEY statements on a couple of non-empty tables (the biggest of about 150k rows) caused an OOME. And the OOME doesn't happen when restarting the database process before each ALTER TABLE statement.

          But it seems to me that this effect doesn't match the description of this JIRA entry. So my question is: Is this effect already known and honoured in a separate JIRA entry?

          Show
          Gerald Khin added a comment - I just came across the same effect as Andrew Brown mentioned in his comment: A couple of ALTER TABLE ADD CONSTRAINT FOREIGN KEY statements on a couple of non-empty tables (the biggest of about 150k rows) caused an OOME. And the OOME doesn't happen when restarting the database process before each ALTER TABLE statement. But it seems to me that this effect doesn't match the description of this JIRA entry. So my question is: Is this effect already known and honoured in a separate JIRA entry?
          Hide
          Nathan Boy added a comment - - edited

          I have this problem as well, using both Derby 10.5.1.1 and 10.4.2.0 in an embedded client. I have a schema of about 16 tables, a few of which generally have 200-300k rows. All of the data is loaded in, and then foreign key constraints are added one by one. I tried committing between each ADD CONSTRAINT statement, but this did not seem to have any effect. I still run out of memory even when heap size is set to 2-3 gb. I have not tried shutting down and starting up the database between each add constraint statement. I will try this next.

          Show
          Nathan Boy added a comment - - edited I have this problem as well, using both Derby 10.5.1.1 and 10.4.2.0 in an embedded client. I have a schema of about 16 tables, a few of which generally have 200-300k rows. All of the data is loaded in, and then foreign key constraints are added one by one. I tried committing between each ADD CONSTRAINT statement, but this did not seem to have any effect. I still run out of memory even when heap size is set to 2-3 gb. I have not tried shutting down and starting up the database between each add constraint statement. I will try this next.
          Hide
          Tim Halloran added a comment -

          Also seeing what appears to be this problem on Derby 10.5.1.1 (on Windows I can't get the heap up above 1.5 GB-ish)

          Show
          Tim Halloran added a comment - Also seeing what appears to be this problem on Derby 10.5.1.1 (on Windows I can't get the heap up above 1.5 GB-ish)
          Hide
          Rick Hillegas added a comment -

          Triaged for 10.5.2: noted that repro is available.

          Show
          Rick Hillegas added a comment - Triaged for 10.5.2: noted that repro is available.
          Hide
          Kathey Marsden added a comment -

          Marking urgency normal. This is not a regression won't make it for 10.5.2.

          Show
          Kathey Marsden added a comment - Marking urgency normal. This is not a regression won't make it for 10.5.2.
          Hide
          Christian Stolz added a comment -

          Also seeing what appears to be this problem on Derby 10.7.1.1
          Any progress on this topic?

          Show
          Christian Stolz added a comment - Also seeing what appears to be this problem on Derby 10.7.1.1 Any progress on this topic?
          Hide
          Brett Wooldridge added a comment -

          I am seeing the same issue as Nathan Boy commented on in May 2009. Performing an ALTER TABLE with ADD CONSTRAINT on a large table causes an OOM error. Unless this issue is fixed, it will be impossible for us to upgrade customers' databases in the field.

          Show
          Brett Wooldridge added a comment - I am seeing the same issue as Nathan Boy commented on in May 2009. Performing an ALTER TABLE with ADD CONSTRAINT on a large table causes an OOM error. Unless this issue is fixed, it will be impossible for us to upgrade customers' databases in the field.
          Hide
          Knut Anders Hatlen added a comment -

          Marking the issue as a crash since the memory leak may take down the database.

          Show
          Knut Anders Hatlen added a comment - Marking the issue as a crash since the memory leak may take down the database.
          Hide
          Knut Anders Hatlen added a comment -

          The problem appears to be that references to table descriptors are leaked by AlterTableConstantAction (via the fields td and activation). The AlterTableConstantAction object isn't eligible for garbage collection until the corresponding statement has been evicted from the statement cache. The table descriptors may be of considerable because of all the constraints.

          The attached patch (derby-3009-1a.diff) clears these fields once the constant action has been executed. With the patch, I could run the repro with 16 MB heap space without seeing out of memory errors.

          Show
          Knut Anders Hatlen added a comment - The problem appears to be that references to table descriptors are leaked by AlterTableConstantAction (via the fields td and activation). The AlterTableConstantAction object isn't eligible for garbage collection until the corresponding statement has been evicted from the statement cache. The table descriptors may be of considerable because of all the constraints. The attached patch (derby-3009-1a.diff) clears these fields once the constant action has been executed. With the patch, I could run the repro with 16 MB heap space without seeing out of memory errors.
          Hide
          Knut Anders Hatlen added a comment -

          Uploading an updated patch (1b) which adds a test case for this bug. The test case was added to the lowmem test suite, which limits the heap size to 16 MB. To test it, run "ant junit-lowmem". It fails without the fix and passes when the fix is applied.

          All the regression tests ran cleanly with the fix.

          Show
          Knut Anders Hatlen added a comment - Uploading an updated patch (1b) which adds a test case for this bug. The test case was added to the lowmem test suite, which limits the heap size to 16 MB. To test it, run "ant junit-lowmem". It fails without the fix and passes when the fix is applied. All the regression tests ran cleanly with the fix.
          Hide
          Knut Anders Hatlen added a comment -

          Committed revision 1086527.

          Show
          Knut Anders Hatlen added a comment - Committed revision 1086527.
          Hide
          Dag H. Wanvik added a comment -

          Fix looks safe to me. I verified that without the code changes, the new lowmem test case fails.
          +1

          Show
          Dag H. Wanvik added a comment - Fix looks safe to me. I verified that without the code changes, the new lowmem test case fails. +1
          Hide
          Lily Wei added a comment -

          +1 for the fix. When I verified without code changes on windows 7, the new lowmem test case did not fail for me. I have to add to tables=200 and column=200 and run it with -Dderby.storage.pageCacheSize=4M. The test failed in the case above for me. However, I couldn't run ant lowmem. Either case, the test verified the fix is fixing memory leak issue. Thanks Knut.

          Show
          Lily Wei added a comment - +1 for the fix. When I verified without code changes on windows 7, the new lowmem test case did not fail for me. I have to add to tables=200 and column=200 and run it with -Dderby.storage.pageCacheSize=4M. The test failed in the case above for me. However, I couldn't run ant lowmem. Either case, the test verified the fix is fixing memory leak issue. Thanks Knut.
          Hide
          Knut Anders Hatlen added a comment -

          Thanks for testing the patch, Dag and Lily.

          Lily, if you run the new test case outside of the lowmem suite, you need to invoke JUnit with -Xmx16M in order to see the OOME. The ant target junit-lowmem adds that JVM argument automatically.

          Show
          Knut Anders Hatlen added a comment - Thanks for testing the patch, Dag and Lily. Lily, if you run the new test case outside of the lowmem suite, you need to invoke JUnit with -Xmx16M in order to see the OOME. The ant target junit-lowmem adds that JVM argument automatically.
          Hide
          Kathey Marsden added a comment -

          Reopen for 10.5 backport consideration. If working on the backport for this issue. Temporarily assign yourself and add a comment that you are working on it. When finished, reresolve with the new fix versions or label backport_reject_10_x as appropriate.

          Show
          Kathey Marsden added a comment - Reopen for 10.5 backport consideration. If working on the backport for this issue. Temporarily assign yourself and add a comment that you are working on it. When finished, reresolve with the new fix versions or label backport_reject_10_x as appropriate.
          Hide
          Kathey Marsden added a comment -

          Assigning to myself for backport to 10.5

          Show
          Kathey Marsden added a comment - Assigning to myself for backport to 10.5
          Hide
          Kathey Marsden added a comment -

          10.5 had to be merged manually. Attaching patch derby-3009_10_5_diff.txt

          Show
          Kathey Marsden added a comment - 10.5 had to be merged manually. Attaching patch derby-3009_10_5_diff.txt
          Hide
          Kathey Marsden added a comment -

          Completed merge back to 10.5. Assigning back to Knut and resolving.

          Show
          Kathey Marsden added a comment - Completed merge back to 10.5. Assigning back to Knut and resolving.
          Hide
          Knut Anders Hatlen added a comment -

          [bulk update] Close all resolved issues that haven't been updated for more than one year.

          Show
          Knut Anders Hatlen added a comment - [bulk update] Close all resolved issues that haven't been updated for more than one year.

            People

            • Assignee:
              Knut Anders Hatlen
              Reporter:
              Nick Williamson
            • Votes:
              6 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development