DERBY-6096

OutOfMemoryError with Clob or Blob hash join: DataTypeDescriptor.estimatedMemoryUsage() has no case for BLOB or CLOB so would underestimate memory usage for those types at zero

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 10.1.3.1, 10.2.2.0, 10.3.3.0, 10.4.2.0, 10.5.3.0, 10.6.2.1, 10.7.1.1, 10.8.3.0, 10.9.1.0, 10.10.1.1
    • Fix Version/s: 10.11.1.1
    • Component/s: SQL

      Description

      In discussion on derby-dev regarding how much memory is used for hash joins, Knut noted:

      I haven't verified, but I think HashJoinStrategy uses
      DataTypeDescriptor.estimatedMemoryUsage() to estimate how much memory
      the hash table will consume. That method has no case for BLOB or CLOB,
      so it looks as if it will return zero for LOB columns. If that's so, it
      will definitely overestimate how many rows fit in maxMemoryPerTable
      kilobytes if the rows contain LOBs.

      DataTypeDescriptor.estimatedMemoryUsage() should be updated to include BLOB and CLOB, and we should try to verify this theory with a reproduction.
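
      To make the effect concrete, here is a minimal illustrative sketch (hypothetical names and numbers, not the actual HashJoinStrategy code) of how a zero per-row estimate removes any effective cap on the in-memory hash table:

          // Illustrative sketch only: with a zero per-row estimate, the
          // number of rows allowed in the in-memory hash table is unbounded.
          public class RowBudgetSketch {
              public static void main(String[] args) {
                  long maxMemoryPerTable = 1024 * 1024; // default ceiling of 1024 KB, in bytes
                  double perRowUsage = 0.0;  // what estimatedMemoryUsage() returns for a LOB column
                  long rowBudget = perRowUsage > 0
                          ? (long) (maxMemoryPerTable / perRowUsage)
                          : Long.MAX_VALUE;  // zero estimate: no effective cap
                  System.out.println("Rows allowed in hash table: " + rowBudget);
              }
          }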

      Attachments

      1. releaseNote.html
        2 kB
        Rick Hillegas
      2. less-gc.diff
        2 kB
        Knut Anders Hatlen
      3. derby-6096_diff2.txt
        6 kB
        Kathey Marsden
      4. releaseNote.html
        4 kB
        Kathey Marsden
      5. derby-6096_diff.txt
        6 kB
        Kathey Marsden
      6. derby-6096_code_diff.txt
        0.9 kB
        Kathey Marsden
      7. D6096.java
        0.8 kB
        Knut Anders Hatlen


          Activity

          Kathey Marsden added a comment -

          I am out today, but thought I would post where I am on trying to get a reproduction of the memory usage problem with Clob hash joins. I created this fixture in memory.ClobMemTest. At one point I was getting an OOM on the query if derby.language.maxMemoryPerTable wasn't set, running with -Xmx64M, but then I started cleaning up and it no longer occurs. I will look more closely tomorrow, but wanted to post where I am and get input on how to reproduce.

          public void testClobHashJoin() throws SQLException {
              Statement s = createStatement();
              try {
                  // Setting maxMemoryPerTable to 0 allows the query to complete
                  // until OOM is fixed.
                  //println("setSystemProperty(\"derby.language.maxMemoryPerTable\", \"0\")");
                  //setSystemProperty("derby.language.maxMemoryPerTable", "0");

                  s.executeUpdate("CREATE TABLE T1 (ID INT , NAME VARCHAR(30))");
                  s.executeUpdate("CREATE TABLE T2 (ID INT , CDATA CLOB(1G))");
                  PreparedStatement ps = prepareStatement("insert into t1 values(?,?)");
                  PreparedStatement ps2 = prepareStatement("insert into t2 values(?,?)");
                  // insert 8 long rows
                  for (int i = 1; i <= 8; i++) {
                      ps.setInt(1, i);
                      ps.setString(2, "name" + i);
                      ps.executeUpdate();
                      ps2.setInt(1, i);
                      ps2.setCharacterStream(2,
                              new LoopingAlphabetReader(LONG_CLOB_LENGTH),
                              LONG_CLOB_LENGTH);
                      ps2.executeUpdate();
                  }

                  s.execute("CALL SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(1)");
                  // Do a query. Force a hash join.
                  PreparedStatement ps3 = prepareStatement(
                          "SELECT * FROM t1, t2 --DERBY-PROPERTIES joinStrategy=hash\n"
                          + "where t1.id = t2.id AND t1.id < 8 ");
                  ResultSet rs = ps3.executeQuery();
                  int i = 0;
                  for (; rs.next(); i++) {
                      // Just fetch, don't materialize results.
                      // derby.tests.trace prints memory usage.
                      println("TotalMemory:" + Runtime.getRuntime().totalMemory()
                              + " " + "Free Memory:"
                              + Runtime.getRuntime().freeMemory());
                  }

                  assertEquals("Expected 7 rows, got " + i, 7, i);
                  rs.close();
                  RuntimeStatisticsParser p =
                          SQLUtilities.getRuntimeStatisticsParser(s);
                  println(p.toString());
                  assertTrue(p.usedHashJoin());
              } finally {
                  removeSystemProperty("derby.language.maxMemoryPerTable");
                  s.execute("CALL SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(0)");
              }
          }

          Knut Anders Hatlen added a comment -

          You may have better luck with smaller LOBs. LONG_CLOB_LENGTH is 18000000, which means the SQLClob objects inserted into BackingStoreHashtable aren't materialized and don't take up that much space. Using a larger number of smaller LOBs (so small that they don't overflow to another page) should increase the memory footprint, as those LOBs will come out of store fully materialized.

          Knut Anders Hatlen added a comment -

          The attached program, D6096.java, produces an OOME in BackingStoreHashtable during a join when I run it with -Xmx64M.

          The program inserts 1500 32KB BLOBs into a table and joins the table with itself.
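
          The attachment itself is not reproduced here, but a minimal sketch along the same lines (class name and table name are illustrative, not from D6096.java) might look like:

              import java.sql.*;

              // Sketch of the described repro: 1500 rows of 32KB BLOBs,
              // self-joined with a forced hash join. With the zero memory
              // estimate per LOB row, all rows land in the in-memory hash
              // table (~48 MB), which exhausts a 64 MB heap.
              public class BlobJoinRepro {
                  public static void main(String[] args) throws SQLException {
                      Connection c = DriverManager.getConnection(
                              "jdbc:derby:memory:d6096;create=true");
                      Statement s = c.createStatement();
                      s.executeUpdate("CREATE TABLE T (ID INT, B BLOB)");
                      PreparedStatement ins =
                              c.prepareStatement("INSERT INTO T VALUES (?, ?)");
                      byte[] data = new byte[32 * 1024]; // 32 KB per BLOB
                      for (int i = 0; i < 1500; i++) {
                          ins.setInt(1, i);
                          ins.setBytes(2, data);
                          ins.executeUpdate();
                      }
                      // Self-join, forcing a hash join on the inner table.
                      ResultSet rs = s.executeQuery(
                              "SELECT * FROM T A, T B --DERBY-PROPERTIES joinStrategy=hash\n"
                              + "WHERE A.ID = B.ID");
                      while (rs.next()) { /* just fetch */ }
                      rs.close();
                      c.close();
                  }
              }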

          Kathey Marsden added a comment -

          Thank you, Knut. I was able to reproduce the OOM with your program. Also, a quick hack of DataTypeDescriptor.estimatedMemoryUsage() rectifies the problem and verifies your original theory. Updating the summary accordingly.

          Mike Matrigali added a comment -

          What estimate is proposed for estimatedMemoryUsage() of BLOBs and CLOBs? Since they are variable-length objects, it is hard to know their actual size.
          Given your repro program, it might be reasonable to use 32k and assume store will stream the rest of each; definitely better than 0.

          Note that with the fix we may use far less memory for the query, and for some cases, like your repro, that will be good. But for others that did not see errors, their queries may run slower after the fix when we use less memory. This may especially be a concern if the fix is to be backported.

          Kathey Marsden added a comment -

          My thought was to just match the existing entries for LONGVARCHAR_TYPE_ID and LONGVARBIT_TYPE_ID, which are:

          /* Who knows? Let's just use some big number */
          return 10000.0;

          I see the concern with backporting. Maybe the fix should just go into 10.10, along with a release note describing the workaround for performance issues of setting derby.language.maxMemoryPerTable higher. Documenting derby.language.maxMemoryPerTable at the same time might also be good.
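
          For illustration, the kind of addition this implies in the switch in DataTypeDescriptor.estimatedMemoryUsage(), with case labels assumed to mirror the existing LONGVARCHAR/LONGVARBIT entries:

              case StoredFormatIds.BLOB_TYPE_ID:
              case StoredFormatIds.CLOB_TYPE_ID:
                  /* Who knows? Let's just use some big number */
                  return 10000.0;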

          Kathey Marsden added a comment -

          This is the proposed code change for this issue, which is to have BLOB and CLOB match the other long types at an estimated 10,000 bytes. I still need to add tests.

          This change will require a release note, as users may want to increase derby.language.maxMemoryPerTable to accommodate the fact that this value is now being used for BLOB and CLOB.
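
          For reference, the workaround could be applied programmatically before the engine boots (it can also go in derby.properties); the figure below is only an illustrative example, not a recommendation:

              // Raise the per-table memory ceiling used for hash joins.
              // The value is in kilobytes; 10240 (10 MB) is an arbitrary
              // illustrative figure. Must be set before Derby boots.
              System.setProperty("derby.language.maxMemoryPerTable", "10240");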

          Kathey Marsden added a comment -

          derby-6096_diff.txt is the full patch with tests based on Knut's repro. Tests are in progress; please review.

          Kathey Marsden added a comment -

          Suites.All, derbyall, and the memory suite with -Xmx64M passed with the derby-6096_diff.txt patch, except for DERBY-6138, which passed on rerun with a different classpath and is not likely related.

          Mike Matrigali added a comment -

          I reviewed the change, and it seems like a good change to me. Some day we probably should just raise the default internal maxMemoryPerTable on a major release boundary to reflect the increased default memory available to most users, or maybe come up with a better zero-admin auto-configuration for it. 1 MB seems pretty small.

          Kathey Marsden added a comment -

          Attached is a release note for this issue. I am not sure if it has too much information, as I don't think the current default for maxMemoryPerTable is published, and I know there has been talk of increasing it.

          Mamta A. Satoor added a comment -

          I reviewed the change too, and it looks good to me. Just one minor comment: in the tests, should we initialize the data to be inserted into the CLOB and BLOB columns?

          Kathey Marsden added a comment -

          Thanks Mamta. derby-6096_diff2.txt fills the arrays.

          Kathey Marsden added a comment -

          Sending java\engine\org\apache\derby\iapi\types\DataTypeDescriptor.java
          Sending java\testing\org\apache\derbyTesting\functionTests\tests\memory\BlobMemTest.java
          Sending java\testing\org\apache\derbyTesting\functionTests\tests\memory\ClobMemTest.java
          Transmitting file data ...
          Committed revision 1464247.

          URL: http://svn.apache.org/r1464247
          Log:
          DERBY-6096 OutOfMemoryError with Clob or Blob hash join: DataTypeDescriptor.estimatedMemoryUsage() has no case for BLOB or CLOB so would underestimate memory usage for those types at zero

          Estimate BLOB/CLOB size at 10000 like other long types.

          Knut Anders Hatlen added a comment -

          Attaching less-gc.diff, which makes a small change in the tests so that they only run System.gc() if the memory statistics are actually going to be printed. (The call to gc() doesn't seem to slow down the test when it's run separately, but I suppose it could take longer if it runs as part of a larger test suite and there's more data on the heap.)

          Committed revision 1464470.
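
          The shape of that change, as a sketch (TestConfiguration.getCurrent().isVerbose() is assumed here to be the harness's verbosity check backing println):

              // Only pay for a full GC when the memory statistics will
              // actually be printed; println is a no-op otherwise.
              if (TestConfiguration.getCurrent().isVerbose()) {
                  System.gc();
                  println("TotalMemory:" + Runtime.getRuntime().totalMemory()
                          + " Free Memory:" + Runtime.getRuntime().freeMemory());
              }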

          Dag H. Wanvik added a comment -

          Can this issue be resolved?

          Kathey Marsden added a comment -

          Resolving. This issue should not be backported, as it could cause a change in performance for some applications that expect the zero estimate.

          Rick Hillegas added a comment -

          Correct some typos in the detailed release note.


            People

            • Assignee: Kathey Marsden
            • Reporter: Kathey Marsden
            • Votes: 0
            • Watchers: 7
