Derby
DERBY-1511

SELECT clause without a WHERE causes an Exception when extracting a Blob from a database

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 10.1.2.1
    • Fix Version/s: 10.7.1.1
    • Component/s: Store
    • Labels:
    • Environment:
      Windows XP
    • Urgency:
      Normal
    • Issue & fix info:
      High Value Fix, Repro attached, Workaround attached
    • Bug behavior facts:
      Crash

      Description

      An exception occurs when extracting a Blob from a database.

      The following code will ALWAYS fail with this exception:

      java.io.IOException: ERROR 40XD0: Container has been closed
      at org.apache.derby.impl.store.raw.data.OverflowInputStream.fillByteHolder(Unknown Source)
      at org.apache.derby.impl.store.raw.data.BufferedByteHolderInputStream.read(Unknown Source)
      at java.io.DataInputStream.read(Unknown Source)
      at java.io.FilterInputStream.read(Unknown Source)
      at java.io.ObjectInputStream$PeekInputStream.read(Unknown Source)
      at java.io.ObjectInputStream$PeekInputStream.readFully(Unknown Source)
      at java.io.ObjectInputStream$BlockDataInputStream.readDoubles(Unknown Source)
      at java.io.ObjectInputStream.readArray(Unknown Source)
      at java.io.ObjectInputStream.readObject0(Unknown Source)
      at java.io.ObjectInputStream.readObject(Unknown Source)
      at BlobTest.readRows(BlobTest.java:82)
      at BlobTest.main(BlobTest.java:24)

      CODE:

      import java.io.*;
      import java.sql.*;
      import java.util.*;

      public class BlobTest
      {
          private static final String TABLE1 = "CREATE TABLE TABLE_1 ( "
              + "ID INTEGER NOT NULL, "
              + "COL_2 INTEGER NOT NULL, "
              + "PRIMARY KEY (ID) )";

          private static final String TABLE2 = "CREATE TABLE TABLE_2 ( "
              + "ID INTEGER NOT NULL, "
              + "COL_BLOB BLOB, "
              + "PRIMARY KEY (ID) )";

          public static void main(String... args) {
              try {
                  createDBandTables();
                  Connection con = getConnection();
                  addRows(con, 10000, 1);
                  addRows(con, 10000, 2);
                  readRows(con, 1);
                  con.close();
              } catch (Exception exp) {
                  exp.printStackTrace();
              }
          }

          private static void addRows(Connection con, int size, int id)
                  throws Exception {
              String sql = "INSERT INTO TABLE_1 VALUES(?, ?)";
              PreparedStatement pstmt = con.prepareStatement(sql);
              pstmt.setInt(1, id);
              pstmt.setInt(2, 2);
              pstmt.executeUpdate();
              pstmt.close();

              double[] array = new double[size];
              array[size - 1] = 1.23;

              sql = "INSERT INTO TABLE_2 VALUES(?, ?)";
              pstmt = con.prepareStatement(sql);
              pstmt.setInt(1, id);
              ByteArrayOutputStream byteStream = new ByteArrayOutputStream();
              ObjectOutputStream objStream = new ObjectOutputStream(byteStream);
              objStream.writeObject(array); // Convert object to byte stream
              objStream.flush();
              objStream.close();
              byte[] bytes = byteStream.toByteArray();
              ByteArrayInputStream inStream = new ByteArrayInputStream(bytes);
              pstmt.setBinaryStream(2, inStream, bytes.length);
              pstmt.executeUpdate();
              pstmt.close();
          }

          private static void readRows(Connection con, int id) throws Exception {
              String sql = "SELECT * FROM TABLE_2";
              // String sql = "SELECT * FROM TABLE_2 WHERE ID > 0";

              Statement stmt = con.createStatement();
              ResultSet rs = stmt.executeQuery(sql);

              while (rs.next()) {
                  rs.getInt(1);
                  InputStream stream = rs.getBinaryStream(2);
                  ObjectInputStream objStream = new ObjectInputStream(stream);
                  Object obj = objStream.readObject();
                  double[] array = (double[]) obj;
                  System.out.println(array.length);
                  readTable1(con, id);
              }

              rs.close();
              stmt.close();
          }

          private static void readTable1(Connection con, int id) throws Exception {
              String sql = "SELECT ID FROM TABLE_1 WHERE ID=" + id;

              Statement stmt = con.createStatement();
              ResultSet rs = stmt.executeQuery(sql);

              if (rs.next()) {
              }
              rs.close();
              stmt.close();
          }

          private static Connection getConnection() throws Exception {
              String driver = "org.apache.derby.jdbc.EmbeddedDriver";
              Properties p = System.getProperties();
              p.put("derby.system.home", "C:\\databases\\sample");
              Class.forName(driver);
              String url = "jdbc:derby:derbyBlob";
              Connection con = DriverManager.getConnection(url);
              return con;
          }

          private static void createDBandTables() throws Exception {
              String driver = "org.apache.derby.jdbc.EmbeddedDriver";
              Properties p = System.getProperties();
              p.put("derby.system.home", "C:\\databases\\sample");
              Class.forName(driver);
              String url = "jdbc:derby:derbyBlob;create=true";
              Connection con = DriverManager.getConnection(url);
              Statement stmt = con.createStatement();
              stmt.execute(TABLE1);
              stmt.execute(TABLE2);
              stmt.close();
              con.close();
          }
      }

      However, if the SELECT statement is changed from:

      String sql = "SELECT * FROM TABLE_2";

      to:

      String sql = "SELECT * FROM TABLE_2 WHERE ID > 0";

      the code runs without an exception.
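
      For completeness, a hedged sketch of the workarounds that come out of the discussion below; the con connection and the TABLE_2 schema are the ones from the repro above, and neither variant is an official recommendation:

      // Variant 1: add a predicate (the reporter's own observation); with a
      // WHERE clause the engine does not use the bulk table scan.
      Statement stmt1 = con.createStatement();
      ResultSet rs1 = stmt1.executeQuery("SELECT * FROM TABLE_2 WHERE ID > 0");

      // Variant 2: request CONCUR_UPDATABLE concurrency, which (per the comment
      // by Andreas Korneliussen below) also avoids the bulk table scan.
      Statement stmt2 = con.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                                            ResultSet.CONCUR_UPDATABLE);
      ResultSet rs2 = stmt2.executeQuery("SELECT * FROM TABLE_2");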

      Output from: java org.apache.derby.tools.sysinfo
      ------------------ Java Information ------------------
      Java Version: 1.5.0_05
      Java Vendor: Sun Microsystems Inc.
      Java home: C:\Program Files\Java\jre1.5.0_05
      Java classpath: C:\tools\derby\db-derby-10.1.2.1-bin\lib\derby.jar;C:\tools\derby\db-derby-10.1.2.1-bin\lib\derbytools.jar;
      OS name: Windows XP
      OS architecture: x86
      OS version: 5.1
      Java user name: David
      Java user home: C:\Documents and Settings\David
      Java user dir: C:\david\novice\derby
      java.specification.name: Java Platform API Specification
      java.specification.version: 1.5
      --------- Derby Information --------
      JRE - JDBC: J2SE 5.0 - JDBC 3.0
      [C:\tools\derby\db-derby-10.1.2.1-bin\lib\derby.jar] 10.1.2.1 - (330608)
      [C:\tools\derby\db-derby-10.1.2.1-bin\lib\derbytools.jar] 10.1.2.1 - (330608)
      ------------------------------------------------------
      ----------------- Locale Information -----------------
      ------------------------------------------------------

      1. releaseNote.html
        5 kB
        Knut Anders Hatlen
      2. derby-1511-1d.diff
        13 kB
        Knut Anders Hatlen
      3. derby-1511-1c.diff
        13 kB
        Knut Anders Hatlen
      4. derby-1511-1c.stat
        0.8 kB
        Knut Anders Hatlen
      5. perf.png
        6 kB
        Knut Anders Hatlen
      6. derby-1511-1b.diff
        7 kB
        Knut Anders Hatlen
      7. derby-1511-1a.diff
        3 kB
        Knut Anders Hatlen
      8. test.diff
        2 kB
        Knut Anders Hatlen


          Activity

          Knut Anders Hatlen added a comment -

          [bulk update] Close all resolved issues that haven't been updated for more than one year.

          Knut Anders Hatlen added a comment -

          Committed revision 997722.

          I've filed DERBY-4800 for the improvements needed in order to re-enable bulk scans with holdable cursors.

          Marking this issue as resolved.

          Knut Anders Hatlen added a comment -

          Attaching a release note.

          Knut Anders Hatlen added a comment -

          Uploading a new patch 1d which fixes the NPE in the previous patch.

          The patch adds a boolean parameter to BulkTableScanResultSet which tells whether bulk fetch must be disabled for holdable cursors. At runtime, BulkTableScanResultSet checks that flag and, if requested, disables bulk fetching (by setting the bulk fetch size to 1) when the cursor is holdable.

          All the regression tests ran cleanly with this patch. I've also tested some table scans with small (10 bytes) BLOBs and verified that there's no performance hit with non-holdable cursors. Holdable cursors still see a performance degradation with small LOBs in the cases where bulk scans were used before.
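
          As an illustration only, here is a simplified, hypothetical sketch of the runtime decision described above (the class and field names are invented; this is not the code from the 1d patch):

          // Hypothetical sketch; names are invented and this is not the actual
          // Derby implementation of BulkTableScanResultSet.
          class BulkScanSketch {
              private final boolean disableBulkFetchIfHoldable; // decided at compile time
              private int fetchSize;                            // rows fetched per read

              BulkScanSketch(boolean disableBulkFetchIfHoldable, int fetchSize) {
                  this.disableBulkFetchIfHoldable = disableBulkFetchIfHoldable;
                  this.fetchSize = fetchSize;
              }

              void openScan(boolean cursorIsHoldable) {
                  if (disableBulkFetchIfHoldable && cursorIsHoldable) {
                      // Fetching one row at a time makes the scan behave like a
                      // plain TableScanResultSet, so no pre-fetched LOB outlives
                      // the commit that closes its container.
                      fetchSize = 1;
                  }
                  // ... open the underlying scan, fetching fetchSize rows per read ...
              }
          }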

          Knut Anders Hatlen added a comment -

          It looks like all the failures with the 1c patch were caused by NullPointerExceptions in FromTable.hasLargeObjectColumns(). The tests failed when the FromBaseTable was located below an IndexRowToBaseRowResultSet. The problem was that the last column of the FBT will have a CurrentRowLocationNode, which is not an SQL type and hence getType() returns null. A value that doesn't have a type cannot be a large object column, so I think this can be fixed simply by skipping the columns that don't have a type.
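
          A hedged sketch of that fix (the Column and TypeInfo types below are invented stand-ins for Derby's internal result-column classes; this is not the patch itself):

          // Hypothetical illustration of "skip the columns that don't have a type".
          final class LobColumnCheckSketch {
              interface TypeInfo { boolean isBlob(); boolean isClob(); }
              interface Column { TypeInfo getType(); } // may return null, e.g. for a row location

              boolean hasLargeObjectColumns(java.util.List<Column> columns) {
                  for (Column col : columns) {
                      TypeInfo type = col.getType();
                      if (type == null) {
                          // Not an SQL value (e.g. a CurrentRowLocationNode), so it
                          // cannot be a large object column.
                          continue;
                      }
                      if (type.isBlob() || type.isClob()) {
                          return true;
                      }
                  }
                  return false;
              }
          }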

          Knut Anders Hatlen added a comment -

          Many of the tests failed with the 1c patch. I'll investigate and come up with a new patch.

          Knut Anders Hatlen added a comment -

          Here's an updated patch that only disables bulk fetch if the table contains LOB columns and the cursor is holdable. I've only run BLOBTest on it so far. Will start the full regression test suite.

          Mike Matrigali added a comment -

          +1 to fixing the current bug as described (if you can do it only for the holdable case, that would be best),
          as it is best to get a fix in that makes queries run, even at the cost of some performance loss.
          But do log a linked improvement JIRA to fix this issue in some way that does not require disabling
          bulk fetch. The blob holders should be improved to withstand the closing of the underlying containers,
          but this is best approached as a feature-level task in a new release.

          Knut Anders Hatlen added a comment -

          There is of course also the option of trying to make the pre-fetched LOBs survive the commit, which wouldn't affect anyone at all, but I suspect that's somewhat more involved, so I haven't looked into that option. But if we just get a fix in for the bug first, with a documented workaround for those who see an unacceptable performance hit, we could always revisit the issue to improve the performance for the holdable case later.

          Kristian Waagan added a comment -

          Thanks for obtaining hard facts, Knut.

          I'm +0.9 for the current approach, since I agree most users shouldn't be affected, and +1 for the even better approach you suggested in your last comment (assuming the decision can be taken easily at runtime).

          Knut Anders Hatlen added a comment -

          I ran some tests which performed table scans on a table with 100MB of BLOB data. The attached graph shows the results for various record sizes. In short, disabling bulk fetch seems to have a large impact on small records (~75% reduction with record size 100 bytes), whereas there's little difference for records > ~30K. In fact, for large records, disabling bulk fetch seems to improve the performance marginally.

          Reduced table scan performance for small LOBs would probably be acceptable because (a) a full table scan should be avoided in the first place if performance is important, and (b) for records that small, VARCHAR and VARCHAR FOR BIT DATA should be preferred to CLOB and BLOB. But since this fix is only needed for holdable cursors, we could delay the disabling of bulk fetching to runtime so that we can check the holdability first, and only disable it if the cursor is holdable. Then, if some of their queries slow down, users would have an option (use a non-holdable cursor) that doesn't require changes to their queries or database schema. We could document this in a release note. How does that sound?
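
          As a concrete illustration of the schema advice in (b) (the table and column names are invented for this example, and stmt is any open java.sql.Statement):

          // Hypothetical example only: for values this small, a VARCHAR ... FOR BIT
          // DATA column avoids the LOB machinery that BLOB would bring in.
          stmt.execute("CREATE TABLE SMALL_RECORDS ("
                  + "ID INTEGER NOT NULL PRIMARY KEY, "
                  + "PAYLOAD VARCHAR (8000) FOR BIT DATA)");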

          Knut Anders Hatlen added a comment -

          Thanks for looking at the patch, Kristian. I'll see if I can come up with some numbers that describe the performance impact. Speaking of performance, disabling bulk-fetch is probably only necessary when using holdable cursors (the default), since non-holdable cursors won't even allow you to execute next() after a commit. But, as far as I can tell, the compiled plans can be shared between statements that produce holdable cursors and statements that produce non-holdable cursors, so I don't think it's possible to predict the holdability of the cursor at compile time.

          The previous patch made two test cases in BlobClob4BlobTest fail: testNegativeTestDerby265Blob and testNegativeTestDerby265Clob. In both cases the tests failed because they expected a "container closed" exception in queries similar to the one reported here. I'm uploading a new patch (1b) that makes BlobClob4BlobTest expect these test cases to execute cleanly.

          Kristian Waagan added a comment -

          The attached patch looks good to me. It will be interesting to see how/if performance is affected by this (I think I'm aware of an internal test that would reveal this).

          Just as a side note, I guess another option would be to hide locators that aren't ready for deallocation. This does however sound like an error-prone approach, adding further complexity in this area.

          Given that the tests run without failure, and that the performance for selecting LOBs doesn't degrade too much, +1 to commit.
          I think a small performance degradation is preferable to a non-working query, especially since bulk fetch is disabled for other types of queries anyway.

          Knut Anders Hatlen added a comment -

          I'm attaching a patch that disables bulk fetch if the table scan returns BLOBs or CLOBs. This makes the repro run cleanly. I haven't run any other tests yet.

          Knut Anders Hatlen added a comment -

          I'm attaching a JUnit regression test case that exposes this bug.

          Dag H. Wanvik added a comment -

          Thanks for documenting this, Tomislav. Hopefully, your information can help us make a repro of this issue.

          Tomislav Nakic-Alfirevic added a comment -

          I'm not sure if it's the same issue, but I've been consistently running into "ERROR 40XD0: Container has been closed".

          I can't provide SQL code to reproduce the problem because I wrote the code in Groovy and it generates the SQL, but I'm basically running something along these lines:

          for each row in (select * from tableA)
          ...
          select count(id) > 0 from tableB where name = 'somename'
          insert into tableB (...) values...
          insert into tableC (...) values...
          end

          tableB has ~140k rows with a varchar(32000) field which is sometimes completely full.
          The loop would always break on the row with id X, but it's not the row that's the problem. If I replace (select * from tableA) with (select * from tableA where id = X), it works just fine. It seems that Derby only breaks after a certain number of iterations. I wanted to get a better feeling of what the problem was, and I believe there were no errors when I commented out the insert statements.

          Now, I have tried embedded mode and network client mode as well as different Derby versions (10.3 and 10.5.3), but I have seen no difference in behaviour. I've also attempted to set autocommit to false and explicitly commit after inserts (someone suggested it as a workaround somewhere) without result.
          I've spent a day and a half debugging this and finally migrated the data to PostgreSQL where it now works flawlessly.

          This is the definition of tableA:

          create table article (
              id int,
              id_article_type int,
              summary varchar(1024),
              content varchar(32672) -- need a lot of space
          )

          That's about it... I can't re-run the application any more without significant effort, so I can't provide the exact stack trace, but I hope this is detailed enough.

          Knut Anders Hatlen added a comment -

          Would it make sense to force the use of TableScanResultSet instead of BulkTableScanResultSet when the result contains long columns? I ran the test case successfully with this diff applied (BulkTableScanResultSet with rowsPerRead==1 means that it behaves exactly like TableScanResultSet):

          --- BulkTableScanResultSet.java (revision 593240)
          +++ BulkTableScanResultSet.java (working copy)
          @@ -128,7 +128,7 @@
                       lockMode,
                       tableLocked,
                       isolationLevel,
          -            rowsPerRead,
          +            /* rowsPerRead */ 1,
                       oneRowScan,
                       optimizerEstimatedRowCount,
                       optimizerEstimatedCost);
          Andreas Korneliussen added a comment -

          Yes.

          If you modify the code and replace readTable1(..) with con.commit(), you will see this issue with autocommit off:

          The problem seems to be related to BulkTableScanResultSet. On the first call to next() it builds an array of rows (rowArray), and feeds the result set with rows from that array. However, after a commit, the Conglomerate is closed. The BulkTableScanResultSet will continue to feed rows from the rowArray, and the BLOB columns in these rows depend on reading data from the Conglomerate (which is closed).

          When adding a WHERE clause, the engine will not use BulkTableScanResultSet. Also if the statement has concurrency CONCUR_UPDATABLE, the engine will not use BulkTableScanResultSet.
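
          To make that concrete, a hedged sketch of the modified readRows loop (assuming the con connection from the repro); the explicit commit plays the role of the readTable1(con, id) call:

          // Autocommit-off variant of BlobTest.readRows(): the commit inside the
          // loop closes the container that the buffered BLOB columns still point to.
          con.setAutoCommit(false);
          Statement stmt = con.createStatement();
          ResultSet rs = stmt.executeQuery("SELECT * FROM TABLE_2");
          while (rs.next()) {
              rs.getInt(1);
              InputStream stream = rs.getBinaryStream(2);
              ObjectInputStream objStream = new ObjectInputStream(stream);
              double[] array = (double[]) objStream.readObject();
              System.out.println(array.length);
              con.commit(); // replaces readTable1(con, id)
          }
          rs.close();
          stmt.close();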

          Kathey Marsden added a comment -

          Do you see this issue with autocommit off?


            People

            • Assignee:
              Knut Anders Hatlen
              Reporter:
              David Heath
            • Votes:
              0
              Watchers:
              1
