OpenJPA
  1. OpenJPA
  2. OPENJPA-1248

LOB streaming does not work as expected

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 1.1.0, 1.2.0, 1.2.1, 1.2.2, 1.3.0
    • Fix Version/s: 1.3.0, 2.0.0-beta3
    • Component/s: jdbc
    • Labels:
      None
    • Environment:
      Oracle, MySQL, PostgreSQL, SQL Server

      Description

      TestReaderLob does not pass with any of the databases indicated in the test case: Oracle, MySQL, PostgreSQL, SQL Server. TestInputStreamLob passes only with MySQL and PostgreSQL.

      1. TestReaderLob_oracle.log
        22 kB
        Amy Yang
      2. failures.txt
        23 kB
        Milosz Tylenda

        Issue Links

          Activity

          Hide
          Milosz Tylenda added a comment -

          Sample stack traces attached.

          Show
          Milosz Tylenda added a comment - Sample stack traces attached.
          Hide
          Amy Yang added a comment -

          similar error on oracle

          Show
          Amy Yang added a comment - similar error on oracle
          Hide
          Milosz Tylenda added a comment -

          Some Oracle-specific antidote for this problem has been committed in OPENJPA-1249.

          Show
          Milosz Tylenda added a comment - Some Oracle-specific antidote for this problem has been committed in OPENJPA-1249 .
          Hide
          Milosz Tylenda added a comment -

          Removed dead OracleDictionary.insertBlobForStreamingLoad(Row row, Column col, Object ob).

          Show
          Milosz Tylenda added a comment - Removed dead OracleDictionary.insertBlobForStreamingLoad(Row row, Column col, Object ob).
          Hide
          Jeremy Bauer added a comment -

          These tests are two of the few remaining failures in the o-j-p project when running on Oracle. (Version XE/10 using the 11.2.0.1 driver) I found a few issues and came up with a couple of options for fixing them.

          First, when inserting a LOB OpenJPA currently inserts a null value into the LOB column. When this value is retrieved from the result set (res.getBlob) it is null - and not an updatable blob. This is currently causing an NPE. If an Oracle empty_[b,c]lob() is inserted, the rs will return an updatable lob. However, this changes behavior a bit since null != empty and some of the test assertions that assert a persistent stream attribute is NULL will fail. Another option is to continue to insert a NULL value then when doing the update, create a temporary lob (using Oracle APIs) and do the update. I've found this to be troublesome - especially when connection pooling is used - since the Oracle APIs assert that the connection they are provided is an Oracle connection. I put together a DBCP/JDBC 4 reflection-based solution but it is cumbersome. Worse yet, DBCP requires a special property be set to get access to the underlying connection.

          I think the cleanest solution would be to use empty_lob() and change the behavior of the tests to expect an empty stream when using Oracle. A hybrid solution would handle cases where an existing value is null... but this might be overkill.

          The second issue (after correcting the first) was that the test was failing with an exception indicating that the connection is closed when reading from the updated BLOB. This issue is related to OpenJPA connection management and the need for Oracle streaming LOBs to maintain an open connection. The issue was corrected on Oracle by setting the property openjpa.ConnectionRetainMode=transaction, indicating that em should use the same connection for the duration of the transaction. Some applications may need to use mode "always", depending on transactional patterns and when the LOB is accessed - but that can result in an inefficient use of connections in a pooled environment. This needs to be documented in the OpenJPA manual as part of this JIRA. Unfortunately, this did not correct similar test failures on SQL Server. SQL Server will require additional investigation.

          Unless someone(s) is/are in strong disagreement, I'll be committing code to trunk which fixes the Oracle failures by using empty_[c,b]lob() instead of inserting null. The tests will be modified to assert that the stream is empty (instead of null) when running with Oracle. I'll also attach a patch to OPENJPA-1249 (since the patch is Oracle specific) that uses native Oracle methods to handle null LOBs as an alternative solution. Finally, the Oracle section of the manual will be updated to include the ConnectionRetainMode requirement for streaming LOBs.

          Show
          Jeremy Bauer added a comment - These tests are two of the few remaining failures in the o-j-p project when running on Oracle. (Version XE/10 using the 11.2.0.1 driver) I found a few issues and came up with a couple of options for fixing them. First, when inserting a LOB OpenJPA currently inserts a null value into the LOB column. When this value is retrieved from the result set (res.getBlob) it is null - and not an updatable blob. This is currently causing an NPE. If an Oracle empty_ [b,c] lob() is inserted, the rs will return an updatable lob. However, this changes behavior a bit since null != empty and some of the test assertions that assert a persistent stream attribute is NULL will fail. Another option is to continue to insert a NULL value then when doing the update, create a temporary lob (using Oracle APIs) and do the update. I've found this to be troublesome - especially when connection pooling is used - since the Oracle APIs assert that the connection they are provided is an Oracle connection. I put together a DBCP/JDBC 4 reflection-based solution but it is cumbersome. Worse yet, DBCP requires a special property be set to get access to the underlying connection. I think the cleanest solution would be to use empty_lob() and change the behavior of the tests to expect an empty stream when using Oracle. A hybrid solution would handle cases where an existing value is null... but this might be overkill. The second issue (after correcting the first) was that the test was failing with an exception indicating that the connection is closed when reading from the updated BLOB. This issue is related to OpenJPA connection management and the need for Oracle streaming LOBs to maintain an open connection. The issue was corrected on Oracle by setting the property openjpa.ConnectionRetainMode=transaction, indicating that em should use the same connection for the duration of the transaction. Some applications may need to use mode "always", depending on transactional patterns and when the LOB is accessed - but that can result in an inefficient use of connections in a pooled environment. This needs to be documented in the OpenJPA manual as part of this JIRA. Unfortunately, this did not correct similar test failures on SQL Server. SQL Server will require additional investigation. Unless someone(s) is/are in strong disagreement, I'll be committing code to trunk which fixes the Oracle failures by using empty_ [c,b] lob() instead of inserting null. The tests will be modified to assert that the stream is empty (instead of null) when running with Oracle. I'll also attach a patch to OPENJPA-1249 (since the patch is Oracle specific) that uses native Oracle methods to handle null LOBs as an alternative solution. Finally, the Oracle section of the manual will be updated to include the ConnectionRetainMode requirement for streaming LOBs.
          Hide
          Milosz Tylenda added a comment -

          Hi Jeremy! I have a thought for the first Oracle issue. Have you considered a solution similar to the one we use when handling non-streaming CLOBs with Oracle? I mean OracleDictionary.getClobString method: we use isEmptyLob() from Oracle API and return null if the CLOB is actually empty. Is this what you are referring to as a potential overkill?

          Show
          Milosz Tylenda added a comment - Hi Jeremy! I have a thought for the first Oracle issue. Have you considered a solution similar to the one we use when handling non-streaming CLOBs with Oracle? I mean OracleDictionary.getClobString method: we use isEmptyLob() from Oracle API and return null if the CLOB is actually empty. Is this what you are referring to as a potential overkill?
          Hide
          Jeremy Bauer added a comment -

          Hi Milosz. Thanks for your comments. Excellent suggestion. It sparked a new idea that does not require messy reflection code or changing external behavior. I'm going to give it a try and will post a patch if it works out. Stay tuned...

          Show
          Jeremy Bauer added a comment - Hi Milosz. Thanks for your comments. Excellent suggestion. It sparked a new idea that does not require messy reflection code or changing external behavior. I'm going to give it a try and will post a patch if it works out. Stay tuned...
          Hide
          Jeremy Bauer added a comment -

          I committed a partial fix for Oracle under revision 834564. With the exception of adding the ConnectionRetainMode setting to the test config, the tests pass as-is (ie. expected behavior is unchanged). I also doc'd the ConnectionRetainMode requirement in the Oracle section of the manual.

          One potential issue may still remain. If a LOB column contains a null (ex. inserted by an external application) and OpenJPA tries to update that value a failure may still result. The cumbersome patch I submitted for OpenJPA-1249 will handle that case, but I have a more direct approach in mind that uses SQL instead of native APIs.

          Show
          Jeremy Bauer added a comment - I committed a partial fix for Oracle under revision 834564. With the exception of adding the ConnectionRetainMode setting to the test config, the tests pass as-is (ie. expected behavior is unchanged). I also doc'd the ConnectionRetainMode requirement in the Oracle section of the manual. One potential issue may still remain. If a LOB column contains a null (ex. inserted by an external application) and OpenJPA tries to update that value a failure may still result. The cumbersome patch I submitted for OpenJPA-1249 will handle that case, but I have a more direct approach in mind that uses SQL instead of native APIs.
          Hide
          Milosz Tylenda added a comment -

          Well done, a small aikido-like fix in OracleDictionary

          I am currently trying to fix the PostgreSQL issues.

          Show
          Milosz Tylenda added a comment - Well done, a small aikido-like fix in OracleDictionary I am currently trying to fix the PostgreSQL issues.
          Hide
          Milosz Tylenda added a comment -

          Using Reader with PostgreSQL is questionable because their Large Object API supports binary streams only. I was thinking of emulating Reader but the amount of changes made me be in favour of just documenting that only InputStream can be used with PostgreSQL. The Reader support could be added if demand surfaces.

          Another PostgreSQL quirk is similar to Oracle API problems. We cast Connection to PGConnection which fails if the Connection is wrapped, for example, by DBCP. Currently the TestInputStreamLob passes when run without pooling (e.g. from an IDE) but not with the test suite (which uses DBCP).

          Show
          Milosz Tylenda added a comment - Using Reader with PostgreSQL is questionable because their Large Object API supports binary streams only. I was thinking of emulating Reader but the amount of changes made me be in favour of just documenting that only InputStream can be used with PostgreSQL. The Reader support could be added if demand surfaces. Another PostgreSQL quirk is similar to Oracle API problems. We cast Connection to PGConnection which fails if the Connection is wrapped, for example, by DBCP. Currently the TestInputStreamLob passes when run without pooling (e.g. from an IDE) but not with the test suite (which uses DBCP).
          Hide
          Jeremy Bauer added a comment -

          Hi Milosz. I agree. It my not be worth the risk and/or effort if a huge number of changes are required to support Reader, especially given there hasn't been a request to support it on PostgreSQL. It might be a fun exercise, though.

          Take a look at my patch for OPENJPA-1249. The bit of code that gets an underlying DBCP connection could be split out into a utility class and put to use for the PostgreSQL streaming support. The patch also includes a pom update with the connection property setting required to access the underlying DBCP connection.

          Show
          Jeremy Bauer added a comment - Hi Milosz. I agree. It my not be worth the risk and/or effort if a huge number of changes are required to support Reader, especially given there hasn't been a request to support it on PostgreSQL. It might be a fun exercise, though. Take a look at my patch for OPENJPA-1249 . The bit of code that gets an underlying DBCP connection could be split out into a utility class and put to use for the PostgreSQL streaming support. The patch also includes a pom update with the connection property setting required to access the underlying DBCP connection.
          Hide
          Milosz Tylenda added a comment -

          I have updated PostgresDictionary with Jeremy's ideas on connection unwrapping. Now the test passes with DBCP and PostgreSQL.

          I will also update the user manual.

          Show
          Milosz Tylenda added a comment - I have updated PostgresDictionary with Jeremy's ideas on connection unwrapping. Now the test passes with DBCP and PostgreSQL. I will also update the user manual.
          Hide
          Milosz Tylenda added a comment -

          The problem with SQL Server is that an InputStream obtained from ResultSet.getBinaryStream can't be used after its result set is closed. OpenJPA closes the ResultSet before user has a chance to read from the InputStream.

          I am investigating these ideas:
          1. Using ResultSet.getBlob.getBinaryStream instead of ResultSet.getBinaryStream.
          2. Proxying the ResultSet to defer its closing.

          Show
          Milosz Tylenda added a comment - The problem with SQL Server is that an InputStream obtained from ResultSet.getBinaryStream can't be used after its result set is closed. OpenJPA closes the ResultSet before user has a chance to read from the InputStream. I am investigating these ideas: 1. Using ResultSet.getBlob.getBinaryStream instead of ResultSet.getBinaryStream. 2. Proxying the ResultSet to defer its closing.
          Hide
          Milosz Tylenda added a comment -

          Looking closer at MySQL reveals its limitations. When reading LOB contents, the whole contents are actually brought into memory. Chances are this can be worked around in:
          1. MySQLDictionary.getLOBStream / getCharacterStream.
          2. Modifying column list in SELECT.

          Details can be found in MySQL docs [1]. For now I have added a note on the limitation to the user manual.

          Not sure whehter a similar limitation exists when writing LOB contents to the database.

          [1] http://dev.mysql.com/doc/refman/5.1/en/connector-j-reference-implementation-notes.html

          Show
          Milosz Tylenda added a comment - Looking closer at MySQL reveals its limitations. When reading LOB contents, the whole contents are actually brought into memory. Chances are this can be worked around in: 1. MySQLDictionary.getLOBStream / getCharacterStream. 2. Modifying column list in SELECT. Details can be found in MySQL docs [1] . For now I have added a note on the limitation to the user manual. Not sure whehter a similar limitation exists when writing LOB contents to the database. [1] http://dev.mysql.com/doc/refman/5.1/en/connector-j-reference-implementation-notes.html
          Hide
          Milosz Tylenda added a comment -

          Modified SQLServerDictionary to use ResultSet.getBlob.getBinaryStream (and its CLOB counterpart). Unfortunately, this poses a problem similar to the one with MySQL - when reading LOB contents, the whole contents are actually brought into memory. Added a note on the limitation to the user manual.

          ResultSet.getBinaryStream provides true streaming but suffers from the problem described earlier.

          I did not check whether streaming actually takes place when writing LOB contents to the database. Also, I tried only MS 2.0 driver.

          Now the LOB streaming tests pass with all 4 databases.

          Show
          Milosz Tylenda added a comment - Modified SQLServerDictionary to use ResultSet.getBlob.getBinaryStream (and its CLOB counterpart). Unfortunately, this poses a problem similar to the one with MySQL - when reading LOB contents, the whole contents are actually brought into memory. Added a note on the limitation to the user manual. ResultSet.getBinaryStream provides true streaming but suffers from the problem described earlier. I did not check whether streaming actually takes place when writing LOB contents to the database. Also, I tried only MS 2.0 driver. Now the LOB streaming tests pass with all 4 databases.
          Hide
          Milosz Tylenda added a comment -

          Resolving since the tests now pass. Further work can be continued in a new issue for post 2.0 versions.

          Show
          Milosz Tylenda added a comment - Resolving since the tests now pass. Further work can be continued in a new issue for post 2.0 versions.
          Hide
          Milosz Tylenda added a comment -

          The change has been back-ported to branch 1.3.x.

          Show
          Milosz Tylenda added a comment - The change has been back-ported to branch 1.3.x.

            People

            • Assignee:
              Milosz Tylenda
              Reporter:
              Milosz Tylenda
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development