Derby
  1. Derby
  2. DERBY-5505

BLOB and CLOB Reference Manual topics could use some fixes

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 10.8.2.2
    • Fix Version/s: 10.9.1.0
    • Component/s: Documentation
    • Labels:
      None

      Description

      The Reference Manual documentation on BLOBs and CLOBs has some gaps that I would like to fill as a result of my efforts to insert a BLOB into a table column. This should affect the following topics:

      CAST function (rrefsqlj33562.dita)
      CLOB data type (rrefclob.dita)
      BLOB data type (rrefblob.dita)

      A comment will provide further details.

      1. seeds.jpg
        5 kB
        Kim Haase
      2. greentea.jpg
        3 kB
        Kim Haase
      3. fence.jpg
        3 kB
        Kim Haase
      4. DERBY-5505-2.zip
        9 kB
        Kim Haase
      5. DERBY-5505-2.diff
        7 kB
        Kim Haase
      6. DERBY-5505.zip
        10 kB
        Kim Haase
      7. DERBY-5505.stat
        0.1 kB
        Kim Haase
      8. DERBY-5505.diff
        8 kB
        Kim Haase
      9. create.sql
        18 kB
        Kim Haase
      10. clobex.java
        2 kB
        Kim Haase
      11. clobex.java
        2 kB
        Dag H. Wanvik
      12. blobex.java
        2 kB
        Kim Haase
      13. blobex.java
        2 kB
        Dag H. Wanvik

        Activity

        Hide
        Kim Haase added a comment -

        An INSERT statement that inserts a bit string (the contents of an image file) into a column defined as BLOB drew a stacktrace when the data was

        x'ffd8ffe000.....'

        It appeared to work correctly when the data was

        cast(x'ffd8ffe000...' as blob)

        But the image was truncated in the database. Word of mouth informs me that SQL cannot handle bit strings longer than 32K, so maybe that's the reason (though the SQL script being used is only 18543 bytes in size and the image files themselves are between 2.5K and 5K). The topic on the CAST function has a table that indicates that CHAR FOR BIT DATA and so on can be cast to BLOB; but under "Conversions from and to bit strings" it says, "In most cases the BLOB type cannot be cast to and from other types." This is a bit vague. I cannot find the 32K limitation stated anywhere, so maybe it should be added? Are there other specific limitations we could put here?

        The topic on the CLOB data type has an example Java program, but it does not work correctly as shown. It incurs the following runtime error:

        Error! java.sql.SQLException: Stream or LOB value cannot be retrieved more than once

        This happens because of the following two statements:

        java.sql.Clob aclob = rs.getClob(1);
        java.io.InputStream ip = rs.getAsciiStream(1);

        I can get rid of the error either by commenting out the first statement or by rewriting the second one as follows:

        java.io.InputStream ip = aclob.getAsciiStream();

        Also, should there be an "ip.close()" statement at the end of the outer while loop? Not to mention a "conn.close()" and database close at the end of the program? Also, is there any point in calling "conn.commit()" when the default auto-commit mode is on?

        The topic on the BLOB data type has examples of CREATE and SELECT statements, but no example INSERT statement. It might be useful to show a sample CAST statement, mentioning the 32K limitation. It would possibly also be helpful to provide a Java program similar to the one for the CLOB data type.

        I am attaching a rewritten version of the CLOB program and a similar BLOB one that writes the blob to a new version of the image file that was stored. Suggestions for improvement are welcome.

        Show
        Kim Haase added a comment - An INSERT statement that inserts a bit string (the contents of an image file) into a column defined as BLOB drew a stacktrace when the data was x'ffd8ffe000.....' It appeared to work correctly when the data was cast(x'ffd8ffe000...' as blob) But the image was truncated in the database. Word of mouth informs me that SQL cannot handle bit strings longer than 32K, so maybe that's the reason (though the SQL script being used is only 18543 bytes in size and the image files themselves are between 2.5K and 5K). The topic on the CAST function has a table that indicates that CHAR FOR BIT DATA and so on can be cast to BLOB; but under "Conversions from and to bit strings" it says, "In most cases the BLOB type cannot be cast to and from other types." This is a bit vague. I cannot find the 32K limitation stated anywhere, so maybe it should be added? Are there other specific limitations we could put here? The topic on the CLOB data type has an example Java program, but it does not work correctly as shown. It incurs the following runtime error: Error! java.sql.SQLException: Stream or LOB value cannot be retrieved more than once This happens because of the following two statements: java.sql.Clob aclob = rs.getClob(1); java.io.InputStream ip = rs.getAsciiStream(1); I can get rid of the error either by commenting out the first statement or by rewriting the second one as follows: java.io.InputStream ip = aclob.getAsciiStream(); Also, should there be an "ip.close()" statement at the end of the outer while loop? Not to mention a "conn.close()" and database close at the end of the program? Also, is there any point in calling "conn.commit()" when the default auto-commit mode is on? The topic on the BLOB data type has examples of CREATE and SELECT statements, but no example INSERT statement. It might be useful to show a sample CAST statement, mentioning the 32K limitation. It would possibly also be helpful to provide a Java program similar to the one for the CLOB data type. I am attaching a rewritten version of the CLOB program and a similar BLOB one that writes the blob to a new version of the image file that was stored. Suggestions for improvement are welcome.
        Hide
        Kim Haase added a comment -

        Attaching modified version of CLOB example, and a BLOB version.

        Show
        Kim Haase added a comment - Attaching modified version of CLOB example, and a BLOB version.
        Hide
        Dag H. Wanvik added a comment -

        Thanks for improving our docs in this area, Kim!

        Kim> I can get rid of the error either by commenting out the first statement or by rewriting the second one as follows:

        This is because we now enforce the JDBC recommendation that result set columns be read only once (and in order, not sure if we enforce that, I believe not) for maximum portability. It turned out to be hard to allow multiple reads of a lob result set column in the general case, so I think Kristian implemented this new restriction. I think we do no enforce it for non-lob columns, though.

        I'll have a look at your new sample programs.

        Show
        Dag H. Wanvik added a comment - Thanks for improving our docs in this area, Kim! Kim> I can get rid of the error either by commenting out the first statement or by rewriting the second one as follows: This is because we now enforce the JDBC recommendation that result set columns be read only once (and in order, not sure if we enforce that, I believe not) for maximum portability. It turned out to be hard to allow multiple reads of a lob result set column in the general case, so I think Kristian implemented this new restriction. I think we do no enforce it for non-lob columns, though. I'll have a look at your new sample programs.
        Hide
        Dag H. Wanvik added a comment - - edited

        Hi Kim! Looks good! Uploading my slightly modified version of the examples.

        In the Clob example I changed the test while reading to be "!= -1" instead of "> 0" (0 is not EOF actually, so ">= 0" would be correct, I chose to test on -1 since the Javadoc says -1 signal EOF).

        In the Blob case I changed the reading out of the Blob to use a stream rather then getBytes (limit memory usage with large data). I also removed the commit statements and the comment about the autoloading.

        I also changed the fileLength from an int to a long and added comment that the length should be checked against the 64 L max in the column's datatype.
        The final catch with the stack print could be dropped, I guess, just add throws Exception to main.. The additional closes you suggested are good.

        Feel free to use what you think is appropriate for your example! Also, with Java 7 we could upgrade the examples to the use new try-with-resource to avoid having to close the connection, but for now we may not want to use that in the examples, most people are still not on Java 7 I guess.

        Show
        Dag H. Wanvik added a comment - - edited Hi Kim! Looks good! Uploading my slightly modified version of the examples. In the Clob example I changed the test while reading to be "!= -1" instead of "> 0" (0 is not EOF actually, so ">= 0" would be correct, I chose to test on -1 since the Javadoc says -1 signal EOF). In the Blob case I changed the reading out of the Blob to use a stream rather then getBytes (limit memory usage with large data). I also removed the commit statements and the comment about the autoloading. I also changed the fileLength from an int to a long and added comment that the length should be checked against the 64 L max in the column's datatype. The final catch with the stack print could be dropped, I guess, just add throws Exception to main.. The additional closes you suggested are good. Feel free to use what you think is appropriate for your example! Also, with Java 7 we could upgrade the examples to the use new try-with-resource to avoid having to close the connection, but for now we may not want to use that in the examples, most people are still not on Java 7 I guess.
        Hide
        Kim Haase added a comment -

        Thanks very much, Dag! I was perplexed by the input and output streams and really appreciate the help. I will definitely drop the printStackTrace call.

        Do you think there is more to say about why/when casting to/from a BLOB is inadvisable, or should we just leave it at that?

        Show
        Kim Haase added a comment - Thanks very much, Dag! I was perplexed by the input and output streams and really appreciate the help. I will definitely drop the printStackTrace call. Do you think there is more to say about why/when casting to/from a BLOB is inadvisable, or should we just leave it at that?
        Hide
        Dag H. Wanvik added a comment -

        I guess we should also document the CAST possibilities, I am not sure of the restrictions here off the top of my head, Kristian is our local lob maven.. Does any else know how?

        Show
        Dag H. Wanvik added a comment - I guess we should also document the CAST possibilities, I am not sure of the restrictions here off the top of my head, Kristian is our local lob maven.. Does any else know how?
        Hide
        Rick Hillegas added a comment -

        Hi Kim,

        > "In most cases the BLOB type cannot be cast to and from other types." This is a bit vague.

        I think that this is just trying to summarize what's documented in Table 1. A BLOB can only be cast to another BLOB but it can be cast from other binary types. Other than that, no casting to/from BLOBs is allowed. I think that the SQL Standard allows arbitrary casting between binary types...but Derby does not support casting BLOB to [VAR]CHAR FOR BIT DATA.

        Show
        Rick Hillegas added a comment - Hi Kim, > "In most cases the BLOB type cannot be cast to and from other types." This is a bit vague. I think that this is just trying to summarize what's documented in Table 1. A BLOB can only be cast to another BLOB but it can be cast from other binary types. Other than that, no casting to/from BLOBs is allowed. I think that the SQL Standard allows arbitrary casting between binary types...but Derby does not support casting BLOB to [VAR] CHAR FOR BIT DATA.
        Hide
        Knut Anders Hatlen added a comment -

        I'm a bit curious about the truncation you're seeing when using binary strings. Do you have an example to share?

        The limit on binary string literals is documented here: http://db.apache.org/derby/docs/dev/ref/rrefstringlimits.html. It's 16336 hexadecimal characters, which means that the literals are limited to ~8KB. But the limit is still not small enough to explain why your files in the range 2.5KB to 5KB would be truncated. And even so, the parser is supposed to throw an exception rather than truncating the value when it encounters too long strings.

        Show
        Knut Anders Hatlen added a comment - I'm a bit curious about the truncation you're seeing when using binary strings. Do you have an example to share? The limit on binary string literals is documented here: http://db.apache.org/derby/docs/dev/ref/rrefstringlimits.html . It's 16336 hexadecimal characters, which means that the literals are limited to ~8KB. But the limit is still not small enough to explain why your files in the range 2.5KB to 5KB would be truncated. And even so, the parser is supposed to throw an exception rather than truncating the value when it encounters too long strings.
        Hide
        Kim Haase added a comment -

        I'm attaching the create.sql file used to populate the database, along with the original JPEG images that were truncated. What I don't know is how the bit strings were generated from the JPEG files. They may be truncated themselves, for all I know.

        The version of Derby used was 10.6.2.1, which comes with the Java EE SDK 6 update 3 (GlassFish 3.1.1).

        Show
        Kim Haase added a comment - I'm attaching the create.sql file used to populate the database, along with the original JPEG images that were truncated. What I don't know is how the bit strings were generated from the JPEG files. They may be truncated themselves, for all I know. The version of Derby used was 10.6.2.1, which comes with the Java EE SDK 6 update 3 (GlassFish 3.1.1).
        Hide
        Kim Haase added a comment -

        An additional data point: I tried running the create.sql file followed by a modified blobex.java (one that only retrieves the db contents), using Derby 10.8.2.2. Again I got truncated JPEGs – exactly 2000 bytes each.

        I think the bit strings may themselves be truncated, since each seems to contain about 400 hex digits?

        Show
        Kim Haase added a comment - An additional data point: I tried running the create.sql file followed by a modified blobex.java (one that only retrieves the db contents), using Derby 10.8.2.2. Again I got truncated JPEGs – exactly 2000 bytes each. I think the bit strings may themselves be truncated, since each seems to contain about 400 hex digits?
        Hide
        Knut Anders Hatlen added a comment -

        Thanks for posting the script, Kim. I think you're right that the bit strings are truncated (4000 hex digits == 2000 bytes), so Derby seems to be doing the right thing with them.

        Show
        Knut Anders Hatlen added a comment - Thanks for posting the script, Kim. I think you're right that the bit strings are truncated (4000 hex digits == 2000 bytes), so Derby seems to be doing the right thing with them.
        Hide
        Kim Haase added a comment -

        Attaching DERBY-5505.diff, DERBY-5505.stat, and DERBY-5505.zip, with the following changes:

        M src/ref/rrefblob.dita
        M src/ref/rrefclob.dita
        M src/ref/rrefsqlj33562.dita

        The patch adds a code example to the BLOB data type topic, modifies the one in the CLOB topic, and clarifies (I hope?) the language about the bit string types in the CAST function topic.

        Show
        Kim Haase added a comment - Attaching DERBY-5505 .diff, DERBY-5505 .stat, and DERBY-5505 .zip, with the following changes: M src/ref/rrefblob.dita M src/ref/rrefclob.dita M src/ref/rrefsqlj33562.dita The patch adds a code example to the BLOB data type topic, modifies the one in the CLOB topic, and clarifies (I hope?) the language about the bit string types in the CAST function topic.
        Hide
        Knut Anders Hatlen added a comment -

        Looks good to me.

        Perhaps close the result sets and statements too for good measure?

        Some suggestions that would make the code examples a little more compact, feel free to disagree/ignore:

        • If the outer try/catch is removed, and the main method is declared to throw Exception, we save one indentation level.
        • The LOB columns in the examples don't need to be declared with an explicit maximum length. Then we could also remove the comment that says we should really have checked the length of the file.
        • The examples could use the setBinaryStream()/setAsciiStream() methods that don't take a length argument, so that we don't need to fetch the length from the File object. In fact, then we wouldn't need a File object at all, and could just create the FileInputStream directly with new FileInputStream("seeds.jpg").

        (To the latter suggestion it might be objected that it would introduce a dependency on JDBC 4.0 and therefore wouldn't work on all supported platforms. I think that would be fine, though. And in any case the patch already makes other changes that would make the examples dependent on JDBC 4.0, like removing the explicit loading of the driver, and not casting the length argument to an int when calling setAsciiStream().)

        Another thought: Would it make sense to make the examples code fragments instead of full Java classes? Then we wouldn't need to waste space on boilerplate like class declarations, error handling and shutting down the database. They take a fairly big part of the examples, and may take away the focus from the things we really want to demonstrate.

        Show
        Knut Anders Hatlen added a comment - Looks good to me. Perhaps close the result sets and statements too for good measure? Some suggestions that would make the code examples a little more compact, feel free to disagree/ignore: If the outer try/catch is removed, and the main method is declared to throw Exception, we save one indentation level. The LOB columns in the examples don't need to be declared with an explicit maximum length. Then we could also remove the comment that says we should really have checked the length of the file. The examples could use the setBinaryStream()/setAsciiStream() methods that don't take a length argument, so that we don't need to fetch the length from the File object. In fact, then we wouldn't need a File object at all, and could just create the FileInputStream directly with new FileInputStream("seeds.jpg"). (To the latter suggestion it might be objected that it would introduce a dependency on JDBC 4.0 and therefore wouldn't work on all supported platforms. I think that would be fine, though. And in any case the patch already makes other changes that would make the examples dependent on JDBC 4.0, like removing the explicit loading of the driver, and not casting the length argument to an int when calling setAsciiStream().) Another thought: Would it make sense to make the examples code fragments instead of full Java classes? Then we wouldn't need to waste space on boilerplate like class declarations, error handling and shutting down the database. They take a fairly big part of the examples, and may take away the focus from the things we really want to demonstrate.
        Hide
        Kim Haase added a comment -

        Thank you for putting so much thought into this and for making those terrific suggestions, Knut. I'll work on them and file a second patch.

        Show
        Kim Haase added a comment - Thank you for putting so much thought into this and for making those terrific suggestions, Knut. I'll work on them and file a second patch.
        Hide
        Kim Haase added a comment -

        Attaching a revised patch, DERBY-5505-2.diff and DERBY-5505-2.zip. I hope I've incorporated Knut's recommendations about the BLOB and CLOB topics.

        Show
        Kim Haase added a comment - Attaching a revised patch, DERBY-5505 -2.diff and DERBY-5505 -2.zip. I hope I've incorporated Knut's recommendations about the BLOB and CLOB topics.
        Hide
        Kristian Waagan added a comment -

        Not sure where it belongs, but using the methods that don't take a length argument may cause reduced performance in some cases. Some more information at DERBY-4652.
        (caveat: I haven't looked at the latest patch, so I don't know what's already mentioned or not)

        Show
        Kristian Waagan added a comment - Not sure where it belongs, but using the methods that don't take a length argument may cause reduced performance in some cases. Some more information at DERBY-4652 . (caveat: I haven't looked at the latest patch, so I don't know what's already mentioned or not)
        Hide
        Kim Haase added a comment -

        Thanks for mentioning that performance issue, Kristian.

        I think the information should be added to the "Prepared statements and streaming columns" topic (http://db.apache.org/derby/docs/dev/ref/rrefjavsqlprst.html). I notice that the Java SE API docs (http://docs.oracle.com/javase/6/docs/api/) for PreparedStatement.setAsciiStream et al. have the following note:

        "Note: Consult your JDBC driver documentation to determine if it might be more efficient to use a version of setAsciiStream which takes a length parameter."

        I could add a note in the examples saying that we are using the 2-arg form for simplicity, but if performance is an issue the 3-arg form may be preferable. But we really need to say that in the rrefjavsqlprst topic. Do we have any information to provide about the circumstances that might affect performance?

        Show
        Kim Haase added a comment - Thanks for mentioning that performance issue, Kristian. I think the information should be added to the "Prepared statements and streaming columns" topic ( http://db.apache.org/derby/docs/dev/ref/rrefjavsqlprst.html ). I notice that the Java SE API docs ( http://docs.oracle.com/javase/6/docs/api/ ) for PreparedStatement.setAsciiStream et al. have the following note: "Note: Consult your JDBC driver documentation to determine if it might be more efficient to use a version of setAsciiStream which takes a length parameter." I could add a note in the examples saying that we are using the 2-arg form for simplicity, but if performance is an issue the 3-arg form may be preferable. But we really need to say that in the rrefjavsqlprst topic. Do we have any information to provide about the circumstances that might affect performance?
        Hide
        Kim Haase added a comment -

        Actually, I should file a separate doc issue for the additions to the "Prepared statements and streaming columns" topic and any references to it – assuming we can say something helpful about the performance issues.

        Show
        Kim Haase added a comment - Actually, I should file a separate doc issue for the additions to the "Prepared statements and streaming columns" topic and any references to it – assuming we can say something helpful about the performance issues.
        Hide
        Knut Anders Hatlen added a comment -

        The latest patch looks good to me. I'm not familiar with the performance problems of the two-arg methods, though.

        Show
        Knut Anders Hatlen added a comment - The latest patch looks good to me. I'm not familiar with the performance problems of the two-arg methods, though.
        Hide
        Kim Haase added a comment -

        Thanks, Knut. I will commit the current patch and file a separate issue if it would be useful to document the performance issues.

        Show
        Kim Haase added a comment - Thanks, Knut. I will commit the current patch and file a separate issue if it would be useful to document the performance issues.
        Hide
        Kim Haase added a comment -

        Committed patch DERBY-5505-2.diff to documentation trunk at revision 1327508.

        Show
        Kim Haase added a comment - Committed patch DERBY-5505 -2.diff to documentation trunk at revision 1327508.
        Hide
        Kim Haase added a comment -

        Changes have appeared in Latest Alpha Manuals.

        Show
        Kim Haase added a comment - Changes have appeared in Latest Alpha Manuals.

          People

          • Assignee:
            Kim Haase
            Reporter:
            Kim Haase
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development