Issue Details (XML | Word | Printable)

Key: DERBY-326
Type: Improvement Improvement
Status: Closed Closed
Resolution: Fixed
Priority: Major Major
Assignee: Tomohito Nakayama
Reporter: Kathey Marsden
Votes: 1
Watchers: 1
Operations

If you were logged in you would be able to see more operations.
Derby

Improve streaming of large objects for network server and client

Created: 31/May/05 07:38 PM   Updated: 14/Aug/09 04:34 PM
Return to search
Component/s: Network Client, Network Server
Affects Version/s: 10.1.2.1
Fix Version/s: 10.2.1.6

Time Tracking:
Issue & Sub-Tasks
Issue Only
Not Specified

File Attachments:
  Size
Zip Archive Licensed for inclusion in ASF works ClobTest.zip 2006-01-17 03:43 PM Sunitha Kambhampati 77 kB
Text File Licensed for inclusion in ASF works DERBY-326.patch 2005-12-29 11:13 PM Tomohito Nakayama 28 kB
Text File Licensed for inclusion in ASF works DERBY-326_2.patch 2005-12-30 02:46 PM Tomohito Nakayama 26 kB
Text File Licensed for inclusion in ASF works DERBY-326_3.patch 2006-01-04 05:04 PM Tomohito Nakayama 28 kB
Text File Licensed for inclusion in ASF works DERBY-326_4.patch 2006-01-07 09:34 PM Tomohito Nakayama 30 kB
Text File Licensed for inclusion in ASF works DERBY-326_5.patch 2006-01-11 10:01 PM Tomohito Nakayama 30 kB
Text File Licensed for inclusion in ASF works DERBY-326_5_indented.patch 2006-01-18 09:06 PM Tomohito Nakayama 32 kB
Text File Licensed for inclusion in ASF works DERBY-326_6.patch 2006-03-02 07:47 PM Tomohito Nakayama 37 kB
Text File Licensed for inclusion in ASF works DERBY-326_7.patch 2006-04-26 06:19 PM Tomohito Nakayama 276 kB
Text File Licensed for inclusion in ASF works DERBY-326_8.patch 2006-04-30 12:51 PM Tomohito Nakayama 278 kB
Text File Licensed for inclusion in ASF works DERBY-326_9.patch 2006-05-07 04:02 PM Tomohito Nakayama 276 kB
File Licensed for inclusion in ASF works ReEncodedInputStream.java.modifiedForLongRun 2006-01-24 09:56 PM Tomohito Nakayama 3 kB
Issue Links:
Blocker
 
Incorporates
 
Reference
 

Bug behavior facts: Performance
Resolution Date: 29/Apr/09 01:01 AM

Sub-Tasks  All   Open   

 Description  « Hide
Currently the stream writing methods in network server and client require a length parameter. This means that we have to get the length of the stream before sending it. For example in network server in EXTDTAInputStream we have to use getString and getbytes() instead of getCharacterStream and getBinaryStream so that we can get the length.


SQLAM Level 7 provides for the enhanced LOB processing to allow streaming without indicating the length, so, the writeScalarStream methods in
network server DDMWriter.java and network client Request.java can be changed to not require a length.

Code inspection of these methods seems to indicate that while the length is never written it is used heavily in generating the DSS. One strange thing is that it appears on error, the stream is padded out to full length with zeros, but an actual exception is never sent. Basically I think perhaps these methods need to be rewritten from scratch based on the spec requirements for lobs.

After the writeScalarStream methods have been changed, then EXTDAInputStream can be changed to properly stream LOBS. See TODO tags in this file for more info. I am guessing similar optimizations available in the client as well, but am not sure where that code is.





 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Tomohito Nakayama added a comment - 09/Oct/05 06:19 PM
It seems that there exist something between this issue and DERBY-609 .

Tomohito Nakayama added a comment - 09/Nov/05 02:13 AM
I just have started to read around Externalied LOB Data in Answer Set :4.4.6.3 @page170ofDRDA Spec part1 .?

Tomohito Nakayama added a comment - 12/Nov/05 02:56 PM
I found information about layer B Streaming in description of DSS, DRDA, Version 3, Volume3: page315.

This must be what we should implement .

Tomohito Nakayama added a comment - 29/Dec/05 11:13 PM
Description of patch :
   * Remove processing to expand data from InputStream of blob/clob to memory before sending to client.
    * Implement layer B streaming at NetworkServer.
     * As written in this issue firstly, almost rewrite whole org.apache.derby.impl.drda.DDMWriter#writeScalarStream.
       Here , "almost" means that code was not wrote from scratch, but was wrote as reform.
       Remarkable point is as next :
       Original code was using variable "bytesToRead" to handle remaining amount of data sent and remaining roon in DSS segment .
       Now this variable "bytesToRead" was removed from.
       New code, instead, have variable "spareDssLength" to handle remaining room in DSS segment .
       
    * Modify org.apache.derby.impl.drda.EXTDTAInputStream to stream InputStream retrieved from ResultSet directly.
     * The source stream is read twice, first for seeing whether source stream is/is not empty, second for streaming it.
     * To keep reference to valid stream, EXTDTAInputStream have reference to resultset and blob/clob also.
    
     
   * Modify master file of result for blobclob4BLOB.
    * Now as same as result of embed driver, dead lock will be happen in clobTest92.
    * Different expected exception was happen in negative test in blobclob4BLOB.
    
    
    
Testing :
   Executed derbyall and found no error.

Bryan Pendleton added a comment - 30/Dec/05 03:25 AM
It seems as though there are a lot of modified lines in your patch, but many of them appear to be simply indentation/whitespace changes. Is it possible that you could re-run your diff command with the whitespace
changes excluded? Perhaps something like "svn diff --diff-cmd diff -x -uw"? That would make the patch
easier to read and understand, for me. Thanks, bryan

Tomohito Nakayama added a comment - 30/Dec/05 02:31 PM
Create patch file executing next command.
   svn diff --diff-cmd diff -x -uw > DERBY-326_2.patch

Description of patch :
   * Remove processing to expand data from InputStream of blob/clob to memory before sending to client.
    * Implement layer B streaming at NetworkServer.
     * As written in this issue firstly, almost rewrite whole org.apache.derby.impl.drda.DDMWriter#writeScalarStream.
       Here , "almost" means that code was not wrote from scratch, but was wrote as reform.
       Remarkable point is as next :
       Original code was using variable "bytesToRead" to handle remaining amount of data sent and remaining roon in DSS segment .
       Now this variable "bytesToRead" was removed from.
       New code, instead, have variable "spareDssLength" to handle remaining room in DSS segment .
       
    * Modify org.apache.derby.impl.drda.EXTDTAInputStream to stream InputStream retrieved from ResultSet directly.
     * The source stream is read twice, first for seeing whether source stream is/is not empty, second for streaming it.
     * To keep reference to valid stream, EXTDTAInputStream have reference to resultset and blob/clob also.
    
     
   * Modify master file of result for blobclob4BLOB.
    * Now as same as result of embed driver, dead lock will be happen in clobTest92.
    * Different expected exception was happen in negative test in blobclob4BLOB.
    
    
    
Testing :
   * Executed derbyall and found no error.
   * After DERBY-326_2.patch created,
      applying DERBY-326_2.patch to reverted source code and
      confirmed applied source codes can be build.

Tomohito Nakayama added a comment - 30/Dec/05 02:46 PM
I re-upload DERBY-326_2.patch.
I created patch *after* applying patch created executing "svn diff --diff-cmd diff -x -uw".

Description of patch :
   * Remove processing to expand data from InputStream of blob/clob to memory before sending to client.
    * Implement layer B streaming at NetworkServer.
     * As written in this issue firstly, almost rewrite whole org.apache.derby.impl.drda.DDMWriter#writeScalarStream.
       Here , "almost" means that code was not wrote from scratch, but was wrote as reform.
       Remarkable point is as next :
       Original code was using variable "bytesToRead" to handle remaining amount of data sent and remaining roon in DSS segment .
       Now this variable "bytesToRead" was removed from.
       New code, instead, have variable "spareDssLength" to handle remaining room in DSS segment .
       
    * Modify org.apache.derby.impl.drda.EXTDTAInputStream to stream InputStream retrieved from ResultSet directly.
     * The source stream is read twice, first for seeing whether source stream is/is not empty, second for streaming it.
     * To keep reference to valid stream, EXTDTAInputStream have reference to resultset and blob/clob also.
    
     
   * Modify master file of result for blobclob4BLOB.
    * Now as same as result of embed driver, dead lock will be happen in clobTest92.
    * Different expected exception was happen in negative test in blobclob4BLOB.
    
    
    
Testing :
   * Executed derbyall and found no error.

Tomohito Nakayama added a comment - 04/Jan/06 05:04 PM
 I upload DERBY-326_3.patch.
 -----
 Description of patch :
    * Remove processing to expand data from InputStream of blob/clob to memory before sending to client.
     * Implement layer B streaming at NetworkServer.
      * As written in this issue firstly, almost rewrite whole org.apache.derby.impl.drda.DDMWriter#writeScalarStream.
        Here , "almost" means that code was not wrote from scratch, but was wrote as reform.
        Remarkable point is as next :
        Original code was using variable "bytesToRead" to handle remaining amount of data sent and remaining roon in DSS segment .
        Now this variable "bytesToRead" was removed from.
        New code, instead, have variable "spareDssLength" to handle remaining room in DSS segment .
      * Add call to ensureLength in writeScalarStream expecting appropriate buffer size.
      * Move comment in java/drda/org/apache/derby/impl/drda/DDMWriter.java about client driver implementation
         to java/client/org/apache/derby/client/net/Request.java.
         
     * Modify org.apache.derby.impl.drda.EXTDTAInputStream to stream InputStream retrieved from ResultSet directly.
      * The source stream is read twice, first for seeing whether source stream is/is not empty, second for streaming it.
      * To keep reference to valid stream, EXTDTAInputStream have reference to resultset and blob/clob also.
      
       
    * Modify master file of result for blobclob4BLOB.
     * Now as same as result of embed driver, dead lock will be happen in clobTest92.
     * Different expected exception was happen in negative test in blobclob4BLOB.
      
      
      
 Testing :
    * Executed derbyall and found no error other than DERBY-792.

Tomohito Nakayama added a comment - 08/Jan/06 08:03 PM
 I upload DERBY-326_4.patch.
  -----
  Description of patch :
     * Remove processing to expand data from InputStream of blob/clob to memory before sending to client.
      * Implement layer B streaming at NetworkServer.
       * As written in this issue firstly, almost rewrite whole org.apache.derby.impl.drda.DDMWriter#writeScalarStream.
         Here , "almost" means that code was not wrote from scratch, but was wrote as reform.
         Remarkable point is as next :
         Original code was using variable "bytesToRead" to handle remaining amount of data sent and remaining roon in DSS segment .
         Now this variable "bytesToRead" was removed from.
         New code, instead, have variable "spareDssLength" to handle remaining room in DSS segment .
       * Add call to ensureLength in writeScalarStream expecting appropriate buffer size.
       * Move comment in java/drda/org/apache/derby/impl/drda/DDMWriter.java about client driver implementation
          to java/client/org/apache/derby/client/net/Request.java.
           
      * Modify org.apache.derby.impl.drda.EXTDTAInputStream to stream InputStream retrieved from ResultSet directly.
       * The source stream is read twice, first for seeing whether source stream is/is not empty, second for streaming it.
       * To keep reference to valid stream, EXTDTAInputStream have reference to resultset and blob/clob also.
         
     * Modify master file of result for blobclob4BLOB.
      * Now as same as result of embed driver, dead lock will be happen in clobTest92.
      * Different expected exception was happen in negative test in blobclob4BLOB.
        
     * Added asserting code.

        
  Testing :
     * Executed derbyall and did not found new error.

Tomohito Nakayama added a comment - 11/Jan/06 10:28 PM
 I upload DERBY-326_5.patch.
   -----
   Description of patch :
      * Remove processing to expand data from InputStream of blob/clob to memory before sending to client.
       * Implement layer B streaming at NetworkServer.
        * As written in this issue firstly, almost rewrite whole org.apache.derby.impl.drda.DDMWriter#writeScalarStream.
          Here , "almost" means that code was not wrote from scratch, but was wrote as reform.
          Remarkable point is as next :
          Original code was using variable "bytesToRead" to handle remaining amount of data sent and remaining roon in DSS segment .
          Now this variable "bytesToRead" was removed from.
          New code, instead, have variable "spareDssLength" to handle remaining room in DSS segment .
        * Add call to ensureLength in writeScalarStream expecting appropriate buffer size.
        * Move comment in java/drda/org/apache/derby/impl/drda/DDMWriter.java about client driver implementation
           to java/client/org/apache/derby/client/net/Request.java.
             
       * Modify org.apache.derby.impl.drda.EXTDTAInputStream to stream InputStream retrieved from ResultSet directly.
        * The source stream is read twice, first for seeing whether source stream is/is not empty, second for streaming it.
        * To keep reference to valid stream, EXTDTAInputStream have reference to resultset and blob/clob also.
           
      * Modify master file of result for blobclob4BLOB.
       * Now as same as result of embed driver, dead lock will be happen in clobTest92.
       * Different expected exception was happen in negative test in blobclob4BLOB.
          
      * Added asserting code.
 
          
   Testing :
      * Executed derbyall for 6 times and confirm DERBY-792 does not happen in the results.

Kathey Marsden added a comment - 14/Jan/06 04:13 AM
I am reviewing this patch. I am sorry for the long delay after the holidays. I should be able to post comments Monday. I am still working to understand the changes.

 A few general items

1) Javadoc - I think it would be good to add more javadoc even with the briefest description of the method function and parameters.

2) Tests - You had asked at one point I think about a location for tests. I think largedata/loblengthTests could be modified to use a stream and then the jvmflags line removed.

3) Code Formatting - There seem to be code indentation inconsistencies in DDMWriter. Also I think it is good to indent the method bodies in the new methods in ReEncodedInputStream.java. Things sort of run together.


Thanks

Kathey

Kathey Marsden added a comment - 17/Jan/06 02:14 AM
I think this patch is a big improvement over reading the lobs into memory.
A few comments on future improvement and some questions on the code below:


Possible future improvements:

- for VARCHAR FOR BIT DATA (byte[] values) we still call writeScalarStream with length and do not do layerBStreaming. Would it make sense to stream this as well and get rid of all the code for handling extended length etc?

reencode() performance
This method is called every 1024 characters of a character stream and
-- creates a localBuffer char[1024] to read the character stream data
-- creates a new String(localBuffer)
-- creates a new byte[] with the getBytes() call
-- creates a new ByteArrayInputStream

A simple optimization using your current scheme might be to have global field byte[BUFFERED_CHAR_LEN] instead of making a new one each time. Perhaps also a single OutputStreamWriter field wrapped around a ByteArrayOutputStream which is reset on each reencode call could eliminate the String creation.

Long term we should consider the fact that Derby stores the data in UTF8 format, getCharacterStream decodes it then network server reencodes it to UTF8.
DERBY-760 is meant to address this issue, plus hopefully eliminate the additional copy into the buffer in DDMWriter. I will add more comments to DERBY-760.



QUESTIONS on code:

DDMWriter - I always had a hard time understanding the stream code in this file , but found it much easier with your single loop vs the triple loop of the old code. But I have some questions.

- Why do we need this ensureLength call for layerBstreaming in prepScalarStream?
      ensureLength( layerBstreaming ?
DEFAULT_BUFFER_SIZE - offset :
length );

- Can you add a comment to explain the calculation for prepScalarStream return value, especially what the 6 and 4 are?
return DssConstants.MAX_DSS_LENGTH - 6 - 4 - nullIndicatorSize - extendedLengthByteCount


writeScalarStream

Could you verify that my analysis of this code is correct?
while( !isLastSegment ){
/*
* This loop writes and flushes the data. If there is more data than will
* fit in the remaining buffer space or will fit in the current DSS, the
* data will be continued in the next DSS segment via flushScalarStreamSegment.
*
*/
int spareBufferLength = bytes.length - offset;
[snip more of writeScalarStream ]




// Adjust spareDssLength for 2 byte continuation flags
//written by flushScalarStreamSegment
if( ! isLastSegment )
spareDssLength = DssConstants.MAX_DSS_LENGTH - 2;

}

}

Kathey Marsden added a comment - 17/Jan/06 03:07 AM
Sorry, one more question.
Could you explain the error handling in the new scheme? The old code would call padScalarStreamForError and that is now gone, but I am not sure I understand how the error handling is handled now.


Sunitha Kambhampati added a comment - 17/Jan/06 03:43 PM
Thanks Tomohito for the patch. I wanted to try a simple clob test with reads and see if there was any difference performance wise. But I could not apply the latest patch cleanly maybe because some changes were submitted today. I have attached a zip file with the test file and a script to run it. Currently the test inserts clob of size ~500k and then does read of all the rows in the table. It would be great if you could run it before and after your changes to see the difference. Please feel free to make changes to the test or change the parameters to the test. Thanks.

Kathey Marsden added a comment - 18/Jan/06 12:02 AM
TomohitoNakayama wrote:

>
> I answer your questions.


Thank you for your answers. Any of your explanations that can be translated into code comments would be greatly appreciated.

>> Could you explain the error handling in the new scheme? The old code would call padScalarStreamForError and that is now gone, but I am not sure I understand how the error handling is handled now.
>>
> Answer is that streaming is discontinued and exception is sent using agent.markCommunicationFailure as same as other place.
>
OK. I think we should then add to the future list to throw an SQLException for such cases instead of terminating the connection. The client's Request.java rendition of writeScalarStream seems to handle the exception cases without terminating the connection, but it looks like Network Server previously could pad out the value with incorrect data and return. The change to terminate the connection is probably better than that, but certainly not ideal.


Tomohito Nakayama added a comment - 18/Jan/06 09:06 PM
I have indented methods in DDMWriter, writeScalarStream * 2 , prepScalarStream, flushScalarStreamSegment in DERBY-326_5_indented.patch.
Please have a glance on it.

I found that the source code applied this patch is easier to read, however, the patch file itself is more difficult to read which line was modified.

I again propose to submit patch for indenting, separated from this issue...

Tomohito Nakayama added a comment - 18/Jan/06 09:47 PM
I tried ClobTest from Sunitha.
Next is the result.

before patch:
naka@rufelza:~/derby/test/20060118/ClobTest$ ./run.ksh
=====================================================
TestRun =1
 Data from file to insert into Clob. = file_500k.txt
 buffer (k) =1
 read blocks of data =false
 rows in table =3
Populating data
Data length inserted into clob = 537638
Avg time for test[1 ]=1538.0ms
=====================================================
=====================================================
TestRun =2
 Data from file to insert into Clob. = file_500k.txt
 buffer (k) =1
 read blocks of data =false
 rows in table =3
Avg time for test[2 ]=1515.1ms
=====================================================
Throw away the result of the first run
================OUTPUT=============================
Avg time taken to read 3rows+ (ignore first run )=1515ms


after patch:
=====================================================
TestRun =1
 Data from file to insert into Clob. = file_500k.txt
 buffer (k) =1
 read blocks of data =false
 rows in table =3
Populating data
Data length inserted into clob = 537638
Avg time for test[1 ]=646.4ms
=====================================================
=====================================================
TestRun =2
 Data from file to insert into Clob. = file_500k.txt
 buffer (k) =1
 read blocks of data =false
 rows in table =3
Avg time for test[2 ]=627.0ms
=====================================================
Throw away the result of the first run
================OUTPUT=============================
Avg time taken to read 3rows+ (ignore first run )=627ms


I think I could find the improvement :)

Sunitha Kambhampati added a comment - 19/Jan/06 02:30 AM
Thanks Tomohito for trying out the test.

Too many short lived objects will slow down performance and I dont think it will show up for such a short test run.

In this case, I think it is a good idea to have each testrun run for atleast 1000 iterations or so with more rows in the table ( atleast 100, if not 1000 rows in this case). You can find this in the run.ksh that I attached in the ClobTest.zip.

Can you please maybe try a long run and share the results. Thank you.

Kathey Marsden added a comment - 20/Jan/06 02:46 AM
Great news about the performance indications.
 Looking again at the patch, I have some error handling concerns. I wonder if we can write negative tests for some of these negative cases.

- EXTDTAInputStream
openInputStreamLazily catches SQLExceptions/IOExceptions and throws an IllegalStateException: I think this will mean that if there is an SQLException on blob.getBinaryStream or clob.getCharacterStream. we may either terminate the connection or hang.

-Several places SQLExceptions without SQLState. e.g
....
 }catch (IOException e) {
 throw new SQLException (e.getMessage())
}
 Some other network server code imports org.apache.derby.impl.jdbc.Util; and uses throw throw Util.javaException(e); which provide an SQLState.

- On the formatting the new patch actually removes indentation for statements within method/constructor bodies in DDMWriter, so I would prefer the old one. To indent, In my IDE (Eclipse) I check the box Indent "Statements within method/constructor body". We have no code formatting guidelines even for new code. I will remove any objection on this count.


Kathey Marsden added a comment - 21/Jan/06 03:54 PM
In the email:
http://mail-archives.apache.org/mod_mbox/db-derby-dev/200601.mbox/%3c43D01EA1.2000708@amberpoint.com%3e
Bryan asked
What sort of faults are we likely to hit? Are things
like "corrupt page in database" the most likely?

I don't know what is most likely, but I did think of one test case likely to cause an IOException.
If the reader is in TRANSACTION_READ_UNCOMMITTED isolation mode and then another connection updated the LOB, the reader should get an IOException. on the next read.

I couldn't think of how to get an error on getCharacterStream/getBinaryStream but I wonder, could the work done in openInputStreamLazily() just be part of readInputStreamInitially so that these calls could just throw an SQLException if encountered?

Tomohito Nakayama added a comment - 23/Jan/06 07:26 PM
I got the result of long run.

before patch:
=====================================================
TestRun =1
 Data from file to insert into Clob. = file_500k.txt
 buffer (k) =1
 read blocks of data =false
 rows in table =100
Populating data
Data length inserted into clob = 537638
Avg time for test[1 ]=20912.137ms
=====================================================
=====================================================
TestRun =2
 Data from file to insert into Clob. = file_500k.txt
 buffer (k) =1
 read blocks of data =false
 rows in table =100
Avg time for test[2 ]=20772.027ms
=====================================================
=====================================================
TestRun =3
 Data from file to insert into Clob. = file_500k.txt
 buffer (k) =1
 read blocks of data =false
 rows in table =100
Avg time for test[3 ]=20782.217ms
=====================================================
=====================================================
TestRun =4
 Data from file to insert into Clob. = file_500k.txt
 buffer (k) =1
 read blocks of data =false
 rows in table =100
Avg time for test[4 ]=20768.951ms
=====================================================
Throw away the result of the first run
================OUTPUT=============================
Avg time taken to read 100rows+ (ignore first run )=20774ms

after patch:
=====================================================
TestRun =1
 Data from file to insert into Clob. = file_500k.txt
 buffer (k) =1
 read blocks of data =false
 rows in table =100
Populating data
Data length inserted into clob = 537638
Avg time for test[1 ]=20826.22ms
=====================================================
=====================================================
TestRun =2
 Data from file to insert into Clob. = file_500k.txt
 buffer (k) =1
 read blocks of data =false
 rows in table =100
Avg time for test[2 ]=20934.18ms
=====================================================
=====================================================
TestRun =3
 Data from file to insert into Clob. = file_500k.txt
 buffer (k) =1
 read blocks of data =false
 rows in table =100
Avg time for test[3 ]=20935.023ms
=====================================================
=====================================================
TestRun =4
 Data from file to insert into Clob. = file_500k.txt
 buffer (k) =1
 read blocks of data =false
 rows in table =100
Avg time for test[4 ]=20943.951ms
=====================================================
Throw away the result of the first run
================OUTPUT=============================
Avg time taken to read 100rows+ (ignore first run )=20937ms

The results of long run show that there exists a little fall in the speed after patch against short run.
I think this may be due to "too many short lived objects" in ReEncodedInputStream
which Sunitha and Kathey talked about.

I will try the long run again after revision.

Tomohito Nakayama added a comment - 23/Jan/06 10:01 PM
DERBY-760 will improve creation of streamed object.

Tomohito Nakayama added a comment - 24/Jan/06 09:56 PM
This is a result of long run with modification of ReEncodedInputStream.java not to create buffer for each time.
Attached ReEncodedInputStream.java.modifiedForLongRun is ReEncodedInputStream after the modification.

=====================================================
TestRun =1
 Data from file to insert into Clob. = file_500k.txt
 buffer (k) =1
 read blocks of data =false
 rows in table =100
Populating data
Data length inserted into clob = 537638
Avg time for test[1 ]=20661.984ms
=====================================================
=====================================================
TestRun =2
 Data from file to insert into Clob. = file_500k.txt
 buffer (k) =1
 read blocks of data =false
 rows in table =100
Avg time for test[2 ]=20764.193ms
=====================================================
=====================================================
TestRun =3
 Data from file to insert into Clob. = file_500k.txt
 buffer (k) =1
 read blocks of data =false
 rows in table =100
Avg time for test[3 ]=20772.66ms
=====================================================
=====================================================
TestRun =4
 Data from file to insert into Clob. = file_500k.txt
 buffer (k) =1
 read blocks of data =false
 rows in table =100
Avg time for test[4 ]=20760.312ms
=====================================================
Throw away the result of the first run
================OUTPUT=============================
Avg time taken to read 100rows+ (ignore first run )=20765ms

I found that it is a little faster than before patch.
However not so much ...

Tomohito Nakayama added a comment - 02/Mar/06 07:47 PM
 I upload DERBY-326_6.patch.
---
    Description of patch :
       * Remove processing to expand data from InputStream of blob/clob to memory before sending to client.
        * Implement layer B streaming at NetworkServer.
         * As written in this issue firstly, almost rewrite whole org.apache.derby.impl.drda.DDMWriter#writeScalarStream.
           Here , "almost" means that code was not wrote from scratch, but was wrote as reform.
           Remarkable point is as next :
           Original code was using variable "bytesToRead" to handle remaining amount of data sent and remaining roon in DSS segment .
           Now this variable "bytesToRead" was removed from.
           New code, instead, have variable "spareDssLength" to handle remaining room in DSS segment .
         * Add call to ensureLength in writeScalarStream expecting appropriate buffer size.
         * Move comment in java/drda/org/apache/derby/impl/drda/DDMWriter.java about client driver implementation
            to java/client/org/apache/derby/client/net/Request.java.
               
        * Modify org.apache.derby.impl.drda.EXTDTAInputStream to stream InputStream retrieved from ResultSet directly.
         * The source stream is read twice, first for seeing whether source stream is/is not empty, second for streaming it.
         * To keep reference to valid stream, EXTDTAInputStream have reference to resultset and blob/clob also.
             
       * Modify master file of result for blobclob4BLOB.
        * Now as same as result of embed driver, dead lock will be happen in clobTest92.
        * Different expected exception was happen in negative test in blobclob4BLOB.
            
       * Added asserting code.
       * Added negative test to kill streaming.

       * Other improvements from DERBY-326_5.patch were as next.
        * Reusing objects refered from instance variable of ReEncodedInputStream.
        * Modified not to open InputStream lazily as before.

 Testing :
       * Executed derbyall and found no error except for found in http://www.multinet.no/~solberg/public/Apache/index.html

---
I have not measured the resut in performance yet.
I will report it later.

Tomohito Nakayama added a comment - 15/Mar/06 08:20 PM
I have measured the performance of streaming.

Measurement was as next.
Next 3 type of test execution was measured in both of blob and clob before and after applying patch.

type1 std
As same as shortrun in DERBY-872

type2 stability
As same as longrun in DERBY-872

type3 volume
Larger file was used in shortrun test in DERBY-872.
File size was 2.5meg for clob and 5meg for blob.
* In this test , I found problem that test was not finished if file was larger than these volume.
   I have not surveyed why streaming was not finished.
   However, seeing this problem was found in both of before and after, I think I can ignore this phenomena for now...

Next is the result:
==> ./before/blob/std/result.txt <==
Avg time taken to read nullrows+ (ignore first run )=207ms
==> ./before/blob/stability/result.txt <==
Avg time taken to read nullrows+ (ignore first run )=6418ms
==> ./before/blob/volume/result.txt <==
Avg time taken to read nullrows+ (ignore first run )=806ms

==> ./after/blob/std/result.txt <==
Avg time taken to read nullrows+ (ignore first run )=173ms
==> ./after/blob/stability/result.txt <==
Avg time taken to read nullrows+ (ignore first run )=6453ms
==> ./after/blob/volume/result.txt <==
Avg time taken to read nullrows+ (ignore first run )=992ms


==> ./before/clob/std/result.txt <==
Avg time taken to read nullrows+ (ignore first run )=1511ms
==> ./before/clob/stability/result.txt <==
Avg time taken to read nullrows+ (ignore first run )=20968ms
==> ./before/clob/volume/result.txt <==
Avg time taken to read nullrows+ (ignore first run )=10986ms

==> ./after/clob/std/result.txt <==
Avg time taken to read nullrows+ (ignore first run )=594ms
==> ./after/clob/stability/result.txt <==
Avg time taken to read nullrows+ (ignore first run )=20235ms
==> ./after/clob/volume/result.txt <==
Avg time taken to read nullrows+ (ignore first run )=1963ms


Improvement was found in all of clob streaming.
However, improvement was not found in blob streaming other than std test.
Even the little drops in performance were found in stability and volume test, on the contrary.


I guesses this result as next.

Before the patch, stream retrieved from blob was buffered to memory entirely.
After the patch, no buffer is used when stream of blob was read.
Stream retrieved from blob was directly be read.

Not using buffer when reading stream of blob may be reason why streaming of blob was not improved.
In the case of clob, ReEncodedStream was used when stream of clob was read.
I think ReEncodedStream works as buffer luckily and result in improvement of streaming clob.

I think buffering is needed when reading stream of blob too.

Under this considerration,
I will implement code of buffering *each segment* of stream retrieved from blob.

Tomohito Nakayama added a comment - 30/Mar/06 11:31 PM
I have surveyed around the drop in performance of streaming blob after the patch.

After struggling to interpret the profiled informations,
I found that java.net.SocketOutputStream.writemethod seems to takes much more part in processing than before the patch.

I took this as *sign* of problem at client side and
realized that client side handles buffer without information for length of information to be cached after the patch and
that it would be reason of drop in performance.

Then I tested with modified client driver that have buffer of 5 * 1024 * 1024 bytes, corresponding size of the streamed file.

//Where I specified the size of buffer was next two part.
//* org.apache.derby.client.am.Connection.commBufferSize_ and
//* parameter for constructor of ByteArrayOutputStream called from org.apache.derby.client.net.Reply.getData when ddmScalarLen_ is -1.

Result was as next.
Avg time taken to read nullrows+ (ignore first run )=760ms

This is faster than result before the patch.
==> ./before/blob/volume/result.txt <==
 Avg time taken to read nullrows+ (ignore first run )=806ms

This result suggests that problem around buffer at client is the reason why the patch does not improve streaming blob.
Now I think it is the reason of drop in performance of streaming blob
that buffer at client side is handled without information of length to be buffered after the patch.


Tomohito Nakayama added a comment - 04/Apr/06 08:50 PM
I give next three countermeasure for drop in performance of streaming blob in the patch.

1:Ignore the drop in streaming blob. The degree of drop is not so much.
2:Give up using layer B streaming.
3:Use both layer B streaming and traditional way. Layer B streaming for clob and traditional way for blob.

I think 3rd one is best one and work forward it.

Kathey Marsden added a comment - 06/Apr/06 01:49 AM
Does "traditional way" mean we will still have hte large object materialized into memory on the server?

Kathey Marsden added a comment - 07/Apr/06 12:19 AM
The response by mail was that yes traditional way for blob meant that BLOB's would still be materialized into memory on the server. The description of the performance degredation for BLOB's made it sound like it was not severe. I think a slight performance degradation for BLOB's is an exceptable trade off for not having them materialized into memory.
 

Tomohito Nakayama added a comment - 10/Apr/06 09:59 PM
I resumed issues on next mail.
http://mail-archives.apache.org/mod_mbox/db-derby-dev/200603.mbox/&lt;440C3DDE.3080306%40basil.ocn.ne.jp>

For Issue 1) and 2) , I added next ASSERT code and execute derbyall.
+ protected void writeScalarStream (boolean chainedWithSameCorrelator,
  int codePoint,
  int length,
  java.io.InputStream in,
  boolean writeNullByte)
  throws DRDAProtocolException
  {
+
+ // Test code to confirm InputStream is always EXTDTAInputStream
+ if( SanityManager.DEBUG ){
+ SanityManager.ASSERT( in instanceof EXTDTAInputStream );
+ }
+

The result was that I found no error resulted from this ASSERT code.

Considering this result, I think it would be possible to remove code for byte[] in DRDAConnThread.writeEXTDTA (DRDAStatement stmt), assuming stream is always EXTDTAInputStream object.

Please give me comments from other members.

Tomohito Nakayama added a comment - 10/Apr/06 10:11 PM

Tomohito Nakayama added a comment - 12/Apr/06 08:52 PM
There seems to be neither objection nor approval for removing the code for byte[]...

Well... I will remove it.

The reason is as next.
1) The unused code should be removed.
2) Even if it was not removed, that part was not tested in test suite and may not work as expected.

I think this is not bad challenge...

Tomohito Nakayama added a comment - 26/Apr/06 06:19 PM
 I upload DERBY-326_7.patch.
 ---
     Description of patch :
        * Remove processing to expand data from InputStream of blob/clob to memory before sending to client.
         * Implement layer B streaming at NetworkServer.
          * As written in this issue firstly, almost rewrite whole org.apache.derby.impl.drda.DDMWriter#writeScalarStream.
            Here , "almost" means that code was not wrote from scratch, but was wrote as reform.
            Remarkable point is as next :
            Original code was using variable "bytesToRead" to handle remaining amount of data sent and remaining roon in DSS segment .
            Now this variable "bytesToRead" was removed from.
            New code, instead, have variable "spareDssLength" to handle remaining room in DSS segment .
          * Add call to ensureLength in writeScalarStream expecting appropriate buffer size.
          * Move comment in java/drda/org/apache/derby/impl/drda/DDMWriter.java about client driver implementation
             to java/client/org/apache/derby/client/net/Request.java.
                 
         * Modify org.apache.derby.impl.drda.EXTDTAInputStream to stream InputStream retrieved from ResultSet directly.
          * The source stream is read twice, first for seeing whether source stream is/is not empty, second for streaming it.
          * To keep reference to valid stream, EXTDTAInputStream have reference to resultset and blob/clob also.
               
        * Modify master file of result for blobclob4BLOB.
         * Now as same as result of embed driver, dead lock will be happen in clobTest92.
         * Different expected exception was happen in negative test in blobclob4BLOB.
              
        * Added asserting code.
        * Added negative test to kill streaming.
 
        * Place buffer before lob object was streamed out to client.
        * Added test to stream with the stream out buffer configuration.

        * Other improvements from DERBY-326_6.patch were as next.
         * Remove the code for handling byte[].
         * Make EXTDTAInputStream always be markSupported.
 
  Testing :
        * Executed derbyall and found no error except found in http://www.multinet.no/~solberg/public/Apache/DerbyJvm1.4/Limited/testSummary-396619.html

Bryan Pendleton added a comment - 29/Apr/06 06:33 AM
As you know, I have studied this patch multiple times; I think it is excellent.

Two thoughts, if you have not yet committed the patch:

1) It would be nice if you could add some comments to OutBufferedStream.java indicating the purpose of this test.

2) Should we add some documentation about the new configuration property derby.drda.streamOutBufferSize?

Thank you very much for all the great work on this issue!

Tomohito Nakayama added a comment - 29/Apr/06 03:07 PM
Working in DERBY-1268, I found that svn:eol-style was not native in next files which was added in this task.

added files:
java/drda/org/apache/derby/impl/drda/ReEncodedInputStream.java
java/testing/org/apache/derbyTesting/functionTests/master/DerbyNetClient/SuicideOfStreaming.out
java/testing/org/apache/derbyTesting/functionTests/master/DerbyNet/SuicideOfStreaming.out
java/testing/org/apache/derbyTesting/functionTests/master/OutBufferedStream.out
java/testing/org/apache/derbyTesting/functionTests/tests/derbynet/OutBufferedStream_app.properties
java/testing/org/apache/derbyTesting/functionTests/tests/derbynet/OutBufferedStream.java
java/testing/org/apache/derbyTesting/functionTests/tests/derbynet/SuicideOfStreaming_app.properties
java/testing/org/apache/derbyTesting/functionTests/tests/derbynet/SuicideOfStreaming.java
java/testing/org/apache/derbyTesting/functionTests/tests/derbynet/SwitchablePrintStream.java

.subversion/config was not configured in my environment ...

I set attributes to those files in next patch.

Tomohito Nakayama added a comment - 30/Apr/06 12:51 PM
 I upload DERBY-326_8.patch.
  ---
      Description of patch :
         * Remove processing to expand data from InputStream of blob/clob to memory before sending to client.
          * Implement layer B streaming at NetworkServer.
           * As written in this issue firstly, almost rewrite whole org.apache.derby.impl.drda.DDMWriter#writeScalarStream.
             Here , "almost" means that code was not wrote from scratch, but was wrote as reform.
             Remarkable point is as next :
             Original code was using variable "bytesToRead" to handle remaining amount of data sent and remaining roon in DSS segment .
             Now this variable "bytesToRead" was removed from.
             New code, instead, have variable "spareDssLength" to handle remaining room in DSS segment .
           * Add call to ensureLength in writeScalarStream expecting appropriate buffer size.
           * Move comment in java/drda/org/apache/derby/impl/drda/DDMWriter.java about client driver implementation
              to java/client/org/apache/derby/client/net/Request.java.
                   
          * Modify org.apache.derby.impl.drda.EXTDTAInputStream to stream InputStream retrieved from ResultSet directly.
           * The source stream is read twice, first for seeing whether source stream is/is not empty, second for streaming it.
           * To keep reference to valid stream, EXTDTAInputStream have reference to resultset and blob/clob also.
                 
         * Modify master file of result for blobclob4BLOB.
          * Now as same as result of embed driver, dead lock will be happen in clobTest92.
          * Different expected exception was happen in negative test in blobclob4BLOB.
                
         * Added asserting code.
         * Added negative test to kill streaming.
   
         * Place buffer before lob object was streamed out to client.
         * Added test to stream with the stream out buffer configuration.
        
         * Remove the code for handling byte[].

        * Other improvements from DERBY-326_7.patch were as next.
         * Add comment to explain the test of OutBufferedStream.java
         * Set svn:eol-style property of newly added files as native.

  Testing :
         * Executed derbyall and found no error except found in http://www.multinet.no/~solberg/public/Apache/DerbyJvm1.4/Limited/testSummary-397966.html

Tomohito Nakayama added a comment - 06/May/06 04:58 PM
I found difference in .out file of SuicideOfStreaming in execution of derbyall,
after updated to repository .

********* Diff file derbyall/derbynetclientmats/DerbyNetClient/derbynetmats/derbynetmats/SuicideOfStreaming.diff
*** Start: SuicideOfStreaming jdk1.4.2_10 DerbyNetClient derbynetmats:derbynetmats 2006-05-06 17:37:39 ***
1 del
< java.sql.SQLException: A communication error has been detected. Communication protocol being used: Reply.fill(). Communication API being used: InputStream.read(). Location where the error was detected: insufficient data. Communication function detecting the error: *. Protocol specific error codes(s) TCP/IP SOCKETS
2 del
< Caused by: org.apache.derby.client.am.DisconnectException: A communication error has been detected. Communication protocol being used: Reply.fill(). Communication API being used: InputStream.read(). Location where the error was detected: insufficient data. Communication function detecting the error: *. Protocol specific error codes(s) TCP/IP SOCKETS
2a1,2
> java.sql.SQLException: Insufficient data while reading from the network - expected a minimum of 6 bytes and received only -1 bytes. The connection has been terminated.
> Caused by: org.apache.derby.client.am.DisconnectException: Insufficient data while reading from the network - expected a minimum of 6 bytes and received only -1 bytes. The connection has been terminated.
Test Failed.
*** End: SuicideOfStreaming jdk1.4.2_10 DerbyNetClient derbynetmats:derbynetmats 2006-05-06 17:37:44 ***

I guess that this difference may be caused by "DERBY-850 - Internationalize client/org/apache/derby/jdbc classes.".
I think this difference is no problem and willing to adopt new error messages in .out file.

I suspend committing this patch for days to listen comments from others.

Tomohito Nakayama added a comment - 07/May/06 04:02 PM
 I upload DERBY-326_9.patch.
   ---
       Description of patch :
          * Remove processing to expand data from InputStream of blob/clob to memory before sending to client.
           * Implement layer B streaming at NetworkServer.
            * As written in this issue firstly, almost rewrite whole org.apache.derby.impl.drda.DDMWriter#writeScalarStream.
              Here , "almost" means that code was not wrote from scratch, but was wrote as reform.
              Remarkable point is as next :
              Original code was using variable "bytesToRead" to handle remaining amount of data sent and remaining roon in DSS segment .
              Now this variable "bytesToRead" was removed from.
              New code, instead, have variable "spareDssLength" to handle remaining room in DSS segment .
            * Add call to ensureLength in writeScalarStream expecting appropriate buffer size.
            * Move comment in java/drda/org/apache/derby/impl/drda/DDMWriter.java about client driver implementation
               to java/client/org/apache/derby/client/net/Request.java.
                     
           * Modify org.apache.derby.impl.drda.EXTDTAInputStream to stream InputStream retrieved from ResultSet directly.
            * The source stream is read twice, first for seeing whether source stream is/is not empty, second for streaming it.
            * To keep reference to valid stream, EXTDTAInputStream have reference to resultset and blob/clob also.
                   
          * Modify master file of result for blobclob4BLOB.
           * Now as same as result of embed driver, dead lock will be happen in clobTest92.
           * Different expected exception was happen in negative test in blobclob4BLOB.
                  
          * Added asserting code.
          * Added negative test to kill streaming.
     
          * Place buffer before lob object was streamed out to client.
          * Added test to stream with the stream out buffer configuration.
          
          * Remove the code for handling byte[].
 
         * Other improvements from DERBY-326_8.patch were as next.
          * Modify result of .out file for SuicideOfStreaming.java as to match current.
 
   Testing :
          * Executed derbyall and found no error

Tomohito Nakayama added a comment - 07/May/06 04:05 PM
One more modification between
DERBY-326_8.patch and DERBY-326_9.patch ...

* Conflict in both of blobclob4BLOB.out was resolved.

Tomohito Nakayama added a comment - 08/May/06 07:38 PM
I have committed.

Sending java/drda/org/apache/derby/impl/drda/DDMWriter.java
Sending java/drda/org/apache/derby/impl/drda/DRDAConnThread.java
Sending java/drda/org/apache/derby/impl/drda/EXTDTAInputStream.java
Adding java/drda/org/apache/derby/impl/drda/ReEncodedInputStream.java
Sending java/engine/org/apache/derby/iapi/reference/Property.java
Adding java/testing/org/apache/derbyTesting/functionTests/master/DerbyNet/SuicideOfStreaming.out
Sending java/testing/org/apache/derbyTesting/functionTests/master/DerbyNet/blobclob4BLOB.out
Adding java/testing/org/apache/derbyTesting/functionTests/master/DerbyNetClient/SuicideOfStreaming.out
Sending java/testing/org/apache/derbyTesting/functionTests/master/DerbyNetClient/blobclob4BLOB.out
Adding java/testing/org/apache/derbyTesting/functionTests/master/OutBufferedStream.out
Sending java/testing/org/apache/derbyTesting/functionTests/suites/derbynetmats.runall
Adding java/testing/org/apache/derbyTesting/functionTests/tests/derbynet/OutBufferedStream.java
Adding java/testing/org/apache/derbyTesting/functionTests/tests/derbynet/OutBufferedStream_app.properties
Adding java/testing/org/apache/derbyTesting/functionTests/tests/derbynet/SuicideOfStreaming.java
Adding java/testing/org/apache/derbyTesting/functionTests/tests/derbynet/SuicideOfStreaming_app.properties
Adding java/testing/org/apache/derbyTesting/functionTests/tests/derbynet/SwitchablePrintStream.java
Sending java/testing/org/apache/derbyTesting/functionTests/tests/derbynet/dataSourcePermissions_net.java
Transmitting file data .................
Committed revision 405037.

Kathey Marsden added a comment - 29/Apr/09 01:01 AM
It looks like these improvements were made in 10.2. New issues can be opened up for any additional streaming improvements.