The easiest way to provide buffering over a normal stream is to wrap it in a BufferedInputStream. I am trying out that change to see whether it improves performance significantly.
I suspect the problem is the 4k buffer size in LOBStreamControl before it moves the data to a file. I think I need to make this larger. While debugging I noticed that even very large clobs were kept in memory, so all the test cases of BlobClob4BlobTest.testPositionAgressive used to operate on an in-memory string. Now moving it onto a stream has made it slow (this also explains why
Modified UTF8Reader to wrap the input stream (received via the constructor) in a BufferedInputStream.
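To illustrate why wrapping the stream helps, here is a minimal sketch. It is not the committed patch: CountingStream and BufferedWrapSketch are hypothetical names, and a ByteArrayInputStream stands in for the real LOB stream. It counts how many read calls reach the underlying stream when a consumer reads one byte at a time, with and without a BufferedInputStream in between.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stand-in for an unbuffered LOB stream. It counts every read
// call that reaches the underlying source.
final class CountingStream extends InputStream {
    private final InputStream in;
    int calls = 0; // number of read calls that reached the underlying source

    CountingStream(InputStream in) {
        this.in = in;
    }

    @Override
    public int read() throws IOException {
        calls++;
        return in.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        calls++;
        return in.read(b, off, len);
    }
}

final class BufferedWrapSketch {
    // Reads the stream one byte at a time, as a naive consumer would,
    // and returns the number of bytes seen.
    static long drain(InputStream in) throws IOException {
        long n = 0;
        while (in.read() != -1) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[64 * 1024];

        CountingStream raw = new CountingStream(new ByteArrayInputStream(data));
        drain(raw);

        CountingStream wrapped = new CountingStream(new ByteArrayInputStream(data));
        drain(new BufferedInputStream(wrapped));

        System.out.println("unbuffered calls: " + raw.calls
                + ", buffered calls: " + wrapped.calls);
    }
}
```

With the wrapper in place, the single-byte reads are served from the BufferedInputStream's internal buffer, so far fewer calls reach the underlying stream. When that stream is backed by a file, each avoided call is an avoided I/O operation.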
I haven't run the complete suite, but org.apache.derbyTesting.functionTests.tests.jdbcapi._Suite runs without any failure (in 1,111.169 sec). org.apache.derbyTesting.functionTests.tests.jdbcapi.BlobClob4BlobTest now takes 151.561 sec.
Thanks, Anurag, for looking at this while on vacation. Your patch looks good, and I recommend that it be committed so that the runtime of the JUnit test suite gets back to normal.
As a side note, I do not understand why UTF8Reader buffers the converted characters. Would it not have been better to buffer the incoming bytes and convert characters on demand?
Patch committed, revision 532127. Thanks, Anurag.
FWIW, I also looked at the change and it looks good. I ran BlobClob4BlobTest on my Linux box: before the change it took 3197s, and with the change it took 211s.
Anurag> "I suspect the problem is with 4k buffer size in LOBStreamControl before it moves the data to file."
Do you plan to make changes related to this in some other JIRA issue? If so, can you point me to that issue? Thanks.
My test runs dropped from 68 mins to 39 mins. Thanks, Anurag and Øystein, for the quick response.
This looks like it is fixed, can it be closed?
With the fix applied, the run time is back to normal.
Sorry for the late response.
In Blob/Clob, before I made the changes, the data was held in memory depending on the free space in the page. So sometimes even a huge clob/blob could be used as a String. In the case of testPositionAggressive, all the tests were running in this mode, and String.indexOf was giving results very fast. Once I changed the blob/clob to hold only 4k in memory and to use a file when it exceeds that, it started taking very long because of the file I/O overhead. If we change the max buffer size (to higher than 4k), we will get better performance for clobs/blobs shorter than the new value. But there may be a risk of getting an out-of-memory exception if we keep the value large and there are too many active blobs and clobs. I think we need not make the buffer size too large, as right now, after making the stream buffered, we are able to get performance comparable to the older implementation.
> Once I changed the blob/clob to hold only 4k in memory and if it exceeds
> use a file it started taking very long because of the file i/o over
> heads. If we change the max buffer size (higher than 4k) we will be
> getting better performance for clob/blob of length smaller than the new
> value. But there may be risk of getting out of memory exception if we
> keep the value large and there are too many active blob and clob.I think
> we need not make the buffer size too large, as right now after making
> the stream buffered we are able to get performance comparable to the
> older implementation.
I think it is important that temporary storage is only used when the LOB is updated. Hence, when reading a LOB, one should make sure that the LOB is not written to temporary storage, regardless of size.
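The memory-versus-file trade-off discussed in the comments above can be sketched roughly as follows. This is not Derby's LOBStreamControl: SpillingBuffer and its methods are hypothetical names, and the threshold is a constructor argument (for example 4096). Data stays in memory until the threshold would be exceeded, and is then moved to a temporary file.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch of a buffer that keeps small values in memory and
// spills larger ones to a temporary file. Not Derby's implementation.
final class SpillingBuffer implements AutoCloseable {
    private final int maxInMemory;  // threshold in bytes, e.g. 4096
    private ByteArrayOutputStream memory = new ByteArrayOutputStream();
    private File tempFile;          // non-null once data has spilled
    private OutputStream fileOut;   // non-null once data has spilled
    private long length = 0;

    SpillingBuffer(int maxInMemory) {
        this.maxInMemory = maxInMemory;
    }

    void write(byte[] b, int off, int len) throws IOException {
        // Spill before the in-memory part would grow past the threshold.
        if (fileOut == null && length + len > maxInMemory) {
            spill();
        }
        if (fileOut != null) {
            fileOut.write(b, off, len);
        } else {
            memory.write(b, off, len);
        }
        length += len;
    }

    // Moves everything buffered so far into a temporary file.
    private void spill() throws IOException {
        tempFile = File.createTempFile("lob", ".tmp");
        tempFile.deleteOnExit();
        fileOut = new FileOutputStream(tempFile);
        memory.writeTo(fileOut);
        memory = null; // release the in-memory copy
    }

    boolean isInMemory() {
        return fileOut == null;
    }

    long length() {
        return length;
    }

    @Override
    public void close() throws IOException {
        if (fileOut != null) {
            fileOut.close();
            tempFile.delete();
        }
    }
}
```

A larger threshold keeps more values on the fast in-memory path, but every active value below the threshold holds that much heap, which is the out-of-memory risk raised in the discussion.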
I guess that the reason may be that the new streams that are used do not do buffering. They rely on the caller to use the byte[] version of read if many characters are to be read.
However, UTF8Reader.fillBuffer reads one byte at a time. My first attempt at fixing this will be to change UTF8Reader to fill its buffer with the byte[] version of read. I think that is better than making the new streams do buffering, since it seems unnecessary to do buffering at several levels.
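As a rough illustration of filling a buffer with the byte[] version of read, here is a minimal sketch. It is not the actual UTF8Reader code: BulkFillSketch and this fillBuffer signature are hypothetical, and the decoding of bytes into characters is left out.

```java
import java.io.IOException;
import java.io.InputStream;

final class BulkFillSketch {
    // Fills buf with as many bytes as the stream can supply, using the
    // byte[] version of read instead of one read() call per byte.
    // Returns the number of bytes placed in buf, or -1 if the stream was
    // already at end-of-stream and buf is non-empty.
    static int fillBuffer(InputStream in, byte[] buf) throws IOException {
        int filled = 0;
        while (filled < buf.length) {
            // read may return fewer bytes than requested, so keep looping.
            int n = in.read(buf, filled, buf.length - filled);
            if (n == -1) {
                break; // end of stream
            }
            filled += n;
        }
        return (filled == 0 && buf.length > 0) ? -1 : filled;
    }
}
```

The loop is needed because a single read(byte[], int, int) call is allowed to return fewer bytes than requested. Each call still moves a block of bytes, so the number of calls on the underlying stream drops from one per byte to roughly one per buffer.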