Key: DERBY-683
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Deepa Remesh
Reporter: Sunitha Kambhampati
Votes: 0
Watchers: 0
Derby

Use correct encoding for ClobOutputStream on client

Created: 05/Nov/05 10:55 AM   Updated: 12/Jul/06 01:49 AM
Component/s: Network Client
Affects Version/s: 10.1.1.0, 10.1.2.1
Fix Version/s: 10.1.3.1, 10.2.1.6

Time Tracking:
Not Specified

File Attachments (all licensed for inclusion in ASF works):
  Name | Date | Uploaded by | Size
  ascii.txt | 2006-01-28 10:35 AM | Deepa Remesh | 0.1 kB
  clob.java | 2006-02-02 05:26 AM | Deepa Remesh | 4 kB
  derby-683.diff | 2006-02-02 05:26 AM | Deepa Remesh | 2 kB
  derby-683_021006.diff | 2006-02-11 08:34 AM | Myrna van Lunteren | 14 kB
  derby-683_021006.stat | 2006-02-11 08:34 AM | Myrna van Lunteren | 0.7 kB
  derby-683_tests.diff | 2006-02-03 01:16 PM | Deepa Remesh | 13 kB
  derby-683_tests.status | 2006-02-03 01:16 PM | Deepa Remesh | 0.7 kB
  DERBY-683_tstpatch3_2006_02_16.diff | 2006-02-18 01:40 AM | Myrna van Lunteren | 17 kB
  DERBY-683_tstpatch3_2006_02_16.stat | 2006-02-18 01:40 AM | Myrna van Lunteren | 0.7 kB
Environment: all
Issue Links:
Blocker
 
Reference
 

Resolution Date: 18/May/06 05:34 AM


Description
In the client, ClobOutputStream contains code that uses the API new String(byte[]). Per the Java API documentation at http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html#String(byte[]), this constructs a string by decoding the array of bytes using the platform's default character set.

org.apache.derby.client.am.ClobOutputStream is used for Clob.setAsciiStream, and its write methods use String(byte[]), which is incorrect because that constructor uses the default platform encoding. Per the JDBC API, this should use ASCII encoding.
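To make the effect concrete, here is a small Java sketch (not Derby code; DecodeDemo and decodedLength are made-up names, and it assumes Java 7+ for StandardCharsets). It decodes the same ASCII bytes once as UTF-16, which is what new String(byte[]) effectively did on the JDKs of that era when the JVM was started with -Dfile.encoding=UTF-16, and once as US-ASCII. Under UTF-16 every pair of bytes collapses into one character, which is consistent with the halved clob length reported in the repro output in the comments.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // Returns how many chars the given bytes decode to under the given charset.
    static int decodedLength(byte[] bytes, Charset charset) {
        return new String(bytes, charset).length();
    }

    public static void main(String[] args) {
        byte[] ascii = "ascii clob data!".getBytes(StandardCharsets.US_ASCII); // 16 bytes

        // Stands in for new String(ascii) when the default charset is UTF-16:
        // each pair of ASCII bytes becomes a single (non-ASCII) char.
        System.out.println(decodedLength(ascii, StandardCharsets.UTF_16));   // 8

        // Decoding explicitly as US-ASCII keeps one char per byte.
        System.out.println(decodedLength(ascii, StandardCharsets.US_ASCII)); // 16
    }
}
```

On JDK 18 and later the default charset is UTF-8 (JEP 400), so passing an explicit charset, as above, is the reliable way to reproduce the effect today.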

In areas related to Clobs, also check other places where String(byte[]) is used, as it may not give the desired behavior.

Dan pointed out this problem here: http://issues.apache.org/jira/browse/DERBY-463?page=comments#action_12356742

Comments (in ascending order)
Deepa Remesh added a comment - 28/Jan/06 10:35 AM
The patch for this seems to be a small change, but I am having trouble adding a test for it. I have been trying for some time to write a test verifying that ASCII encoding is used when writing to the OutputStream returned by Clob.setAsciiStream. I do not know how to add a test that can run in the test harness, because the test needs to set the property "file.encoding". I have been able to verify my changes using a Java program based on a test in jdbcapi/lobStreams.java. I am attaching the repro program and my patch (not ready for commit) to this JIRA.

To run the repro:
1. Start the Network Server on port 2222.
2. Using Sun JDK1.5, run: java -Dfile.encoding=UTF-16 clob

Output without my patch:
   S t a r t t e s t C l o b W r i t e 3 P a r a m
 F A I L - - w r o n g c l o b l e n g t h ; o r i g i n a l : 9 4 c l o b l e n g t
 h : 4 7
 t e s t C l o b W r i t e 3 P a r a m f i n i s h e d

Output with my patch:
   S t a r t t e s t C l o b W r i t e 3 P a r a m
 C o m p a r i n g f i l e a n d c l o b
 T e s t 3 - P A S S - C l o b a n d f i l e c o n t e n t s m a t c h
 t e s t C l o b W r i t e 3 P a r a m f i n i s h e d

As you can see above, the output of the repro is not in a very readable format, but this is the only charset I could find that reproduces the problem on my Windows machine. I have a few questions, and I'd appreciate it if someone could answer them.

1. Does my test cover the problem described in the JIRA?

2. Is there some other charset I can use instead of UTF-16? I need a charset that encodes ASCII characters differently from the ASCII encoding. I am using a Windows machine. To add this to the test harness, we will need a charset that is available on all platforms.

3. How can I run this repro as a test inside the test harness? The harness seems to treat the file.encoding property differently, and I see RunTest using it to set "derby.ui.codeset". In my repro, I tried using System.setProperty("file.encoding", "UTF-16"); but that did not work. Only the command-line option -Dfile.encoding works.

4. Is there some other way to test this? If there is, the rest of my questions may be moot.

I'd appreciate help with this. Thanks.

Deepa Remesh added a comment - 02/Feb/06 05:26 AM
Attaching a patch 'derby-683.diff'.

The write methods in ClobOutputStream were using the default encoding when constructing a String from bytes. ClobOutputStream is the output stream returned by a call to Clob.setAsciiStream. It is meant to be a stream to which ASCII-encoded characters are written, so writes to this stream should not use the default encoding. This patch changes the write methods to use String constructors that specify ASCII encoding.
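A minimal sketch of what the patched write methods amount to, assuming Java 7+. AsciiClobOutputStream and value() are hypothetical names: the real client class writes into a Clob, whereas this stand-in only accumulates text in a StringBuilder so it can run on its own. The point it illustrates is that every write path decodes with an explicit US-ASCII charset instead of the platform default.

```java
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the client's ClobOutputStream.
public class AsciiClobOutputStream extends OutputStream {
    private final StringBuilder clobValue = new StringBuilder();

    @Override
    public void write(int b) {
        // Decode the single byte as US-ASCII, not with the default charset.
        byte[] one = { (byte) b };
        clobValue.append(new String(one, StandardCharsets.US_ASCII));
    }

    @Override
    public void write(byte[] b) {
        write(b, 0, b.length);
    }

    @Override
    public void write(byte[] b, int off, int len) {
        if (off < 0 || len < 0 || len > b.length - off) {
            throw new IndexOutOfBoundsException();
        }
        // Decode the requested slice as US-ASCII.
        clobValue.append(new String(b, off, len, StandardCharsets.US_ASCII));
    }

    // Returns the text written so far.
    public String value() {
        return clobValue.toString();
    }
}
```

With the US-ASCII decoder, bytes above 0x7F are not valid ASCII and come out as the replacement character U+FFFD instead of being reinterpreted under some other charset.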

Currently, I have a repro to test this. I am working on adding a test to the harness. This requires some changes to the test harness, and I would like to submit them as a separate patch. To test this using the repro, run the following command on Windows with Sun JDK 1.5:

java -Dfile.encoding=UTF-16 -Doutput.encoding=Cp1252 clob

With this patch, I have run derbyall on Windows with Sun JDK 1.4.2, with no failures. It would be good if someone could look at this patch and commit the code changes if they are okay. I will submit the harness changes and a test in a separate patch.

Kathey Marsden added a comment - 03/Feb/06 02:03 AM
Thanks Deepa, the patch looks good. I think the harness problem is a hard one, especially for useprocess=false. Post what you have and I am sure folks will help out. We all want you done with this and back on DERBY-210 #:)

Deepa Remesh added a comment - 03/Feb/06 01:16 PM
I am attaching a patch 'derby-683_tests.diff' which adds testing for this issue. The patch changes the harness to enable running tests/suites using an encoding different from the default system encoding.

------------------------------------
Background:
------------------------------------
To test DERBY-683, I wanted to run an existing test (jdbcapi/lobStreams.java) using an encoding in which ASCII characters have a different representation. UTF-16 uses more bits than US-ASCII and represents ASCII characters differently. Also, UTF-16 is in the list of standard charsets that every Java platform implementation must support, as specified at http://java.sun.com/j2se/1.5.0/docs/api/java/nio/charset/Charset.html
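The following sketch (Java 7+, class and method names made up) checks both claims: UTF-16 is available, and it encodes an ASCII letter differently from US-ASCII. Java's "UTF-16" encoder writes big-endian output preceded by a byte-order mark, so a single 'A' becomes four bytes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf16VsAscii {
    static byte[] asciiBytes(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    static byte[] utf16Bytes(String s) {
        // Big-endian with a leading byte-order mark (FE FF).
        return s.getBytes(StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        // UTF-16 is one of the charsets every Java platform must support.
        System.out.println(Charset.isSupported("UTF-16"));      // true
        System.out.println(Arrays.toString(asciiBytes("A")));   // [65]
        System.out.println(Arrays.toString(utf16Bytes("A")));   // [-2, -1, 0, 65]
    }
}
```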

For this, I first tried using the 'jvmflags' property in the test harness. The harness uses this property when launching the JVM process that runs tests, so this works only when useprocess=true. I could run the test by specifying "jvmflags=-Dfile.encoding=UTF-16" in the <test>_app.properties file. However, the test output is then written using UTF-16 encoding and appears garbled. To read the output, the harness needs to know which encoding the test JVM used. For this, it would have to parse jvmflags, look for "-Dfile.encoding=", extract and store the specified encoding, launch the JVM with that encoding, and later use that encoding to read the test output. Since the harness uses the jvmflags property for other purposes, I added a new property 'derbyTesting.encoding' to the harness. The harness reads this property to get the encoding and internally changes jvmflags before launching the JVM to run tests. As mentioned before, all this is possible only when useprocess=true.

I don't think my changes cover everything the harness needs to support to run tests with different encodings. So here is a list of what we can and cannot do with these changes:
* Can specify the encoding in the <test>_app.properties file and run an individual test (not as part of a suite). If the encoding property is specified for a test, it is used only when the test is run individually using RunTest. When the test is run as part of a suite, useprocess is set to false in RunTest and no new JVM is launched for the test, so the encoding property is not used.

* Can specify the encoding in <suite>.properties file and run the whole suite of tests using that encoding. In this case, a new jvm is launched for RunTest class and all tests are run with the encoding specified.

* Cannot successfully run SQL tests if the encoding property is specified. The .sql files will be read using the specified encoding and may not be meaningful. If the .sql files were always read using a fixed encoding, this would not be a problem. I think Myrna plans this change in DERBY-658.

* Can run tests/suites using encoding property and look for possible areas for cleanup.

Once the patch is reviewed and if it is okay, I will update the testing readme file with this information.

------------------------------------
Changes:
------------------------------------
This patch does the following:

* Adds a new property 'derbyTesting.encoding' to the test harness. This property can be specified as a suite property in the <suite>.properties file or as a test property in the <test>_app.properties file. If this property is set at the suite level, it overrides the property set in the tests inside that suite. For this, I changed RunSuite.java, RunList.java and RunTest.java to read the property at each level only if it is not set at a higher level.

* The patch uses the value specified in 'derbyTesting.encoding=<enc_value>' for two things:

1. To append to the jvmflags property used to start the child JVM process. The following is appended: -Dfile.encoding=<enc_value>. Currently, I have hard-coded the prefix "-Dfile.encoding=" in the code. If a new JVM uses a different property name, the prefix can be specified for each JVM in the corresponding jvm class and retrieved from there.

2. To read the test output:
 - when using RunTest, the ProcessStreamResult class is used to read the output of the JVM process and write it to the output file. The patch changes this class to use an InputStreamReader created with the <enc_value> encoding. The OutputStreamWriter that writes to the .out file is created using the system default encoding. It needs to use the default encoding because the harness compares the .out file to the master using <default_enc>.
 - when using RunSuite, the HandleResult class is used to read the test output. The patch changes this class to use an InputStreamReader created with the <enc_value> encoding.

* Creates a new suite encodingTests with jdbcapi/lobStreams.java in .runall file. I created a new suite because I cannot specify the encoding property in a test and run it as part of a suite.

* Adds derbyTesting.encoding=UTF-16 to the encodingTests.properties file. Adds excludes for all JVMs except Sun JDK 1.5.

* Adds the encodingTests suite to derbynetclientmats
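The harness changes listed above can be sketched roughly as follows. None of this is the actual RunSuite, RunList, RunTest, ProcessStreamResult or HandleResult code: EncodingHarnessSketch and its methods are hypothetical, and joining jvmflags with a single space is an assumption (the real harness may use its own separator). The sketch shows three ideas: the suite-level derbyTesting.encoding value takes precedence over the test-level one, the chosen encoding is appended to the child JVM's flags as -Dfile.encoding=<enc_value>, and the child's output is decoded with that encoding but written to the .out file in the harness's default encoding so it can be compared with the master.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.util.Properties;

public class EncodingHarnessSketch {
    static final String ENCODING_PROP = "derbyTesting.encoding";

    // Suite-level value wins; the test-level value is used only if the suite sets none.
    static String resolveEncoding(Properties suiteProps, Properties testProps) {
        String enc = suiteProps.getProperty(ENCODING_PROP);
        return enc != null ? enc : testProps.getProperty(ENCODING_PROP);
    }

    // Appends -Dfile.encoding=<enc> to the child JVM flags (space-separated: an assumption).
    static String appendEncodingFlag(String jvmflags, String enc) {
        if (enc == null) {
            return jvmflags;
        }
        String flag = "-Dfile.encoding=" + enc;
        return (jvmflags == null || jvmflags.isEmpty()) ? flag : jvmflags + " " + flag;
    }

    // Reads the child's output in the child's encoding and writes it in the
    // harness's default encoding. The caller owns (and closes) both streams.
    static void copyChildOutput(InputStream childOut, String childEnc, OutputStream outFile)
            throws IOException {
        Reader in = new InputStreamReader(childOut, childEnc);
        Writer out = new OutputStreamWriter(outFile, Charset.defaultCharset());
        char[] buf = new char[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        out.flush();
    }
}
```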

With this patch, I ran derbyall with Sun JDK 1.4.2 on Windows XP, with no failures. With Sun JDK 1.5, I ran the new suite encodingTests and verified that the specified encoding is used to run the test.

Please review this patch. Thanks

Myrna van Lunteren added a comment - 07/Feb/06 09:34 AM
I looked at the deltas and it looks reasonable to me.

I ran the suite on Windows 2000 with:
   - jdk15, which was fine;
   - jdk14, which was skipped, and failed when I commented that out;
   - ibm15, which should have been skipped but seemed to work, except for printing "Process exception: null" to the console:
-----------------------
....lse^derbyTesting.encoding=UTF-16^runwithjdk13=false^runwithibm15=false^runwithjdk12=false^hostName=localhost^runwithibm13=false -Dsuitename=encodingTests:encodingTests -Dtopsuitename=encodingTests org.apache.derbyTesting.functionTests.harness.RunTest jdbcapi/lobStreams.java
Process exception: null
Generated report: encodingTests_report.txt
------------------------

It doesn't look like this patch is to blame for the non-skipping of ibm15, I'll have a look at that.

But I am wondering why you tried to have that skipped. Was it because of that Process exception?

Deepa Remesh added a comment - 07/Feb/06 10:15 AM
Thanks Myrna for looking at the patch.

When I ran the repro clob.java with UTF-16 encoding using ibm15, I got NoClassDefFoundError, same as what I get for jdk14. That is the reason I skipped the suite for ibm15. I had not tried running the suite with ibm15 and did not know the skipping was not working. Thanks for finding that.

Myrna van Lunteren added a comment - 08/Feb/06 04:48 AM
Well, I retried after ant clobber, and this time the skipping worked; I must have had a half-baked environment.

Kathey Marsden added a comment - 08/Feb/06 09:04 AM
Do I need to wait to commit this patch for the ibm15 skip to be fixed?

Myrna van Lunteren added a comment - 09/Feb/06 07:50 AM
The ibm15 skip was a non-issue. However, I've found that the encoding setting may not get reset for subsequent tests.

What I found is this:
When I commented out the 'runwithjdk14=false' property and ran derbyall with insane jars under DOS (this did not happen when running derbynetclientmats by itself), all subsequent derbynetclientmats tests failed with an error like this in the diff:
*** Start: CompatibilityTest jdk1.4.2_03 DerbyNetClient derbynetclientmats:derbynetclientmats 2006-02-07 11:48:42 ***
0 add
> þÿ j a v a . l a n g . N o C l a s s D e f F o u n d E r r o r : org.apache.derbyTesting.functionTests.tests.junitTests.derbyNet.CompatibilityTesÿý
> Exception in thread "main"
Test Failed.
***
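The garbage at the start of that diff line is what UTF-16 output looks like when it is read back with a single-byte charset. This sketch assumes the reader used ISO-8859-1 (Cp1252 maps bytes 0xFE and 0xFF to the same two characters); BomGarbleDemo and misreadAsLatin1 are made-up names. The UTF-16 byte-order mark FE FF turns into "þÿ", and each ASCII letter is preceded by a NUL byte, which shows up as the gap between letters.

```java
import java.nio.charset.StandardCharsets;

public class BomGarbleDemo {
    // Encodes text the way a UTF-16 child JVM would (big-endian with a BOM)
    // and then decodes it as if it were single-byte ISO-8859-1 text.
    static String misreadAsLatin1(String text) {
        byte[] childBytes = text.getBytes(StandardCharsets.UTF_16);
        return new String(childBytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String misread = misreadAsLatin1("java");
        // Show NULs as spaces: thorn and y-diaeresis, then " j a v a".
        // Displaying the first two characters needs a console that supports them.
        System.out.println(misread.replace('\0', ' '));
    }
}
```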

So, there's something not entirely ok in this patch.

One solution would be to put the encodingTests as the last suite in derbynetclientmats.properties, but that still leaves an unpleasant feeling.

I suggest we do not commit the test work at this time, and I will look into it in line with DERBY-658.

Deepa Remesh added a comment - 09/Feb/06 08:12 AM
Harness changes are also needed to run tests using different encoding. Please see http://issues.apache.org/jira/browse/DERBY-683

The patch 'derby-683_tests.diff' solves this problem partially but has the following problem, found by Myrna: http://issues.apache.org/jira/browse/DERBY-683#action_12365038

Deepa Remesh added a comment - 09/Feb/06 08:15 AM
Thanks Myrna for offering to work on the harness changes. I have added a link to this issue in DERBY-658. I will resolve DERBY-683 since the code changes have been committed and test changes will be done as part of DERBY-658.


Deepa Remesh added a comment - 09/Feb/06 08:22 AM
Committed as svn revision 374469 to the trunk. Verified changes by running the attached repro.

Andrew McIntyre added a comment - 17/Feb/06 04:01 AM
Hi Myrna, I can't get this patch (derby-683_021006.diff) to apply cleanly due to changes in RunTest. Could you merge your changes and post an updated patch?

Myrna van Lunteren added a comment - 18/Feb/06 01:40 AM
Attaching an updated patch for the test harness (DERBY-683_tstpatch03_2006_02_16.*).
This is close to what was created before.
Compared to Deepa's original, it:
- skips any test run with derbyTesting.encoding if the JVM is not jdk15
- does not set file.encoding in RunSuite, so the encoding is set and unset for every test where it applies
- includes a brief mention of the property in java/testing/README.htm
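A rough sketch of the first two behaviours in that list, assuming the harness decides per test. PerTestFlagsSketch, shouldSkip and flagsFor are hypothetical, "jdk15" stands in for however the harness actually identifies Sun JDK 1.5, and "-Xmx256m" is just an illustrative base flag. Building each test's flags from an unchanged base list, instead of modifying shared state in RunSuite, is one way to make sure an encoding requested by one test cannot leak into later tests, which is the kind of problem reported earlier in this issue when the encoding setting did not get reset.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class PerTestFlagsSketch {
    // A test that requests an encoding runs only on "jdk15" and is skipped elsewhere.
    static boolean shouldSkip(String encoding, String jvmName) {
        return encoding != null && !"jdk15".equals(jvmName);
    }

    // Copies the base flags for each test, so the -Dfile.encoding flag added
    // for one test never carries over to the next one.
    static List<String> flagsFor(List<String> baseFlags, String encoding) {
        List<String> flags = new ArrayList<>(baseFlags);
        if (encoding != null) {
            flags.add("-Dfile.encoding=" + encoding);
        }
        return Collections.unmodifiableList(flags);
    }

    public static void main(String[] args) {
        List<String> base = Collections.unmodifiableList(Arrays.asList("-Xmx256m"));
        System.out.println(flagsFor(base, "UTF-16"));       // [-Xmx256m, -Dfile.encoding=UTF-16]
        System.out.println(flagsFor(base, null));           // [-Xmx256m]
        System.out.println(shouldSkip("UTF-16", "ibm15"));  // true
    }
}
```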

I also took the opportunity to touch up README.htm to mention, in the description of the remote server functionality, that derby.jar needs to be available (although not in the classpath), and to list JUnit tests as a valid test type.

Andrew McIntyre added a comment - 22/Feb/06 04:58 PM
Committed DERBY-683_tstpatch03 to trunk with revision 379723. Followup issue regarding running the encoding tests on JVMs besides Sun JDK 1.5 filed in JIRA as issue DERBY-1027.

Deepa Remesh added a comment - 23/Feb/06 02:39 AM
Thanks Myrna and Andrew for working on this test patch.

I was updating my workspace to see if my test for DERBY-683 runs fine when I noticed that the "new" files in the patch seem to have been added twice. The following files have repeated contents:
Added:
   db/derby/code/trunk/java/testing/org/apache/derbyTesting/functionTests/suites/encodingTests.properties (with props)
   db/derby/code/trunk/java/testing/org/apache/derbyTesting/functionTests/suites/encodingTests.runall

Andrew, can you please check this? Thanks.

Andrew McIntyre added a comment - 23/Feb/06 04:21 AM
Yes, I must have applied the patch twice and patch didn't catch that the new files had already been 'patched'. Good catch, thanks Deepa!

Deepa Remesh added a comment - 11/May/06 06:37 AM
Reopening to port this fix to the 10.1 branch.

Deepa Remesh added a comment - 16/May/06 10:41 PM
To merge this fix, I have also ported the fix for DERBY-463 which is in the same area. The patch for DERBY-463 is attached to http://issues.apache.org/jira/browse/DERBY-463. This patch has to be applied before running the following merge command:

svn merge -r 374468:374469 https://svn.apache.org/repos/asf/db/derby/code/trunk

With the patch for DERBY-463 and the merge for DERBY-683, I verified that the attached repro passes with UTF-16 file encoding in v10.1. I also ran derbynetclientmats with Sun jdk 1.4.2 on Windows XP. Please take a look at these. Thanks.

Andrew McIntyre added a comment - 18/May/06 05:34 AM
Committed to 10.1 with revision 407391.