
Key: XALANJ-2184
Type: Bug
Status: Resolved
Resolution: Fixed
Priority: Critical
Assignee: Brian Minchau
Reporter: Pedro Alves
Votes: 0
Watchers: 1
XalanJ2

cp850, cp860 serialization fails

Created: 06/Aug/05 08:12 AM   Updated: 11/Dec/07 04:57 PM
Component/s: Serialization
Affects Version/s: 2.7
Fix Version/s: 2.7.1

Time Tracking:
Not Specified

File Attachments (all licensed for inclusion in ASF works):
  cp-850-encoding.tgz, 4 kB, 2005-08-06 08:13 AM, Pedro Alves
  patch.txt, 0.6 kB, 2005-09-13 07:12 AM, Brian Minchau
  testcase-kaapa-20050804.zip, 2 kB, 2005-08-06 08:13 AM, Pedro Alves
Environment: Windows and Linux, jdk1.5

Xalan info: PatchAvailable
Resolution Date: 16/Sep/05 02:57 AM


Description
Xalan serialization fails for some encodings. Tested with cp850 and cp860. Test cases are included.

Comments
Brian Minchau added a comment - 10/Aug/05 12:53 PM
The value of the encoding attribute of an <xsl:output> element should be
an IANA or MIME name. If one looks at the IANA standard at
http://www.iana.org/assignments/character-sets, one finds
information on various encodings, and for each encoding
all of the equivalent aliases for it. For example:

Name: IBM278 [RFC1345,KXS2]
MIBenum: 2034
Source: IBM NLS RM Vol2 SE09-8002-01, March 1990
Alias: CP278
Alias: ebcdic-cp-fi
Alias: ebcdic-cp-se
Alias: csIBM278

One should be able to use any of the aliases of a given encoding.
So all of these xsl:output elements should result in the same encoding being used.
<xsl:output encoding="CP278" />
<xsl:output encoding="ebcdic-cp-fi" />
<xsl:output encoding="ebcdic-cp-se" />
<xsl:output encoding="csIBM278" />



However, Xalan-J is written in Java, so ultimately such names
must be mapped to a corresponding name that the Java runtime recognizes.

**************
**** THE INFORMATION IN THE Serializer.properties FILE
**** AND ITS FORMAT IS NOT A PUBLIC API.
**************

The corresponding line in the Serializer.properties file for the same encoding is this:
  Cp278 EBCDIC-CP-FI,EBCDIC-CP-SE 0x00FF

Cp278 is the particular IANA alias recognized by the Java runtime for the encoding.
The comma-separated list contains the other IANA aliases. Our implementation
first maps any of the other aliases to the first name and presents only
that first name to the Java runtime.

Last on the line is 0x00FF, which is supposed to indicate the largest Unicode value in the encoding,
but this field is no longer used in Xalan-J 2.7.
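To make the entry format concrete, here is a minimal Java sketch of how a line like the Cp278 one could be turned into an alias lookup table. The class EncodingAliasTable and its methods are hypothetical, not Xalan-J's actual code; it assumes the whitespace-separated layout described above (Java name, optional comma-separated aliases, optional hex field) and compares names case-insensitively, since IANA charset names are case-insensitive.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch, not Xalan-J code: parses entries shaped like
// "Cp278 EBCDIC-CP-FI,EBCDIC-CP-SE 0x00FF" into an alias -> Java name table.
final class EncodingAliasTable {
    private final Map<String, String> aliasToJavaName = new HashMap<>();

    /** Adds one entry: Java name, optional comma-separated aliases, optional hex field. */
    void addEntry(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields[0].isEmpty()) {
            return; // blank line, nothing to add
        }
        String javaName = fields[0];
        // The first field is also accepted as a name for the encoding.
        aliasToJavaName.put(key(javaName), javaName);
        // Treat the second field as the alias list unless it is the hex
        // "largest Unicode value" field, which this sketch ignores (as 2.7 does).
        if (fields.length > 1 && !key(fields[1]).startsWith("0X")) {
            for (String alias : fields[1].split(",")) {
                if (!alias.isEmpty()) {
                    aliasToJavaName.put(key(alias), javaName);
                }
            }
        }
    }

    /** Returns the Java name for an alias, or null if no entry lists it. */
    String javaNameFor(String alias) {
        return aliasToJavaName.get(key(alias));
    }

    // IANA charset names are case-insensitive, so normalize before storing or looking up.
    private static String key(String name) {
        return name.toUpperCase(Locale.ROOT);
    }
}
```

With the Cp278 entry loaded, javaNameFor("ebcdic-cp-fi") and javaNameFor("CP278") both return "Cp278", while javaNameFor("csIBM278") returns null because that alias does not appear in the entry.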

So we see a bug in the corresponding Serializer.properties entry for this encoding:
csIBM278 is missing from the alias list.
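Gaps like this one could be found mechanically by comparing each entry with the "Alias:" lines from the IANA registry. A minimal sketch, with a hypothetical class name, assuming both lists have already been extracted as strings:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical audit helper: reports IANA aliases that a properties entry lacks.
final class AliasAudit {
    /**
     * Returns the IANA aliases that do not appear among the entry's names.
     * Comparison ignores case because IANA charset names are case-insensitive.
     */
    static List<String> missingAliases(List<String> ianaAliases, List<String> entryNames) {
        Set<String> present = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        present.addAll(entryNames);
        List<String> missing = new ArrayList<>();
        for (String alias : ianaAliases) {
            if (!present.contains(alias)) {
                missing.add(alias);
            }
        }
        return missing;
    }
}
```

Given the four IBM278 aliases from the IANA record (CP278, ebcdic-cp-fi, ebcdic-cp-se, csIBM278) and the three names in the Cp278 entry, it returns [csIBM278].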

------------------------------------------------------------------
On the IANA web page there is this information about cp850 and cp860:

Name: IBM850 [RFC1345,KXS2]
MIBenum: 2009
Source: IBM NLS RM Vol2 SE09-8002-01, March 1990
Alias: cp850
Alias: 850
Alias: csPC850Multilingual

Name: IBM860 [RFC1345,KXS2]
MIBenum: 2048
Source: IBM NLS RM Vol2 SE09-8002-01, March 1990
Alias: cp860
Alias: 860
Alias: csIBM860
 
There are no entries for Cp850 and Cp860 in the Serializer.properties file,
but if there were, they should look like this:
Cp850 850,csPC850Multilingual 0xFFFF
Cp860 860,csIBM860 0xFFFF

The serializer code is written with the intent that if no entry appears in the Encodings.properties file,
the encoding name is used as-is, as a Java name. So Cp850 ought to work, but the serializer instead gives an error message saying
the encoding is not supported.
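The intended fallback could be sketched like this: consult the alias table first, and if the name is not there, hand it to the Java runtime unchanged. The class and method names are hypothetical and this is not the 2.7 serializer's code, which rejected Cp850 instead; the table is assumed to be keyed by upper-cased names.

```java
import java.nio.charset.Charset;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of the intended lookup, not Xalan-J's implementation.
final class OutputEncodingResolver {
    /** Maps an xsl:output encoding name to a Java name, falling back to the name itself. */
    static String resolve(String requested, Map<String, String> upperCaseAliasTable) {
        String javaName = upperCaseAliasTable.get(requested.toUpperCase(Locale.ROOT));
        return (javaName != null) ? javaName : requested; // no entry: use the name as-is
    }

    /** Reports whether the Java runtime supports the resolved name. */
    static boolean isUsable(String requested, Map<String, String> upperCaseAliasTable) {
        try {
            return Charset.isSupported(resolve(requested, upperCaseAliasTable));
        } catch (IllegalArgumentException e) {
            // IllegalCharsetNameException extends IllegalArgumentException.
            return false;
        }
    }
}
```

With no Cp850 entry in the table, resolve("Cp850", table) simply returns "Cp850". Whether isUsable then returns true depends on the JDK; JDKs that ship the extended charsets are expected to recognize Cp850 as an alias of IBM850.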

Out of curiosity, I added the suggested lines to Serializer.properties, and the encodings were then recognized.

So there seem to be a few things (bugs) to change here:

1) Cp850 and Cp860 should be recognized as they are, with no changes to Serializer.properties,
because they are IANA names that also happen to be the names recognized by the Java runtime.

2) The entries in Serializer.properties need to be updated with the information from IANA: whole encodings are missing, and some encodings
are missing aliases.

3) A little cleanup might be needed: we can drop the largest-Unicode-code-point value (the 0xFFFF field) from each entry.
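On item 3: a single largest-code-point value is only accurate for an encoding that contains every character from U+0000 up to that bound, so it cannot say which characters a code page actually holds. One alternative, shown here as a minimal sketch rather than what Xalan-J does, is to ask the JDK's CharsetEncoder about each character and write a numeric character reference when it cannot be encoded. The class name is hypothetical, and a real serializer would apply this only to text content, not to markup.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical sketch: per-character encodability check instead of a max-code-point field.
final class CharEscaper {
    /** Returns text with characters the charset cannot encode replaced by &#N; references. */
    static String escapeUnencodable(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            // Check whole code points so surrogate pairs are tested together.
            String ch = new String(Character.toChars(cp));
            if (encoder.canEncode(ch)) {
                out.append(ch);
            } else {
                out.append("&#").append(cp).append(';');
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

For example, with ISO-8859-1 the string "aé€" keeps "a" and "é" but the euro sign becomes &#8364;.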


Brian Minchau added a comment - 13/Sep/05 07:12 AM
Attaching a patch to Encodings.properties that adds the encodings Cp850 and Cp860.

There are actually many, many missing encodings in this properties file. The design needs to be revisited.

Sarah McNamara added a comment - 15/Sep/05 11:44 PM
I have reviewed the patch and I approve.

Brian Minchau added a comment - 16/Sep/05 02:57 AM
The patch was applied to the CVS HEAD branch. Resolving as fixed in "Latest Development Code".

Brian Minchau added a comment - 11/Dec/07 04:57 PM
Would the originator of this issue please verify that this issue is fixed in the 2.7.1 release, by adding a comment to this issue, so that we can close this issue.

A lack of response by February 1, 2008 will be taken as consent that we can close this resolved issue.

Regards,
Brian Minchau