Attaching a patch to Encodings.properties to add the encodings Cp850 and Cp860.
There are actually many missing encodings in this properties file. The design needs to be revisited. I have reviewed the patch and I approve.
The patch was applied to the CVS HEAD branch. Resolving as fixed in "Latest Development Code".
Would the originator of this issue please verify that it is fixed in the 2.7.1 release by adding a comment to this issue, so that we can close it?
A lack of response by February 1, 2008 will be taken as consent that we can close this resolved issue. Regards, Brian Minchau
an IANA or MIME name. If one looks at the IANA standard at
http://www.iana.org/assignments/character-sets, one finds
information on various encodings, and for each encoding
all of the equivalent aliases for it. For example:
Name: IBM278 [RFC1345,KXS2]
MIBenum: 2034
Source: IBM NLS RM Vol2 SE09-8002-01, March 1990
Alias: CP278
Alias: ebcdic-cp-fi
Alias: ebcdic-cp-se
Alias: csIBM278
One should be able to use any of the aliases of a given encoding.
So all of these xsl:output elements should result in the same encoding being used.
<xsl:output encoding="CP278" />
<xsl:output encoding="ebcdic-cp-fi" />
<xsl:output encoding="ebcdic-cp-se" />
<xsl:output encoding="csIBM278" />
However, Xalan-J is written in Java, so ultimately such names
must be mapped to a corresponding name that the Java runtime understands.
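To illustrate what "a name used by the Java runtime" means, here is a minimal sketch that asks the JDK itself to canonicalize an encoding name through java.nio.charset.Charset. The class JavaCharsetNames and its method are hypothetical and are not part of Xalan-J. Which charsets are installed depends on the JDK, and the canonical java.nio name may differ from the older java.io-style name such as Cp278.

```java
import java.nio.charset.Charset;

// Hypothetical helper for illustration only; not part of Xalan-J.
final class JavaCharsetNames {
    /**
     * Returns the JDK's canonical name for an encoding name or alias,
     * or null if this JDK does not support it.
     */
    static String canonicalJavaName(String name) {
        try {
            if (!Charset.isSupported(name)) {
                return null;
            }
            return Charset.forName(name).name();
        } catch (IllegalArgumentException e) {
            // Syntactically illegal names raise IllegalCharsetNameException,
            // which is a subclass of IllegalArgumentException.
            return null;
        }
    }
}
```

Calling JavaCharsetNames.canonicalJavaName with an alias the JDK knows returns the canonical name, while an unknown name yields null instead of an exception.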
**************
**** THE INFORMATION IN THE Serializer.properties FILE
**** AND ITS FORMAT IS NOT A PUBLIC API.
**************
The corresponding line in the Serializer.properties file for the same encoding is this:
Cp278 EBCDIC-CP-FI,EBCDIC-CP-SE 0x00FF
Cp278 is the particular IANA alias recognized by the Java runtime for the encoding.
The comma-separated list contains the other IANA aliases. Our implementation will
first map any of the other aliases to the first name, and present the Java runtime
only with that first name.
Last on the line is 0x00FF, which is supposed to indicate the largest Unicode value in the encoding,
but this field is no longer used in Xalan-J 2.7.
So we see a bug in the corresponding Serializer.properties entry for this encoding:
csIBM278 is missing from the alias list.
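The line format and the missing alias can be checked mechanically. The following is a minimal sketch, assuming the three whitespace-separated fields shown above and case-insensitive name matching (an assumption made for this example). The class EncodingEntry and its methods are hypothetical names, not the serializer's actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; class and method names are not taken from Xalan-J.
final class EncodingEntry {
    final String javaName;       // first field: the name handed to the Java runtime
    final List<String> aliases;  // second field: the comma-separated aliases

    private EncodingEntry(String javaName, List<String> aliases) {
        this.javaName = javaName;
        this.aliases = aliases;
    }

    /** Parses a line such as "Cp278 EBCDIC-CP-FI,EBCDIC-CP-SE 0x00FF". */
    static EncodingEntry parse(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length < 2) {
            throw new IllegalArgumentException("expected '<javaName> <aliases> [<maxChar>]': " + line);
        }
        // The optional third field (e.g. 0x00FF) is ignored here because it is unused.
        List<String> aliases = new ArrayList<>(Arrays.asList(fields[1].split(",")));
        return new EncodingEntry(fields[0], aliases);
    }

    /** True if the name is the Java name or a listed alias (case-insensitive: an assumption). */
    boolean recognizes(String name) {
        if (javaName.equalsIgnoreCase(name)) {
            return true;
        }
        for (String alias : aliases) {
            if (alias.equalsIgnoreCase(name)) {
                return true;
            }
        }
        return false;
    }

    /** Returns the names from the IANA record that this entry does not cover. */
    List<String> missingFrom(List<String> ianaNames) {
        List<String> missing = new ArrayList<>();
        for (String name : ianaNames) {
            if (!recognizes(name)) {
                missing.add(name);
            }
        }
        return missing;
    }
}
```

Parsing the Cp278 line and passing the four IANA aliases listed earlier to missingFrom reports csIBM278 as the only alias not covered by the entry.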
------------------------------------------------------------------
On the IANA web page there is this information about cp850 and cp860:
Name: IBM850 [RFC1345,KXS2]
MIBenum: 2009
Source: IBM NLS RM Vol2 SE09-8002-01, March 1990
Alias: cp850
Alias: 850
Alias: csPC850Multilingual
Name: IBM860 [RFC1345,KXS2]
MIBenum: 2048
Source: IBM NLS RM Vol2 SE09-8002-01, March 1990
Alias: cp860
Alias: 860
Alias: csIBM860
There are no entries for Cp850 and Cp860 in the Serializer.properties file,
but if there were then they should look like this:
Cp850 850,csPC850Multilingual 0xFFFF
Cp860 860,csIBM860 0xFFFF
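As a sketch of how such lines could be derived from an IANA record, the hypothetical helper below drops the alias that equals the Java name (ignoring case) and joins the rest with commas. The class name, the method signature, and the choice to pass the trailing hex field as a plain string are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Xalan-J.
final class EncodingLineFormatter {
    /**
     * Builds a "<javaName> <alias,alias,...> <maxChar>" line.
     * The alias equal to javaName (ignoring case) is left out of the list.
     */
    static String format(String javaName, List<String> ianaAliases, String maxChar) {
        List<String> others = new ArrayList<>();
        for (String alias : ianaAliases) {
            if (!alias.equalsIgnoreCase(javaName)) {
                others.add(alias);
            }
        }
        return javaName + " " + String.join(",", others) + " " + maxChar;
    }
}
```

Given the Java name Cp850, the IANA aliases cp850, 850 and csPC850Multilingual, and the value 0xFFFF, the helper produces a line of the same shape as the first suggested entry.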
The code in the serializer is written with the intent that if no entry appears in the Encodings.properties file,
then the encoding name is used as-is as a Java name. Cp850 therefore ought to work, but the serializer gives an error message
saying that the encoding is not supported.
Out of curiosity I added the suggested lines to Serializer.properties and suddenly the encodings were recognized.
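The intended behavior described above (map known aliases to the Java name, otherwise pass the requested name through unchanged) can be sketched as follows. This is not the serializer's actual code. The class EncodingLookup, its methods, and the upper-casing of lookup keys are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of the intended lookup; not the actual Xalan-J implementation.
final class EncodingLookup {
    private final Map<String, String> aliasToJavaName = new HashMap<>();

    /** Registers one entry: a Java name plus its other aliases. */
    void register(String javaName, String... aliases) {
        aliasToJavaName.put(key(javaName), javaName);
        for (String alias : aliases) {
            aliasToJavaName.put(key(alias), javaName);
        }
    }

    /**
     * Maps a requested encoding name to the Java name.
     * If no entry is registered, the requested name is returned unchanged,
     * so a name the Java runtime already understands still gets through.
     */
    String toJavaName(String requested) {
        String mapped = aliasToJavaName.get(key(requested));
        return mapped != null ? mapped : requested;
    }

    // Case-insensitive keys are an assumption of this sketch.
    private static String key(String name) {
        return name.toUpperCase(Locale.ROOT);
    }
}
```

With only the Cp278 entry registered, toJavaName("ebcdic-cp-fi") yields Cp278, while toJavaName("Cp850") falls through and yields Cp850 unchanged, which is the behavior item 1 below asks for.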
So there seem to be a few things (bugs) to change here:
1) Cp850 and Cp860 should be recognized as they are, with no changes to Serializer.properties,
because they are IANA names that happen to also be the names recognized by the Java runtime.
2) The entries in Serializer.properties need to be updated with the information from IANA: whole encodings are missing, and some encodings
are missing aliases.
3) A little cleanup might be needed: we can drop the code point of the largest Unicode value (the 0xFFFF sort of stuff) from each entry.