Issue Details

Key: XALANJ-2271
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Brian Minchau
Reporter: Brian Minchau
Votes: 0
Watchers: 0
XalanJ2

XML 1.1 Serialization, char in attribute value not escaped

Created: 22/Feb/06 01:38 AM   Updated: 12/Dec/07 03:26 AM
Component/s: None
Affects Version/s: None
Fix Version/s: 2.7.1

Time Tracking:
Not Specified

File Attachments:
character.expansion.patch1.txt (text file, licensed for inclusion in ASF works), 2006-02-25 02:44 AM, Brian Minchau, 37 kB
character.expansion.patch3.txt (text file, licensed for inclusion in ASF works), 2006-03-04 05:16 AM, Brian Minchau, 41 kB

Xalan info: PatchAvailable
Reviewer: Henry Zongaro
Resolution Date: 08/Mar/06 04:18 AM


Description
This issue was found by Henry Zongaro.

If you run the following stylesheet, you'll see that the character U+008C (&#x8c;), which XML 1.1 does not permit in literal form, is escaped when it appears in an element's character content but is not escaped when it is part of an attribute value.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" version="1.1"/>
  <xsl:template match="/">
    <out att="&#x8c;">&#x8c;</out>
  </xsl:template>
</xsl:stylesheet>

When Xerces parses the serialized XML produced by this stylesheet, it goes into an infinite loop (perhaps depending on the version of Xerces) as it attempts to parse the attribute that contains the invalid character.
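
For readers who want to try this, here is a minimal sketch that runs the stylesheet above through JAXP and checks the serialized result for a raw C1 control character. The class name Xml11AttrRepro, the helper hasRawC1() and the dummy input document <doc/> are assumptions made for illustration. Which processor TransformerFactory.newInstance() returns depends on the classpath, so the outcome may differ from the Xalan-J behaviour described here.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical reproduction harness; not part of the Xalan code base.
final class Xml11AttrRepro {
    // The stylesheet from the description, as a Java string.
    static final String STYLESHEET =
        "<xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\">"
      + "<xsl:output method=\"xml\" version=\"1.1\"/>"
      + "<xsl:template match=\"/\">"
      + "<out att=\"&#x8c;\">&#x8c;</out>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Runs the stylesheet against a dummy input and returns the serialized output.
    static String transform() {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader("<doc/>")), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    // True if the text contains a literal character in 0x7F-0x9F other than
    // NEL (0x85), i.e. one that XML 1.1 only allows as a character reference.
    static boolean hasRawC1(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x7F && c <= 0x9F && c != 0x85) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String result = transform();
        System.out.println(result);
        System.out.println("raw C1 character present: " + hasRawC1(result));
    }
}
```

If the check reports a raw C1 character, the serializer in use shows the symptom described in this issue.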


Comments
Brian Minchau added a comment - 25/Feb/06 03:35 AM
This issue was more difficult than I thought. The character expansion code in the serializer has been getting better over time but is still complicated.

The CharInfo changes do the following:
1. Previously the CharInfo objects for HTML, TEXT and XML were all cached in a static Hashtable. That seems good for performance, but the downside was that CharInfo's getOutputStringForChar(char) method, which returns the entity for a given char (e.g. maps '<' to "&lt;"), was synchronized. When generating HTML, which has lots of entities coming from the HTMLEntities.properties file, this can be a bottleneck on a busy web server.

The change makes each CharInfo object returned to the caller a mutable copy, so synchronization is no longer required.
Some Hashtables were changed to HashMap for performance.
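
As a rough illustration of the idea (not the actual CharInfo code), the sketch below keeps one master table that is only read after class initialization and hands every caller its own HashMap-backed copy, so lookups need no locking. The class EntityTable and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for CharInfo; names and contents are assumptions.
final class EntityTable {
    // Built once during class initialization and never modified afterwards.
    private static final Map<Character, String> MASTER = new HashMap<>();
    static {
        MASTER.put('<', "&lt;");
        MASTER.put('>', "&gt;");
        MASTER.put('&', "&amp;");
    }

    // Each instance owns its map, so it is only touched by its own serializer.
    private final Map<Character, String> entities;

    private EntityTable(Map<Character, String> source) {
        this.entities = new HashMap<>(source);
    }

    // Returns a private, mutable copy instead of a shared cached instance.
    static EntityTable newCopy() {
        return new EntityTable(MASTER);
    }

    // Unsynchronized lookup; returns null when the char has no entity.
    String outputStringFor(char c) {
        return entities.get(c);
    }

    // Changes affect this copy only.
    void define(char c, String replacement) {
        entities.put(c, replacement);
    }

    public static void main(String[] args) {
        EntityTable a = newCopy();
        EntityTable b = newCopy();
        a.define('"', "&quot;");
        System.out.println(a.outputStringFor('"'));
        System.out.println(b.outputStringFor('"'));
    }
}
```

The trade-off is some extra memory and copy time per serializer in exchange for lookups that never contend on a lock.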

Changes to isSpecialAttrChar() and isSpecialTextChar(): previously isSpecialAttrChar() reported a lot of other characters as special, but now it is related only to entities. Basically these routines return true if there is an entity for the character. However, there is some internal tweaking to:
> output a literal tab as "&#9;" in XML attribute values
> output a quote in an XML attribute as "&#34;"
> leave a literal quote as-is in HTML or XML text nodes
> output less than sign as-is in HTML attribute values

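The four tweaks listed above could be expressed roughly as follows. This is only a sketch: the class AttrEscapeSketch, the Context enum and the escape() method are hypothetical, and the cases the list does not mention (the quote in an HTML attribute, the handling of '&') are assumptions marked in the comments.

```java
// Hypothetical illustration of the context-dependent tweaks; not the patch itself.
final class AttrEscapeSketch {
    enum Context { XML_ATTR, HTML_ATTR, TEXT }

    // Returns the string to write for c in the given context.
    static String escape(char c, Context ctx) {
        switch (c) {
            case '\t':
                // A literal tab becomes a character reference in XML attribute values.
                return ctx == Context.XML_ATTR ? "&#9;" : "\t";
            case '"':
                if (ctx == Context.XML_ATTR) {
                    return "&#34;";
                }
                if (ctx == Context.TEXT) {
                    return "\""; // left as-is in HTML or XML text nodes
                }
                return "&quot;"; // assumption: not stated in the comment above
            case '<':
                // A less-than sign is left as-is in HTML attribute values.
                return ctx == Context.HTML_ATTR ? "<" : "&lt;";
            case '&':
                return "&amp;"; // assumption: escaped the same way in every context
            default:
                return String.valueOf(c);
        }
    }

    public static void main(String[] args) {
        for (Context ctx : Context.values()) {
            System.out.println(ctx + ": " + escape('"', ctx) + " " + escape('<', ctx));
        }
    }
}
```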

2. Changes to ToStream: the method characters(final char chars[], final int start, final int length) is reworked in an efficient way so that characters in the C0 and C1 ranges are written out as character references (except for tab, newline and carriage return). The line separator 0x2028 is also written out as a character reference. This processing is done regardless of the XML version (1.0 or 1.1); it is good for XML 1.0 output as well, just in case that output is included as a general parsed entity in an XML 1.1 file.
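
A simplified sketch of that rule follows. It is not the ToStream implementation: the class ControlCharRefSketch and its methods are hypothetical, decimal references are an assumption, and treating DEL (0x7F) together with the C1 range is also an assumption.

```java
// Hypothetical illustration of writing control characters as character references.
final class ControlCharRefSketch {
    // True for C0 controls other than tab, newline and carriage return,
    // for 0x7F-0x9F (DEL plus the C1 range), and for the line separator 0x2028.
    static boolean needsCharRef(char c) {
        if (c == '\t' || c == '\n' || c == '\r') {
            return false;
        }
        return c < 0x20 || (c >= 0x7F && c <= 0x9F) || c == 0x2028;
    }

    // Copies text, replacing each character that needs it with "&#N;" (decimal).
    static String escapeControls(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (needsCharRef(c)) {
                out.append("&#").append((int) c).append(';');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String sample = "a" + (char) 0x8C + "b" + (char) 0x2028 + "c";
        System.out.println(escapeControls(sample));
    }
}
```

Applying the same check to attribute values as well as character content is what closes the gap reported in this issue.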

3. Changes to ToStream method writeAttrString()

4. Minor changes to ToXMLStream and ToHTMLStream so that the CharInfo object used to check for entities is no longer static but is owned by that serializer, which drops the need for synchronization when looking up entities.

Brian Minchau added a comment - 25/Feb/06 04:06 AM
Assigning Henry Z. to review.

Brian Minchau added a comment - 04/Mar/06 05:16 AM
Attaching character.expansion.patch3.txt which is a rework due to comments by Henry Z.

Henry Zongaro added a comment - 07/Mar/06 05:51 AM
I have reviewed Brian's patch3[1], and I believe that it correctly resolves the problem.

[1] http://issues.apache.org/jira/secure/attachment/12323688/character.expansion.patch3.txt

Brian Minchau added a comment - 08/Mar/06 04:18 AM
Fixed.

Brian Minchau added a comment - 11/Dec/07 04:57 PM
Would the originator of this issue please verify that it is fixed in the 2.7.1 release by adding a comment here, so that we can close it?

A lack of response by February 1, 2008 will be taken as consent that we can close this resolved issue.

Regards,
Brian Minchau

Brian Minchau added a comment - 12/Dec/07 03:26 AM
Closing this issue.