
Key: XALANJ-1497
Type: Bug
Status: Resolved
Resolution: Fixed
Priority: Major
Assignee: Brian Minchau
Reporter: dcaveney
Votes: 0
Watchers: 0
XalanJ2

xsl:copy adds a newline to processing instructions

Created: 25/Apr/03 11:01 AM   Updated: 11/Dec/07 04:57 PM
Component/s: transformation, Xalan-interpretive
Affects Version/s: 2.5
Fix Version/s: 2.7.1

Time Tracking:
Not Specified

File Attachments:
Type       Name             Uploaded             By             Size    Note
XML File   d19306.xml       2003-04-25 04:37 PM  Henry Zongaro  0.1 kB
Text File  d19306.xsl       2003-04-25 04:37 PM  Henry Zongaro  0.2 kB
Text File  patch.txt        2006-03-09 03:12 AM  Brian Minchau  4 kB    Licensed for inclusion in ASF works
XML File   xalanj-1497.out  2006-03-10 02:33 AM  Brian Minchau  0.3 kB  Licensed for inclusion in ASF works
XML File   xalanj-1497.xsl  2006-03-10 02:32 AM  Brian Minchau  0.8 kB  Licensed for inclusion in ASF works
Environment:
Operating System: Other
Platform: Other
Issue Links:
Duplicate
 

Bugzilla Id: 19306
Xalan info: PatchAvailable
Reviewer: Henry Zongaro
Resolution Date: 10/Mar/06 02:12 PM


Description
<xsl:copy> produces a node AND a linefeed for processing-instruction nodes in
the "root" of the document.

Example:

The following template faithfully reproduces an XML document:

<xsl:template match="node() | @*">
  <xsl:copy>
    <xsl:apply-templates select="node() | @*"/>
  </xsl:copy>
</xsl:template>

...EXCEPT when the document contains processing instruction nodes in the
"root" (i.e. before the "document" element). There doesn't seem to be a problem
for nodes that are descendants of the "document" element.
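
A minimal sketch of how this could be reproduced through the standard JAXP API. The class name, the input document and the PI names are made up for illustration; they are not the attachments on this issue. Whether a newline shows up after the top-level PI depends on which XSLT processor and version is on the classpath, so the program only prints the result with line breaks made visible.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class PiCopyRepro {
    // The identity template quoted in the description, wrapped in a stylesheet.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='node() | @*'>"
      + "<xsl:copy><xsl:apply-templates select='node() | @*'/></xsl:copy>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Hypothetical input: one PI before the document element, one inside it.
    static final String INPUT = "<?top-pi a?><doc><?inner-pi b?><child/></doc>";

    // Runs the identity stylesheet over the given XML and returns the serialized result.
    static String copy(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String out = copy(INPUT);
        // Escape line breaks so a stray newline after the top-level PI is easy to spot.
        System.out.println(out.replace("\r", "\\r").replace("\n", "\\n"));
    }
}
```

If a "\n" appears directly after `<?top-pi a?>` but not after `<?inner-pi b?>`, the processor in use shows the behaviour described here.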

Comments
Brian Minchau added a comment - 25/Apr/03 12:15 PM
dcaveney,
would you please add a simple xml input document that has the problem you
describe to this bug report, and its corresponding output.

Thanks,
Brian Minchau

Henry Zongaro added a comment - 25/Apr/03 04:35 PM
I can reproduce this with Xalan-J 2.5. The following comment appears in the
serializer code that handles processing instructions
(ToXMLStream.processingInstruction):

   // Always output a newline char if not inside of an
   // element. The whitespace is not significant in that
   // case.

That is a true statement when the output is considered to be an XML document
entity, but if the output is used as an external entity referenced from within
an XML document, that extra whitespace could be significant.

A variant of this issue came up in some e-mail exchanged between David Marston
and Michael Kay. The Xalan-J processors always emit an EOL marker after the
XML/Text declaration at the start of the output. That's not significant when
the output is treated as an XML document, but it could be significant if the
output is treated as an external entity referenced by an XML document.

Henry Zongaro added a comment - 25/Apr/03 04:37 PM
Created an attachment (id=6005)
Input XML document

Henry Zongaro added a comment - 25/Apr/03 04:37 PM
Created an attachment (id=6006)
Example stylesheet

Brian Minchau added a comment - 25/Apr/03 07:38 PM

I don't think we can ever figure out if the current document being serialized
is ever going to be used as an external entity referenced by another XML
document in the future.

Of the three stream serializers, ToXMLStream, ToHTMLStream and ToTextStream,
the first two will append an end-of-line (EOL) after a processing instruction
if the processing instruction happens outside of any elements. Is the solution
then to never output an EOL, neither when inside of an element, nor outside of
any elements?

That would be an easy fix, just delete the few lines that output the EOL in the
two processingInstruction(target,data) methods. Is that the right thing?

Henry Zongaro added a comment - 25/Apr/03 07:56 PM
Hi, Brian. I believe the only time you can tell that the result really is a
document entity is if it contains a DTD, but you're right that in most cases
you can't tell. So, yes, the fix would be to remove the EOL emitted in the
processingInstruction methods, and also after the XML declaration.

Interestingly, comments don't have the same problem.

david_marston added a comment - 25/Apr/03 10:02 PM
Testing note: the data for tests copy52 and copy53 could easily be expanded to
have PIs outside the document element. The gold files would have to change to
match, so the gold file should reflect the lack-of-newline if that's the
intended gold standard.

Henry Zongaro added a comment - 26/Apr/03 12:47 AM
Might want to extend output59 and output72 as well. They test
xsl:processing-instruction. The former actually tests a PI outside the document element,
though only for HTML.

Brian Minchau added a comment - 08/Mar/06 04:01 AM
JIRA Triage meeting Tuesday March 7, 2006 - agreed to modify the behavior to NEVER output a newline after a PI.
Assigned to Brian M.

Brian Minchau added a comment - 09/Mar/06 12:36 AM
For XML produced from a transformation one never knows how it will be used.
Suppose that one produces this:
<?xml version='1.0' encoding='UTF-8'?><?PI-one?> <!-- comment one --> <elem1><elem2>hello</elem2></elem1>

Suppose that one serialized with indent='yes'. Where could whitespace (e.g. newlines and spaces for indentation) be inserted?
In this case only between <elem1> and <elem2>, or between </elem2> and </elem1>.

Adding whitespace before or after a top level PI, comment or whitespace text node may not be correct because this XML could be used as an external general parsed entity. For example, suppose that it was referred to as &egpe; and included in other XML like this:

<e>some text&egpe;more text</e>

In this case (and so in general) it is not correct to add whitespace to the top level of serialized XML as that whitespace will occur after "some text" or perhaps before "more text". The producer of the serialized XML can not know the context in which that XML will later be used.

So even with indent='yes' we should not put additional whitespace between top level nodes, not even a newline after an XML header!
One can only add whitespace for indentation within an element. In the above example indentation, if any, would be within the <elem1> and </elem1> tags, so indentation could look like this:

<?xml version='1.0' encoding='UTF-8'?><?PI-one?> <!-- comment one --> <elem1>
  <elem2>hello</elem2>
</elem1>
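
The entity argument above can be checked with a small sketch. As a simplifying assumption it declares egpe as an internal entity in a DOCTYPE instead of loading it from an external file, which is enough to show how the replacement text lands between "some text" and "more text". The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EntityWhitespaceDemo {
    // Parses <e>some text&egpe;more text</e> with egpe bound to the given
    // replacement text and returns the text content of <e>.
    static String textAround(String entityText) {
        String xml = "<!DOCTYPE e [<!ENTITY egpe \"" + entityText + "\">]>"
                   + "<e>some text&egpe;more text</e>";
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        // Same top-level nodes, serialized without and with a newline after the PI.
        String tight = textAround("<?PI-one?><elem1/>");
        String padded = textAround("<?PI-one?>\n<elem1/>");
        // Escape the newline so the difference is visible on one line each.
        System.out.println("[" + tight.replace("\n", "\\n") + "]");
        System.out.println("[" + padded.replace("\n", "\\n") + "]");
    }
}
```

In the second case the newline that a serializer added after the PI ends up as character data of <e>, between "some text" and "more text".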



David Bertoni added a comment - 09/Mar/06 12:43 AM
The XSLT recommendation allows modification of the result tree's content when indenting is enabled:

http://www.w3.org/TR/xslt#strip

"The xml output method should use an algorithm to output additional whitespace that ensures that the result if whitespace were to be stripped from the output using the process described in [3.4 Whitespace Stripping] with the set of whitespace-preserving elements consisting of just xsl:text would be the same when additional whitespace is output as when additional whitespace is not output.

    NOTE:It is usually not safe to use indent="yes" with document types that include element types with mixed content."

Brian Minchau added a comment - 09/Mar/06 03:12 AM
Attaching a patch to change ToStream.shouldIndent(). It previously returned true only if indent='yes' was specified, plus other conditions. One more condition was added: we must be inside of an element, not at a top level node in an XML document or fragment. This has no performance impact when indent='no' (which is the default for XML).

This will have a minor performance impact for some HTML with indent='yes', which is the default, but heck, if you want it done right it usually costs something. Even for HTML if the last thing written out was text, there is no performance impact.

A newline is no longer written out immediately after a PI since we don't know if non-whitespace text will follow the PI.

Previously a newline was written out after the XML header if indent='yes'. A newline is now only written out after the header when indent='yes' and one of these:
> standalone was specified (either yes or no)
> A DOCTYPE will be written out.
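
The two rules described above could be sketched as follows. This is not the code from patch.txt: the class, method and parameter names are hypothetical, and otherConditionsHold stands in for the pre-existing checks that the comment only refers to as "other conditions".

```java
class IndentRules {
    // indentRequested: indent='yes' was specified.
    // otherConditionsHold: placeholder for the checks that existed before the patch.
    // elementDepth: number of currently open elements; 0 means a top level node.
    static boolean shouldIndent(boolean indentRequested, boolean otherConditionsHold,
                                int elementDepth) {
        // New condition from the patch description: only indent inside an element.
        return indentRequested && otherConditionsHold && elementDepth > 0;
    }

    // A newline after the XML header is only written when indent='yes' and either
    // standalone was specified (yes or no) or a DOCTYPE will be written out.
    static boolean newlineAfterHeader(boolean indentRequested, boolean standaloneSpecified,
                                      boolean doctypeWillBeWritten) {
        return indentRequested && (standaloneSpecified || doctypeWillBeWritten);
    }

    public static void main(String[] args) {
        System.out.println("top level, indent=yes: " + shouldIndent(true, true, 0));
        System.out.println("inside element, indent=yes: " + shouldIndent(true, true, 1));
        System.out.println("header newline, indent=yes only: "
            + newlineAfterHeader(true, false, false));
        System.out.println("header newline, indent=yes + DOCTYPE: "
            + newlineAfterHeader(true, false, true));
    }
}
```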


Brian Minchau added a comment - 10/Mar/06 02:01 AM
Attaching a testcase, xalanj-1497.xsl, that puts comments, processing instructions and text before, inside, and after the document element.

It also sets indentation to 'yes' and the indentation amount (a xalan specific xsl:output attribute) to '3'.

With the fix there is no indentation before or after the output document element.
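
A rough stand-in for that check, driven from Java rather than from the attached stylesheet (whose content is not reproduced here). The input document is invented, and the property key "{http://xml.apache.org/xslt}indent-amount" is an assumption about how the Xalan indent amount is usually set through JAXP. The layout of the result depends on the serializer in use, so the program only prints it for inspection.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TopLevelIndentCheck {
    // Hypothetical input with comments and PIs before, inside and after the document element.
    static final String INPUT =
        "<!-- before --><?pi-before x?>"
      + "<doc><!-- inside --><?pi-inside y?><a><b>text</b></a></doc>"
      + "<!-- after --><?pi-after z?>";

    // Copies the input through an identity Transformer with indentation switched on.
    static String serializeIndented(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.INDENT, "yes");
            // Assumed key for the Xalan-specific indent amount; other processors may ignore it.
            t.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "3");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // With the fix described here, whitespace should only be added inside <doc>.
        System.out.println(serializeIndented(INPUT));
    }
}
```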

Brian Minchau added a comment - 10/Mar/06 02:03 AM
Attaching xalanj-1497.out the gold file for what should be output.

Brian Minchau added a comment - 10/Mar/06 02:32 AM
Attaching testcase xalanj-1497.xsl

Brian Minchau added a comment - 10/Mar/06 02:33 AM
Attaching xalanj-1497.out ... the gold file.

Brian Minchau added a comment - 10/Mar/06 04:49 AM
Attaching patch2.txt which is a slight rework of patch.txt. Henry Zongaro found a bug during the review and this patch has that fix.

Henry found that a stylesheet like this:
<xsl:output method="html" doctype-system='abc' />
<xsl:template match="/">
  <xsl:comment>abc</xsl:comment>
  <html/>
</xsl:template>

put out two DOCTYPE declarations due to a latent bug in the comment() method, which didn't do the usual cleanup of pending issues, such
as closing opening start element tags, or handling what to do if no startDocument() call was received (other methods have such code).

Brian Minchau added a comment - 10/Mar/06 04:51 AM
Ignore my last comment... it was meant for xalanj-2276

Henry Zongaro added a comment - 10/Mar/06 11:55 AM
I have reviewed and approve Brian's patch.[1]

[1] http://issues.apache.org/jira/secure/attachment/12323946/patch.txt

Brian Minchau added a comment - 10/Mar/06 02:12 PM
Fixed. The patch was applied to the latest development code.

Brian Minchau added a comment - 11/Dec/07 04:57 PM
Would the originator of this issue please verify that this issue is fixed in the 2.7.1 release, by adding a comment to this issue, so that we can close this issue.

A lack of response by February 1, 2008 will be taken as consent that we can close this resolved issue.

Regards,
Brian Minchau