XalanJ2 / XALANJ-784

Xalan XHTML output lacks space before the closing />

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.3
    • Fix Version/s: None
    • Component/s: Serialization
    • Labels:
      None
    • Environment:
      Operating System: All
      Platform: All

      Description

      W3C recommends that empty XHTML tags using the minimized syntax should be
      created in the form <tag ... /> where the space before the closing /> is
      required to ensure correct interpretation of the tag on legacy browsers. The
      minimised form must be used (so sayeth the W3C) for all tags that were formerly
      used without closing tags in HTML.

      The XHTML output method should generate tags matching this format.
      The minimised form should not be used for elements which have a non-empty
      context model but happen to be empty, like empty paragraph elements <p>.

        Activity

        Klaus Johannes Rusch added a comment -

        <xsl:output method="xalan:xhtml" ... /> should work okay.

        Another option would be to trigger XHTML output by setting the version to
        something denoting XHTML, something like

        <xsl:output method="html" version="xhtml1.0" />

        Any preferences?

        david_marston added a comment -

        I like the idea of a separate method in xsl:output. That's the direction that
        the XSL Working Group of the W3C is taking for the next version of XSLT.
        Due to the XSLT 1.0 spec requirements, such a method would have to be in Xalan's
        namespace, i.e.,
        <xsl:output method="xalan:xhtml" ... />
        where the "xalan" prefix is mapped to the same URI we use for other features
        and extensions.

        Klaus Johannes Rusch added a comment -

        Given that other than some documentation and a hack no XHTML output has been
        included in Xalan I tend to agree this is an enhancement, and I'd be happy to
        contribute an XHTML serializer.

        Some considerations:

        • Should "xhtml" be a separate method, or be triggered by some parameters of
          the "html" and "xml" output options, such as the doctype?

        I tend to prefer a separate output method "xhtml"; relying on doctypes
        somewhat limits the options.

        • Should SerializeToXHTML be a separate class, or can the XHTML serialization
          be merged into the SerializeToHTML or SerializeToXML classes?

        The separate SerializeToXHTML class is easy to implement; in fact I have created
        one based on SerializeToHTML. Using SerializeToHTML for XHTML serialization
        should not be too hard either.

        Shane Curcuru added a comment -

        I'd agree with Joseph; this is definitely an enhancement. It seems like some committer earlier on started thinking about how to support some form of XHTML but never finished the real work necessary to do it on the specific elements needed.

        This would be a great place for some new developer to start working on, by proposing and prototyping how we could support XHTML (along with very specific references to exact spec versions, etc. that it would be). The caution would be that plenty of testing would be needed to ensure existing functionality in other areas isn't changed and that serialization performance wasn't degraded.

        One concern would be trying to chase the behavior of specific browser versions - that way seems to lie madness.

        Klaus Johannes Rusch added a comment -

        > Which brings up the question: If we're going have the doctype set this
        > behavior automatically, should we have a separate output mode for XHTML?

        I guess a separate output method would still be desirable to support custom
        doctypes which use XHTML serialization conventions.

        Serialization with the W3C XHTML doctype actually works as expected, so XHTML
        output has been implemented; it is just not triggered by the xhtml output method,
        or by the occurrence of "XHTML" or "xhtml" in the public or system doctypes.
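
The triggers discussed in this comment could be combined into a single decision function. The following is only a rough Java sketch: the class `XhtmlTriggerSketch` and the method `useXhtmlConventions` are hypothetical names that do not exist in Xalan, and the case-insensitive substring test on the doctype identifiers is an assumption about how such a trigger might be written.

```java
import java.util.Locale;

// Hypothetical sketch, not Xalan code: decides whether XHTML serialization
// conventions should apply, based on the output method and the doctypes.
final class XhtmlTriggerSketch {

    // methodLocalName is the local part of the xsl:output method,
    // e.g. "xhtml" for a namespaced method such as "xalan:xhtml".
    static boolean useXhtmlConventions(String methodLocalName,
                                       String doctypePublic,
                                       String doctypeSystem) {
        // An explicit XHTML output method always wins, so custom doctypes work too.
        if ("xhtml".equalsIgnoreCase(methodLocalName)) {
            return true;
        }
        // Otherwise fall back to looking for "xhtml" in either doctype identifier.
        return mentionsXhtml(doctypePublic) || mentionsXhtml(doctypeSystem);
    }

    private static boolean mentionsXhtml(String id) {
        return id != null && id.toLowerCase(Locale.ROOT).contains("xhtml");
    }

    public static void main(String[] args) {
        System.out.println(useXhtmlConventions("xml", "-//W3C//DTD XHTML 1.0 Strict//EN", null));
        System.out.println(useXhtmlConventions("xhtml", "-//EXAMPLE//DTD Custom//EN", null));
    }
}
```

With a rule like this, a custom doctype that does not mention XHTML would still get XHTML conventions as long as the stylesheet asks for the explicit output method.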

        Joe Kesselman added a comment -

        Actually... I see XHTML mentioned in Method.java, but I see no code which
        actually uses that constant. So I'd say offhand that the XHTML output method was
        never implemented, and the public-ID-triggered feature is all we've got right
        now.

        It looks like it wouldn't be too hard to add. But since it isn't there at all
        and there isn't any evidence it ever was, I would still call adding this
        behavior an extension request rather than a bug report. Especially if the public
        ID hook provides a working alternative solution. (I haven't tested that
        approach.)

        Joe Kesselman added a comment -

        >While XSLT 1.0 does not specify an XHTML output method, Xalan does support it

        ... You're right, I was confused.

        That being the case, I agree that an explicit XHTML output method should use the
        " />" formatting. And in fact, SerializerToXML.java has such a hook, triggered
        by
        if (m_doctypePublic.startsWith("-//W3C//DTD XHTML"))

        It looks like what's needed is to have the explicit XHTML output method force
        that same flag to true. Meanwhile, it looks like you ought to be able to get the
        output you want by making sure your xsl:output asserts an appropriate public
        doctype string.

        Which brings up the question: If we're going to have the doctype set this behavior
        automatically, should we have a separate output mode for XHTML?
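
For reference, the doctype workaround suggested above might look roughly like this when driven through the standard JAXP API. This is a sketch only: the class name `XhtmlDoctypeSketch` and the inline stylesheet are made up for illustration, and whether the empty `br` element comes out as `<br />` depends on the serializer of the processor actually in use, so no particular output is claimed here.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Illustrative sketch: a stylesheet whose xsl:output asserts the W3C XHTML
// public doctype, run through whatever XSLT processor JAXP finds.
final class XhtmlDoctypeSketch {

    // Hypothetical stylesheet for the example; only the xsl:output attributes matter.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'"
      + " doctype-public='-//W3C//DTD XHTML 1.0 Strict//EN'"
      + " doctype-system='http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd'/>"
      + "<xsl:template match='/'>"
      + "<html xmlns='http://www.w3.org/1999/xhtml'>"
      + "<head><title>t</title></head>"
      + "<body><p>line<br/>break</p></body>"
      + "</html>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String transform() {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            // The input document is irrelevant here; the template emits fixed markup.
            transformer.transform(new StreamSource(new StringReader("<doc/>")),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Inspect the serialized br element to see how the processor handles it.
        System.out.println(transform());
    }
}
```

Both `doctype-public` and `doctype-system` are set because XSLT 1.0 only emits the public identifier when a system identifier is also given.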

        Klaus Johannes Rusch added a comment -

        While XSLT 1.0 does not specify an XHTML output method, Xalan does support it
        (and XSLT processors are allowed by the XSLT 1.0 specs to support additional
        output methods).

        Post-processing the output in sed, or piping it through jtidy, would be trivial
        indeed, however not in an environment where the output from Xalan is served to
        the client directly.

        It's not as simple as adding a space before the />. While <br></br>, <br/> and
        <br /> are all valid and equivalent in XML terms, only one of these, <br />,
        works with browsers still in use. On the other hand, <p /> would be a bad
        idea because it crashes browsers (again, valid XML and even valid XHTML, but
        still not usable in real-world applications) and needs to be serialized as
        <p></p>.
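
The rule described here could be sketched in Java as follows. The class `XhtmlEmptyElementSketch` and the method `serializeEmpty` are hypothetical and not part of Xalan; the element list is the set declared EMPTY in the XHTML 1.0 DTDs, and attribute handling is left out to keep the example short.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not Xalan code: chooses how to write an element that
// has no content, following the XHTML 1.0 compatibility conventions.
final class XhtmlEmptyElementSketch {

    // Elements declared EMPTY in the XHTML 1.0 Strict, Transitional and Frameset DTDs.
    private static final Set<String> EMPTY_ELEMENTS = new HashSet<>(Arrays.asList(
        "area", "base", "basefont", "br", "col", "frame", "hr",
        "img", "input", "isindex", "link", "meta", "param"));

    static String serializeEmpty(String name) {
        if (EMPTY_ELEMENTS.contains(name)) {
            // Minimized form, with the space before the closing "/>".
            return "<" + name + " />";
        }
        // Non-empty content model that merely happens to be empty:
        // write an explicit start tag and end tag.
        return "<" + name + "></" + name + ">";
    }

    public static void main(String[] args) {
        System.out.println(serializeEmpty("br")); // prints <br />
        System.out.println(serializeEmpty("p"));  // prints <p></p>
    }
}
```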

        Joe Kesselman added a comment -

        XSLT 1.0 has no official "XHTML output method". You have your choice of HTML
        output mode (which doesn't generate the problematic syntax at all), XML output
        mode (where a space before the /> would be meaningless), or Text output mode.

        It might be a Very Good Thing if XSLT 2.0 added explicit XHTML support,
        formatted in the way you suggest. But that's something to take up with the W3C's
        XSL Working Group.

        Meanwhile, we could prototype explicit XHTML support. Worth considering. But
        that gets you into the situation of writing nonportable stylesheets (if they
        explicitly ask for XHTML as their output mode) or non-interoperable stylesheets
        (if we switch into that mode implicitly or via a Processing Instruction; same
        stylesheet elsewhere will produce different results).

        Alternatively... You know, it really would be pretty darned trivial to use SED
        or another text-processing tool as a postprocessor to globally replace '/>' with
        ' />'. Do you really need Xalan to do this for you?

        Definitely an interesting idea and worth considering as a possible Enhancement,
        but I can't justify calling it a bug.
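
For completeness, the post-processing idea could be written in Java instead of sed, roughly as below. The class `SpaceBeforeSlashSketch` and its method are hypothetical names for this sketch. It is a blind textual replacement, so it would also touch any "/>" that happens to occur inside attribute values, comments or CDATA sections, and it makes no distinction between elements.

```java
// Hypothetical sketch of the text-level postprocessor: inserts a space before
// every "/>" that is not already preceded by whitespace.
final class SpaceBeforeSlashSketch {

    static String addSpaceBeforeEmptyClose(String markup) {
        // Negative lookbehind keeps an existing " />" from becoming "  />".
        return markup.replaceAll("(?<!\\s)/>", " />");
    }

    public static void main(String[] args) {
        System.out.println(addSpaceBeforeEmptyClose("<p>a<br/>b</p>")); // prints <p>a<br />b</p>
        System.out.println(addSpaceBeforeEmptyClose("<hr />"));         // prints <hr />
    }
}
```

Note that a minimized `<p/>` would simply become `<p />` with this approach, rather than being expanded to a start tag and end tag.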


          People

          • Assignee:
            Unassigned
            Reporter:
            Klaus Johannes Rusch
          • Votes:
            3
            Watchers:
            0
