COCOON-2249

XHTMLSerializer uses entity references &quot; and &apos; which cause JavaScript parse errors

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.1.12, 2.2.1
    • Fix Version/s: None
    • Component/s: Blocks: Serializers
    • Labels: None
    • Other Info: Patch available

      Description

      The XHTMLSerializer, or more specifically the XHTMLEncoder, from the serializers block in Cocoon 2.1.x replaces every character that has a corresponding HTML 4.0 character entity reference with that reference. This causes problems with inline JavaScript: for example, double quotes are transformed to &quot;, which causes a JavaScript parsing error. A minor side effect is the increased document size.
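
The failure mode can be illustrated with a minimal Java sketch. This is not the Cocoon encoder; the class name `EntityEscapeDemo` and the method `escapeQuotes` are hypothetical and only mimic the quote replacement described above. The sketch assumes the page is served as text/html, where browsers do not decode entity references inside a script element, so the JavaScript engine receives the literal reference text.

```java
// Hypothetical illustration only; this is not the Cocoon XHTMLEncoder.
class EntityEscapeDemo {

    // Mimics an encoder that replaces quotes with entity references.
    static String escapeQuotes(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '"':
                    out.append("&quot;");
                    break;
                case '\'':
                    out.append("&apos;");
                    break;
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String script = "alert(\"hello\");";
        // prints <script>alert(&quot;hello&quot;);</script>
        System.out.println("<script>" + escapeQuotes(script) + "</script>");
    }
}
```

The printed script body is no longer valid JavaScript, which matches the parse error reported in this issue.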

      If I understand the W3C correctly, see e.g. [2], the recommended approach is to use the character set of the encoding as far as possible,
      and use escapes only in exceptional circumstances. I didn't find a reason why the XHTMLSerializer uses escapes, but I suspect that it is related to browser compatibility issues.

      Maybe we could make this behaviour configurable, e.g.

        <use-entity-references>true|false</use-entity-references>

      [1] http://www.nabble.com/Problem-with-XHTMLSerializers-to1311360.html#a1311360
      [2] http://www.w3.org/International/tutorials/tutorial-char-enc/
      1. cocoon-serializers.txt
        4 kB
        Andreas Hartmann
      2. MinimalXMLEncoder.java
        4 kB
        Andreas Hartmann
      3. COCOON-2249-2009-01-21-1601.txt
        8 kB
        Andreas Hartmann

        Activity

        Andreas Hartmann added a comment -
        Does this problem also occur in Cocoon 3? Are further actions required?
        Andreas Hartmann added a comment -
        I applied the patch to the 2.2 branch as well.
        Andreas Hartmann added a comment -
        This is the patch I committed to the 2.1.x branch. It has to be applied to the 2.2 branch as well.
        Andreas Hartmann added a comment -
        As Antonio suggested, I pulled up the behaviour of the HTMLSerializer to the XHTMLSerializer. Should be fixed in the 2.1.x branch in revision 736988.
        Andreas Hartmann added a comment -
        This is the MinimalXMLEncoder that was missing in the last patch.
        Antonio Gallardo added a comment -
        In the patch MinimalXMLEncoder.java is missing. Would you post it?
        Andreas Hartmann added a comment -
        The patch enables subclasses of the EncodingSerializer to determine the encoder type after the constructor has been called.

        To resolve the escaping issue, a MinimalXMLEncoder has been introduced, which escapes only the characters < > &, as recommended by the W3C for XHTML documents [1].

        [1] http://www.w3.org/International/tutorials/tutorial-char-enc/#Slide0440
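
The minimal escaping described in this comment might look roughly like the following Java sketch. It is an assumption-laden illustration, not the attached MinimalXMLEncoder.java: the class name `MinimalEscapeSketch` and the method `escape` are hypothetical, and the real encoder plugs into Cocoon's encoding classes rather than working on plain strings.

```java
// Hypothetical sketch; the attached MinimalXMLEncoder.java may differ.
class MinimalEscapeSketch {

    // Escapes only the three characters named in the comment: < > &
    // Quotes and all other characters are passed through unchanged.
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '<':
                    out.append("&lt;");
                    break;
                case '>':
                    out.append("&gt;");
                    break;
                case '&':
                    out.append("&amp;");
                    break;
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: if (a &lt; b &amp;&amp; s == "x") { }
        System.out.println(escape("if (a < b && s == \"x\") { }"));
    }
}
```

This sketch covers text content only. Attribute values delimited by double quotes would still need their quotes escaped.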

          People

          • Assignee: Unassigned
          • Reporter: Andreas Hartmann
          • Votes: 0
          • Watchers: 0
