WICKET-5416 (project: Wicket)

BOM in UTF markup file breaks encoding detection


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.5.10, 6.12.0
    • Fix Version/s: 1.5.11, 6.13.0, 7.0.0-M1
    • Component/s: wicket
    • Labels: None
    • Environment: Windows 7, jdk1.7.0_45

    Description

      I have a project with internationalization and ran into this problem on one of the pages with non-English content. The page was UTF-8 encoded, but my JVM default encoding is different. I always use "<?xml encoding ... ?>" to specify the encoding of markup pages (and "MarkupSettings.defaultMarkupEncoding" is not set).

      Unexpectedly, the page content came out with the wrong encoding. After several hours of debugging I found that the source of the issue was the UTF BOM (byte order mark) at the beginning of the file and the inability of "XmlReader" to process it. "XmlReader.getXmlDeclaration" tries to match the XML declaration with a regular expression, but the match fails because of the BOM. The encoding then falls back to the JVM default encoding.
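      To illustrate the failure mode: a minimal sketch, assuming a simplified declaration regex anchored at the start of the text. The pattern and the class name XmlDeclarationDemo are hypothetical and are not Wicket's actual "XmlReader" code. ISO-8859-1 stands in for a non-UTF-8 JVM default encoding (the reporter's actual default is not stated). Once the BOM sits in front of "<?xml", the anchor no longer matches, whether the BOM was decoded as three Latin-1 characters or as U+FEFF, so no encoding is found.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the regex is a simplified stand-in, not Wicket's real pattern.
final class XmlDeclarationDemo {
    // Matches an XML declaration at the very start of the text and captures the encoding value.
    static final Pattern XML_DECL = Pattern.compile(
        "^<\\?xml\\s+[^>]*?encoding\\s*=\\s*[\"']([^\"']+)[\"'][^>]*\\?>");

    /** Returns the declared encoding, or null if no declaration matched at the start. */
    static String declaredEncoding(String text) {
        Matcher m = XML_DECL.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String markup = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><html></html>";

        // Prepend the UTF-8 BOM bytes EF BB BF to the encoded markup.
        byte[] body = markup.getBytes(StandardCharsets.UTF_8);
        byte[] withBom = new byte[body.length + 3];
        withBom[0] = (byte) 0xEF;
        withBom[1] = (byte) 0xBB;
        withBom[2] = (byte) 0xBF;
        System.arraycopy(body, 0, withBom, 3, body.length);

        // Decoding with a non-UTF-8 charset turns the BOM into three extra leading characters.
        String decoded = new String(withBom, StandardCharsets.ISO_8859_1);

        System.out.println(declaredEncoding(markup));  // UTF-8
        System.out.println(declaredEncoding(decoded)); // null
    }
}
```

      When the match fails, a reader built this way has nothing to go on and must use its fallback, which in the reported case was the JVM default encoding.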

      It's possible to use "org.apache.commons.io.input.BOMInputStream" to handle the BOM, or it could be handled manually inside "XmlReader".
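      A minimal sketch of the manual option, using only the JDK (Java 7 compatible) rather than commons-io. BomStripper, skipUtf8Bom and readAll are hypothetical names, not Wicket API, and this is not necessarily how the issue was actually fixed. It peeks at the first three bytes through a PushbackInputStream and drops them only if they are the UTF-8 BOM (EF BB BF). UTF-16 BOMs are deliberately not handled here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Wicket: strips a leading UTF-8 BOM from a stream.
final class BomStripper {
    private static final int[] UTF8_BOM = {0xEF, 0xBB, 0xBF};

    /** Returns a stream positioned after the UTF-8 BOM if present; otherwise nothing is consumed. */
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, UTF8_BOM.length);
        byte[] head = new byte[UTF8_BOM.length];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until the buffer is full or EOF.
        while (n < head.length) {
            int r = pin.read(head, n, head.length - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        boolean isBom = n == UTF8_BOM.length;
        for (int i = 0; isBom && i < n; i++) {
            isBom = (head[i] & 0xFF) == UTF8_BOM[i];
        }
        if (!isBom && n > 0) {
            pin.unread(head, 0, n); // not a BOM: push the peeked bytes back
        }
        return pin;
    }

    /** Reads the remaining bytes of a stream. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int r;
        while ((r = in.read(buf)) != -1) {
            out.write(buf, 0, r);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', '?', 'x', 'm', 'l'};
        InputStream in = skipUtf8Bom(new ByteArrayInputStream(withBom));
        System.out.println(new String(readAll(in), StandardCharsets.US_ASCII)); // <?xml
    }
}
```

      Because non-BOM bytes are pushed back, files without a BOM reach the declaration parser unchanged, so the wrapper could sit in front of the existing regex-based detection without altering its behaviour for those files.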

      PS: I found the issue with Wicket 1.5.10. I see the same code without BOM handling in 6.12.0, so I added that version to "Affects Version/s", but I have no code to prove it at the moment.


          People

            Assignee: Martin Tzvetanov Grigorov (mgrigorov)
            Reporter: Vadim Ponomarev (Zebr911-v)
            Votes: 0
            Watchers: 3
