Uploaded image for project: 'Wicket'
  1. Wicket
  2. WICKET-5416

BOM in UTF markup file breaks encoding detection

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.5.10, 6.12.0
    • Fix Version/s: 1.5.11, 6.13.0, 7.0.0-M1
    • Component/s: wicket
    • Labels:
      None
    • Environment:
      Windows 7, jdk1.7.0_45

      Description

      I have project with internationalization and experienced this problem with one of the pages with non-english content. Page had UTF-8 encoding, but my JVM encoding is different. I always use "<?xml encoding ... ?>" to specify encoding for markup pages (and "MarkupSettings.defaultMarkupEncoding" is not set).

      Unexpectedly I got problem with bad encoding on page. After several hours of debugging I found what source of this issue was UTF BOM (Byte order mark) at the beggining of file and inability of "XmlReader" to process it. "XmlReader.getXmlDeclaration" tries to match xml declaration with regular expression, but fails because of BOM. After that encoding defaults to JVM encoding.

      It's possible to use "org.apache.commons.io.input.BOMInputStream" to handle BOM or you could handle it manually inside "XmlReader".

      PS: issue found with Wicket 1.5.10 and I see same code in 6.12.0 without BOM handling, so I added it to "Affects Version/s", but no proof-in-code available from me at this moment.

        Attachments

          Activity

            People

            • Assignee:
              mgrigorov Martin Grigorov
              Reporter:
              Zebr911-v Vadim Ponomarev
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: