Wicket / WICKET-5416

BOM in UTF markup file breaks encoding detection


    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.5.10, 6.12.0
    • Fix Version/s: 1.5.11, 6.13.0, 7.0.0-M1
    • Component/s: wicket
    • Labels:
    • Environment:
      Windows 7, jdk1.7.0_45


      I have a project with internationalization and ran into this problem on one of the pages with non-English content. The page is encoded in UTF-8, but my JVM default encoding is different. I always use "<?xml encoding ... ?>" to specify the encoding of markup pages (and "MarkupSettings.defaultMarkupEncoding" is not set).

      Unexpectedly, the page was rendered with the wrong encoding. After several hours of debugging I found that the source of the issue was the UTF BOM (byte order mark) at the beginning of the file and the inability of "XmlReader" to process it. "XmlReader.getXmlDeclaration" tries to match the XML declaration with a regular expression, but fails because of the BOM. After that, the encoding defaults to the JVM encoding.
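
      The failure mode can be reproduced outside Wicket. The sketch below is an illustration only: the pattern "XML_DECL" and the helper "declaredEncoding" are hypothetical stand-ins, not the regular expression or code that "XmlReader" actually uses. It assumes only that the declaration is matched from the very start of the input, which is enough for a leading BOM character (U+FEFF) to defeat the match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BomDeclarationDemo {
    // Hypothetical pattern anchored at the start of the input.
    // This is NOT Wicket's actual regular expression.
    static final Pattern XML_DECL = Pattern.compile(
        "^<\\?xml\\s[^>]*?encoding\\s*=\\s*[\"']([^\"']+)[\"'][^>]*\\?>");

    /** Returns the declared encoding, or null if no declaration starts the input. */
    static String declaredEncoding(String head) {
        Matcher m = XML_DECL.matcher(head);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String decl = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";
        // Without a BOM the declaration is found.
        System.out.println(declaredEncoding(decl));            // UTF-8
        // With a leading BOM the anchored match fails, so a caller
        // would fall back to some default encoding.
        System.out.println(declaredEncoding("\uFEFF" + decl)); // null
    }
}
```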

      It's possible to use "org.apache.commons.io.input.BOMInputStream" to handle the BOM, or it could be handled manually inside "XmlReader".
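
      As a rough sketch of the manual option, a stream can be wrapped so that a leading UTF-8 BOM (bytes EF BB BF) is consumed before the XML declaration is parsed. The class "Utf8BomSkipper" and the method "skipUtf8Bom" are hypothetical names chosen for this example, not part of Wicket, and the sketch handles only the UTF-8 BOM, not the UTF-16 or UTF-32 variants.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

class Utf8BomSkipper {
    /** Wraps the stream and consumes a leading UTF-8 BOM (EF BB BF) if present. */
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until 3 bytes or EOF.
        while (n < 3) {
            int r = pushback.read(head, n, 3 - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        boolean bom = n == 3
            && (head[0] & 0xFF) == 0xEF
            && (head[1] & 0xFF) == 0xBB
            && (head[2] & 0xFF) == 0xBF;
        // Not a BOM: push the bytes back so the caller sees the stream unchanged.
        if (!bom && n > 0) {
            pushback.unread(head, 0, n);
        }
        return pushback;
    }

    public static void main(String[] args) throws IOException {
        byte[] xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            .getBytes(StandardCharsets.UTF_8);
        byte[] withBom = new byte[xml.length + 3];
        withBom[0] = (byte) 0xEF;
        withBom[1] = (byte) 0xBB;
        withBom[2] = (byte) 0xBF;
        System.arraycopy(xml, 0, withBom, 3, xml.length);

        InputStream in = skipUtf8Bom(new ByteArrayInputStream(withBom));
        System.out.println((char) in.read()); // <
    }
}
```

      With the BOM consumed up front, a declaration matcher that expects "<?xml" at the start of the input sees it there again.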

      PS: I found the issue with Wicket 1.5.10 and I see the same code in 6.12.0 without BOM handling, so I added it to "Affects Version/s", but I have no proof-in-code available at the moment.


        Martin Grigorov made changes -
        Field           Original Value   New Value
        Resolution                       Fixed [ 1 ]
        Fix Version/s                    7.0.0 [ 12322958 ]
        Fix Version/s                    6.13.0 [ 12325564 ]
        Fix Version/s                    1.5.11 [ 12324069 ]
        Assignee                         Martin Grigorov [ mgrigorov ]
        Status          Open [ 1 ]       Resolved [ 5 ]
        Vadim Ponomarev created issue -


          • Assignee:
            Martin Grigorov
          • Reporter:
            Vadim Ponomarev
          • Votes:
            0
          • Watchers:
            3


            • Created: