  1. Xerces-C++
  2. XERCESC-1663

IconvGNU and IconvFBSD based transcoders assume UCS-2 as XMLCh encoding

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.7.0
    • Fix Version/s: 3.0.0
    • Component/s: Utilities
    • Labels: None
    • Environment: any

    Description

      I was studying the code in IconvGNU and IconvFBSD transcoders and it appears that they assume UCS-2 is the encoding for XMLCh when it's actually UTF-16. I believe this can result in the loss of data.
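
      To illustrate the data-loss concern, here is a minimal sketch (not code from Xerces-C++; the helper names encodeUtf16 and fitsInUcs2 are hypothetical) showing that a code point outside the Basic Multilingual Plane needs two UTF-16 code units, which a UCS-2 converter has no way to represent:

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: encode one Unicode scalar value as UTF-16 code units.
std::vector<char16_t> encodeUtf16(char32_t cp) {
    // Precondition: a valid scalar value (in range, not a surrogate).
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x10000) {
        // BMP code point: one code unit, identical in UCS-2 and UTF-16.
        return { static_cast<char16_t>(cp) };
    }
    // Supplementary code point: split the 20-bit offset into a surrogate pair.
    cp -= 0x10000;
    return { static_cast<char16_t>(0xD800 + (cp >> 10)),
             static_cast<char16_t>(0xDC00 + (cp & 0x3FF)) };
}

// Hypothetical helper: UCS-2 can only hold BMP scalar values.
bool fitsInUcs2(char32_t cp) {
    return cp < 0x10000 && !(cp >= 0xD800 && cp <= 0xDFFF);
}
```

      For example, U+1D11E encodes as the two units 0xD834 0xDD1E in UTF-16, and fitsInUcs2 returns false for it. How a UCS-2 converter handles such input (error, substitution, or something else) depends on the iconv implementation.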

      The encoding that is used for XMLCh is stored in the fUnicodeCP variable, which is initialized in the Iconv{GNU,FBSD}TransServices c-tor. The initialization code just tries all encodings from the gIconv{GNU,FBSD}Encodings array, which for GNU contains only UCS-2 and for FreeBSD contains UCS-2 and UCS-4 encodings.
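
      The "try each encoding in turn" idea can be sketched roughly as below. This is only an approximation under stated assumptions: the function names iconvSupports and pickUnicodeEncoding are hypothetical, the candidate list is passed in by the caller, and the real constructor may perform additional checks. Only iconv_open and iconv_close from the standard iconv API are used.

```cpp
#include <iconv.h>
#include <string>
#include <vector>

// Hypothetical helper: true if iconv can open a converter from -> to.
bool iconvSupports(const char* to, const char* from) {
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1) {
        return false;  // this encoding pair is not supported
    }
    iconv_close(cd);
    return true;
}

// Hypothetical helper: return the first candidate that can be converted
// to and from the local encoding, or an empty string if none works.
std::string pickUnicodeEncoding(const std::vector<std::string>& candidates,
                                const std::string& local) {
    for (const auto& name : candidates) {
        if (iconvSupports(name.c_str(), local.c_str()) &&
            iconvSupports(local.c_str(), name.c_str())) {
            return name;
        }
    }
    return std::string();
}
```

      Because the first usable candidate wins, the order of the list matters: with this sketch, putting a UTF-16 name ahead of the UCS-2 names would make it the selected encoding wherever the platform's iconv supports it.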

      I tried adding UTF-16LE to this array (as the first item) and it works fine for GNU (I double-checked that UTF-16LE ends up in fUnicodeCP). Does anybody know what's going on here? Should we add UTF-16 to these arrays?

          People

            Assignee: Alberto Massari (amassari)
            Reporter: Boris Kolpackov (bsk)