  1. Xerces-C++
  2. XERCESC-1092

Win32Transcoder does not properly transcode ISO-8859-2 and other encodings


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 2.5.0
    • Component/s: Utilities
    • Labels: None
    • Environment: Operating System: Windows XP
      Platform: PC
    • Bugzilla Id: 25498

    Description

      Win32TransService scans the Windows registry for supported charsets and reads
      the "Codepage" and "InternetEncoding" values for each one. For many charsets
      these values are equal, but not for all.

      When a Win32Transcoder object is created for a given charset, the "Codepage"
      value is stored in the fWinCP member and the "InternetEncoding" value in the
      fIECP member. Win32Transcoder methods use the fWinCP value and pass it to the
      Windows API functions like ::MultiByteToWideChar. This is wrong. The fIECP
      value should be used instead.

      For example, when transcoding from ISO-8859-2, fWinCP is 1250 and fIECP is
      28592. Win32Transcoder::transcodeFrom(...) calls
      ::MultiByteToWideChar(1250, ...), which transcodes from the Windows-1250
      code page rather than from ISO-8859-2, so the result is wrong.
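
To make the difference concrete, the following is a minimal portable sketch, not Xerces-C++ code. It hard-codes three bytes from each of the two mappings (the real tables cover the whole 0x80-0xFF range) and dispatches on the code page identifier. The function names decodeIso8859_2, decodeWindows1250 and decode are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Excerpt of ISO-8859-2 (Windows code page id 28592): byte -> Unicode code point.
// Bytes not listed here return 0 in this sketch.
char32_t decodeIso8859_2(std::uint8_t b) {
    switch (b) {
        case 0xA1: return 0x0104;  // LATIN CAPITAL LETTER A WITH OGONEK
        case 0xA9: return 0x0160;  // LATIN CAPITAL LETTER S WITH CARON
        case 0xB9: return 0x0161;  // LATIN SMALL LETTER S WITH CARON
        default:   return b < 0x80 ? b : 0;  // ASCII maps to itself
    }
}

// Excerpt of Windows-1250 (code page id 1250) for the same three bytes.
char32_t decodeWindows1250(std::uint8_t b) {
    switch (b) {
        case 0xA1: return 0x02C7;  // CARON
        case 0xA9: return 0x00A9;  // COPYRIGHT SIGN
        case 0xB9: return 0x0105;  // LATIN SMALL LETTER A WITH OGONEK
        default:   return b < 0x80 ? b : 0;  // ASCII maps to itself
    }
}

// Hypothetical dispatcher keyed by code page id, standing in for the
// code page argument of ::MultiByteToWideChar. Unknown ids return 0.
char32_t decode(unsigned codePage, std::uint8_t b) {
    if (codePage == 28592) return decodeIso8859_2(b);
    if (codePage == 1250)  return decodeWindows1250(b);
    return 0;
}
```

With these excerpts, decode(28592, 0xB9) yields U+0161 while decode(1250, 0xB9) yields U+0105, so an ISO-8859-2 document decoded with code page 1250 silently gets different letters. ASCII bytes decode identically under both, which may be why the problem is easy to miss.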

      The proposed patch:
      Replace fWinCP with fIECP in all calls of Windows API functions in all
      Win32Transcoder methods.

      In Win32Transcoder::transcodeFrom:
      ...............
      const unsigned int toEat = ::IsDBCSLeadByteEx(fIECP, *inPtr) ? 2 : 1;
      // Make sure a whole char is in the source
      if (inPtr + toEat > inEnd)
          break;
      // Try to translate this next char and check for an error
      const unsigned int converted = ::MultiByteToWideChar
      (
          fIECP, MB_PRECOMPOSED | MB_ERR_INVALID_CHARS,
          (const char*)inPtr, toEat, outPtr, 1
      );
      ...............
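
The excerpt above also passes the code page to ::IsDBCSLeadByteEx, so the choice of code page affects where character boundaries are assumed to fall, not only which characters come out. The sketch below is a portable illustration of that loop shape and is not the Xerces-C++ implementation. isLeadByte is a hypothetical stand-in for ::IsDBCSLeadByteEx that models only code page 932 (lead bytes 0x81-0x9F and 0xE0-0xFC) and treats every other code page as single-byte; wholeCharBytes is likewise a made-up name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for ::IsDBCSLeadByteEx. Only code page 932 is
// modeled; all other code pages are assumed to be single-byte here.
bool isLeadByte(unsigned codePage, std::uint8_t b) {
    if (codePage != 932) return false;
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Returns how many bytes at the start of the buffer form whole characters,
// stopping before a trailing lead byte whose second byte is missing.
std::size_t wholeCharBytes(unsigned codePage,
                           const std::uint8_t* in, std::size_t len) {
    const std::uint8_t* p = in;
    const std::uint8_t* const end = in + len;
    while (p < end) {
        const std::size_t toEat = isLeadByte(codePage, *p) ? 2 : 1;
        // Make sure a whole char is in the source
        if (static_cast<std::size_t>(end - p) < toEat)
            break;
        p += toEat;
    }
    return static_cast<std::size_t>(p - in);
}
```

For the four bytes 0x41 0x82 0xA0 0x82, this sketch consumes three bytes under code page 932 (the final 0x82 is an incomplete character) and all four under a single-byte code page. ISO-8859-2 and Windows-1250 are both single-byte, so this step is not where they diverge; it would matter only if the two registry values ever named code pages with different byte structure.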

      In Win32Transcoder::transcodeTo:
      ...............
      const unsigned int bytesStored = ::WideCharToMultiByte
      (
          fIECP, WC_COMPOSITECHECK | WC_SEPCHARS, srcPtr, 1,
          (char*)outPtr, outEnd - outPtr, 0, &usedDef
      );
      ...............

      In Win32Transcoder::canTranscodeTo:
      ...............
      const unsigned int bytesStored = ::WideCharToMultiByte
      (
          fIECP, WC_COMPOSITECHECK | WC_SEPCHARS, srcBuf, srcCount,
          tmpBuf, 64, 0, &usedDef
      );
      ...............

          People

            Assignee: Unassigned
            Reporter: Janus Drozd (jdrozd@software602.cz)