  1. Camel
  2. CAMEL-14521

Unicode problem in Bindy component for fixed length data


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1.0
    • Component/s: camel-bindy
    • Environment:

      JDK: openjdk-8-jdk Version 8u242-b08-0ubuntu3~18.04 on Ubuntu 18.04 amd64

      The ICU4J library was used for processing Unicode correctly: See dependencies in POM

    • Patch Info:
      Patch Available
    • Estimated Complexity:
      Moderate
    • Flags:
      Patch

      Description

       

      Hi, 

      AFAIK all versions of Camel are affected by the following bug: Camel counts the characters in the fixed length data format incorrectly.

      Unicode is a bit tricky when it comes to counting the length of a string, especially since Java uses UTF-16 internally, which means each code point takes 1 or 2 (Java) chars. Bindy seems to use substring internally for selection and counts chars the way Java does. This means the length of a string is the count of chars, i.e. UTF-16 code units (including surrogates), and not the count of code points, which is the common denominator (e.g. see the definition of string length in XML Schema). And when one takes combining characters into account (one "base char" plus 0 to n combining chars is perceived as one "char" by users), it becomes even more of a problem.
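
To illustrate the three ways of counting described above, here is a minimal sketch using only the JDK. The class and method names are hypothetical and are not part of Bindy or of the pull request. The JDK's `java.text.BreakIterator` is used here only because it needs no extra dependency; as noted in this report, it may not segment newer emoji sequences correctly, which is why ICU4J was chosen for the patch.

```java
import java.text.BreakIterator;

class StringLengthDemo {

    // Number of UTF-16 code units, which is what String.length() reports.
    static int utf16Units(String s) {
        return s.length();
    }

    // Number of Unicode code points; a surrogate pair counts once.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of user-perceived characters as segmented by the JDK's
    // character BreakIterator (a base char plus its combining marks).
    static int graphemes(String s) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        int count = 0;
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String clef = "\uD834\uDD1E"; // U+1D11E, one code point stored as a surrogate pair
        String eAcute = "e\u0301";    // 'e' followed by U+0301 COMBINING ACUTE ACCENT

        System.out.println(utf16Units(clef) + " chars, " + codePoints(clef) + " code points");
        System.out.println(utf16Units(eAcute) + " chars, " + codePoints(eAcute)
                + " code points, " + graphemes(eAcute) + " perceived characters");
    }
}
```

The surrogate pair shows the gap between chars and code points, and the combining accent shows the gap between code points and what a user would call one character.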

      The fixed length data format depends entirely on counting characters correctly, which makes it unusable when they are miscounted, since it cannot recover for the "columns" to the right.
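
The column shift can be reproduced with a small sketch, assuming a record of two fields that are each 3 characters wide. The helper names are hypothetical; this is neither Bindy's actual code nor the fix from the pull request. It only contrasts char-based offsets with code-point-based offsets using `String.offsetByCodePoints`, and counting code points still does not handle combining characters.

```java
class FixedLengthFieldDemo {

    // Naive extraction: start and length are UTF-16 code units,
    // which is how String.substring interprets its arguments.
    static String fieldByChars(String record, int start, int length) {
        return record.substring(start, start + length);
    }

    // Code-point-aware extraction: start and length are counted in
    // Unicode code points and translated to char indexes first.
    static String fieldByCodePoints(String record, int start, int length) {
        int begin = record.offsetByCodePoints(0, start);
        int end = record.offsetByCodePoints(begin, length);
        return record.substring(begin, end);
    }

    public static void main(String[] args) {
        // First field: 'A', U+1F600 (stored as a surrogate pair), 'B'. Second field: "XYZ".
        String record = "A\uD83D\uDE00B" + "XYZ";

        // The surrogate pair occupies two chars, so the second column is shifted by one.
        System.out.println(fieldByChars(record, 3, 3));      // prints "BXY"
        System.out.println(fieldByCodePoints(record, 3, 3)); // prints "XYZ"
    }
}
```

Every field to the right of the first supplementary character is misaligned in the char-based variant, which matches the behaviour described in this report.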

      See also the mailing list at http://mail-archives.apache.org/mod_mbox/camel-users/202001.mbox/browser

      As suggested, I created a pull request, since this may be of some interest to the community. The ICU4J library was used for processing Unicode correctly, since the functionality built into the Java API is too old to process modern emojis (skin colour, hair, sex) correctly. Please watch the license...

      Pull-request: https://github.com/apache/camel/pull/3552

       

            People

            • Assignee:
              davsclaus Claus Ibsen
              Reporter:
              greulich Michael Greulich