Camel / CAMEL-14521

Unicode problem in Bindy component for fixed length data



    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1.0
    • Component/s: camel-bindy
    • Environment: JDK openjdk-8-jdk, version 8u242-b08-0ubuntu3~18.04, on Ubuntu 18.04 amd64

      The ICU4J library was used to process Unicode correctly; see the dependencies in the POM.

    • Patch Info: Patch Available
    • Estimated Complexity: Moderate
    • Flags: Patch




      AFAIK all versions of Camel are affected by the following bug: Camel counts the chars in the fixed-length data format incorrectly.

      Counting the length of a Unicode string is a bit tricky, especially since Java uses UTF-16 internally, which means each code point takes one or two (Java) chars. Bindy seems to use substring internally to select fields, so it counts chars the way Java does. The length of a string is therefore the number of UTF-16 code units (a surrogate pair counts as two), not the number of code points, which is the common denominator (e.g. see the definition of string length in XML Schema). Combining chars make the problem even worse: one "base char" plus zero or more combining chars is perceived by users as a single "char".
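
      The three ways of counting can be compared with a small sketch. It uses only the JDK and is not taken from Bindy or from the pull request; the class and method names are hypothetical. Note that java.text.BreakIterator is used here only as a stand-in for grapheme counting, and how well it handles modern emoji sequences depends on the JDK version.

```java
import java.text.BreakIterator;

public class UnicodeLengths {
    // Number of UTF-16 code units, i.e. what String.length() reports.
    static int utf16Units(String s) {
        return s.length();
    }

    // Number of Unicode code points; a surrogate pair counts once.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of user-perceived characters as seen by the JDK's BreakIterator.
    static int graphemes(String s) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        int count = 0;
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String clef = "\uD834\uDD1E"; // U+1D11E MUSICAL SYMBOL G CLEF, one code point
        String eAcute = "e\u0301";    // 'e' followed by U+0301 COMBINING ACUTE ACCENT

        // clef: 2 UTF-16 units, 1 code point
        System.out.println(utf16Units(clef) + " " + codePoints(clef));
        // eAcute: 2 UTF-16 units, 2 code points, 1 perceived character
        System.out.println(utf16Units(eAcute) + " " + codePoints(eAcute) + " " + graphemes(eAcute));
    }
}
```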

      The fixed-length data format depends entirely on counting chars correctly. If the chars are miscounted it becomes unusable, because every "column" to the right of the miscounted field is shifted and parsing cannot recover.
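
      A minimal sketch of how the columns shift, assuming a hypothetical record with two columns of three code points each. The helper names are made up for illustration; this is neither Bindy's code nor the change in the pull request, and it only covers code points, not combining sequences.

```java
public class FixedLengthFields {
    // Cuts a field by UTF-16 offsets, the way a plain substring call does.
    static String fieldByChars(String record, int start, int length) {
        return record.substring(start, start + length);
    }

    // Cuts a field by code point offsets, so a surrogate pair counts once.
    static String fieldByCodePoints(String record, int start, int length) {
        int from = record.offsetByCodePoints(0, start);
        int to = record.offsetByCodePoints(from, length);
        return record.substring(from, to);
    }

    public static void main(String[] args) {
        // Column 1 is 'A', U+1D11E (a surrogate pair), 'B'; column 2 is "XYZ".
        String record = "A\uD834\uDD1EB" + "XYZ";

        // Counting chars: column 2 starts one char too early.
        System.out.println(fieldByChars(record, 3, 3));      // BXY
        // Counting code points: column 2 is read as intended.
        System.out.println(fieldByCodePoints(record, 3, 3)); // XYZ
    }
}
```

      One supplementary character in the first column is enough to corrupt the second column and everything after it.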

      See also the mailing list at http://mail-archives.apache.org/mod_mbox/camel-users/202001.mbox/browser

      As suggested, I created a pull request, since this may be of some interest to the community. The ICU4J lib was used to process Unicode correctly, because the functionality built into the Java API is too old to handle modern emojis (skin colour, hair, sex) correctly. Please watch the license...

      Pull-request: https://github.com/apache/camel/pull/3552





            Assignee: Claus Ibsen (davsclaus)
            Reporter: Michael Greulich (greulich)