AXIS-2908: Apache Axis fails to handle characters outside the Basic Multilingual Plane (U+10000 and above) when creating a SOAP request

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.4
    • Fix Version/s: None
    • Labels:
    • Environment:
      OS - CentOS
      Software Platform - JDK 7

      Description

      When a SOAP request contains non-BMP characters (e.g. emoji), they are not inserted correctly into the XML.

      My content arrives as UTF-8 and is held in Java Strings, which are UTF-16 internally, once the program receives it.

      The Apache Axis library we are using then takes those UTF-16 Java Strings and tries to convert them back into UTF-8 to create the XML before sending. This fails whenever I send a 4-byte UTF-8 character such as an emoji (:grin:). Any character that needs 4 bytes in UTF-8 is represented as a surrogate pair in UTF-16, so I suspect that Axis does not understand the surrogate pair and cannot convert it into a valid UTF-8 encoding.
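
      The relationship between the encodings described above can be checked with a minimal Java sketch. It uses only JDK 7 standard-library calls; the class name SurrogatePairDemo and the choice of U+1F601 as the sample character are assumptions made for illustration and are not taken from the report.

```java
import java.nio.charset.StandardCharsets;

public class SurrogatePairDemo {
    // U+1F601 is an arbitrary supplementary (non-BMP) code point chosen for illustration.
    static final String EMOJI = new String(Character.toChars(0x1F601));

    // Number of UTF-16 code units (Java chars) in the string.
    static int utf16Units(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf16Units(EMOJI)); // 2: a high and a low surrogate
        System.out.println(codePoints(EMOJI)); // 1: a single character
        System.out.println(utf8Bytes(EMOJI));  // 4: one 4-byte UTF-8 sequence
        System.out.println(Character.isHighSurrogate(EMOJI.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(EMOJI.charAt(1)));  // true
    }
}
```

      Any code that walks such a String one char at a time sees two surrogate values rather than one character, which is where a serializer can go wrong.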

      As a result, although UTF-8 is specified, these emoji appear in UTF-16 form, which corrupts them because they are then processed incorrectly.
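
      One plausible way this kind of corruption can arise is an encoder that escapes non-ASCII text one UTF-16 unit at a time. The sketch below is hypothetical and is not Axis code: the class SupplementaryEscapeSketch and both methods are invented for illustration, and escaping of XML-special characters such as < and & is deliberately left out. It contrasts per-char escaping with per-code-point escaping.

```java
public class SupplementaryEscapeSketch {
    // Hypothetical naive encoder: emits one numeric character reference per
    // UTF-16 code unit, so a surrogate pair becomes two separate references.
    static String escapePerChar(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0x7F) {
                out.append("&#x").append(Integer.toHexString(c).toUpperCase()).append(';');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Surrogate-aware variant: reads whole code points, so a surrogate pair
    // becomes a single reference to the supplementary character.
    static String escapePerCodePoint(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp > 0x7F) {
                out.append("&#x").append(Integer.toHexString(cp).toUpperCase()).append(';');
            } else {
                out.append((char) cp);
            }
            i += Character.charCount(cp); // advance by 1 or 2 chars
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String emoji = new String(Character.toChars(0x1F601)); // sample non-BMP character
        System.out.println(escapePerChar("a" + emoji));      // a&#xD83D;&#xDE01;
        System.out.println(escapePerCodePoint("a" + emoji)); // a&#x1F601;
    }
}
```

      The first output is not well-formed XML, because XML 1.0 does not allow character references to the surrogate range U+D800 to U+DFFF. The second output refers to the intended character directly.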

            People

            • Assignee: Unassigned
            • Reporter: Siddhesh Sundar Toraskar (siddhesh_toraskar)
            • Votes: 0
            • Watchers: 2