Uploaded image for project: 'Commons Lang'
  1. Commons Lang
  2. LANG-708

StringEscapeUtils.escapeEcmaScript from lang3 cuts off long unicode string

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 3.0.1
    • lang.text.translate.*
    • None

    Description

      Hello, I have a really big JSON string (generated from db) with unicode chars and I need to pass it though StringEscapeUtils.escapeEcmaScript(value). This string is generated and in most cases it works ok, but I have met a specific string (attached below) which is not correctly converted - few symbols (about 10) at the end of the string are cut-off (and actually they are not already unicode chars).

      the original string ends with:
      "geonameId":6544329,"valueCode":""}]

      and the produced string ends with:
      \"geonameId\":6544329,\"value

      So Code":""}] part is missing and this does not allow to parse the result as JSON on the client side.

      I have tried to debug a bit with StringEscapeUtils.escapeEcmaScript source code and is seems that the problem is somewhere around here:

      CharSequenceTranslator.translate(...){
      ...
      int sz = Character.codePointCount(input, 0, input.length());
      for (int i = 0; i < sz; i++) {
      // consumed is the number of codepoints consumed
      int consumed = translate(input, i, out);

      if(consumed == 0)

      { out.write( Character.toChars( Character.codePointAt(input, i) ) ); }

      ...
      }

      If I put breakpoint condition to stop in the loop when i==(sz-5), I can see that the last chars of "valueCode" literal are being added to the end of "out" stream, but the counter condition ends too early to reach the end of original input String.

      So, it seems that somehow with the provided string either the sz value is calculated incorrectly or the processing loop did wrong counter adjustmes at some point.

      Attachments

        1. Test.java
          2 kB
          anton
        2. input.txt
          6 kB
          anton

        Activity

          People

            Unassigned Unassigned
            benderamp anton
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: