Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-8698

Fix replaceIgnoreCase method bug in EscapeQuerySyntaxImpl

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: core/queryparser
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      It is a patch of LUCENE-8572 issue from Octavian Mocanu.

       

      There is a serious bug in the replaceIgnoreCase method of the EscapeQuerySyntaxImpl class.

      This issue can affect QueryNode. (StringIndexOutOfBoundsException)

      As I mentioned in comment of the issue, the String#toLowerCase() causes the array to grow in size.

      private static CharSequence replaceIgnoreCase(CharSequence string,
          CharSequence sequence1, CharSequence escapeChar, Locale locale) {
        // string = "İpone " [304, 112, 111, 110, 101, 32],  size = 6
        ...
        while (start < count) {
          // Convert by toLowerCase as follows.
          // string = "i'̇pone " [105, 775, 112, 111, 110, 101, 32], size = 7
          // firstIndex will be set 6.
          if ((firstIndex = string.toString().toLowerCase(locale).indexOf(first,
              start)) == -1)
            break;
          boolean found = true;
          ...
          if (found) {
            // In this line, String.toString() will only have a range of 0 to 5.
            // So here we get a StringIndexOutOfBoundsException.
            result.append(string.toString().substring(copyStart, firstIndex));
            ...
          } else {
            start = firstIndex + 1;
          }
        }
        ...
      }

      Maintaining the overall structure and fixing bug is very simple.

      If we change to the following code, the method works fine.

       

      // Line 135 ~ 136
      // BEFORE
      if ((firstIndex = string.toString().toLowerCase(locale).indexOf(first, start)) == -1)
      
      // AFTER
      if ((firstIndex = string.toString().indexOf(first, start)) == -1)
      

       

       

      But I wonder if this is the best way.

      How do you think about using String#replace() instead?

       

      // SAMPLE : escapeWhiteChar (escapeChar and escapeQuoted are same)
      // BEFORE
      private static final CharSequence escapeWhiteChar(CharSequence str,
          Locale locale) {
        ...
        for (int i = 0; i < escapableWhiteChars.length; i++) {
          buffer = replaceIgnoreCase(buffer, escapableWhiteChars[i].toLowerCase(locale),
              "\\", locale);
        }
        ...
      }
      
      // AFTER
      private static final CharSequence escapeWhiteChar(CharSequence str,
          Locale locale) {
        ...
        for (int i = 0; i < escapableWhiteChars.length; i++) {
          buffer = buffer.toString().replace(escapableWhiteChars[i], "\\" + escapableWhiteChars[i]);
        }
        ...
      }
      

       

      First, I upload the patch using String#replace().
      If you give me some feedback, I will check it

       

       

        Attachments

        1. LUCENE-8698.patch
          8 kB
          Namgyu Kim

          Issue Links

            Activity

              People

              • Assignee:
                Unassigned
                Reporter:
                danmuzi Namgyu Kim
              • Votes:
                0 Vote for this issue
                Watchers:
                2 Start watching this issue

                Dates

                • Created:
                  Updated:

                  Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 0.5h
                  0.5h