[LUCENE-8698] Fix replaceIgnoreCase method bug in EscapeQuerySyntaxImpl - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: core/queryparser
Labels:
None

Lucene Fields:

New

Description

It is a patch of LUCENE-8572 issue from tonicava.

There is a serious bug in the replaceIgnoreCase method of the EscapeQuerySyntaxImpl class.

This issue can affect QueryNode. (StringIndexOutOfBoundsException)

As I mentioned in comment of the issue, the String#toLowerCase() causes the array to grow in size.

private static CharSequence replaceIgnoreCase(CharSequence string,
    CharSequence sequence1, CharSequence escapeChar, Locale locale) {
  // string = "İpone " [304, 112, 111, 110, 101, 32],  size = 6
  ...
  while (start < count) {
    // Convert by toLowerCase as follows.
    // string = "i'̇pone " [105, 775, 112, 111, 110, 101, 32], size = 7
    // firstIndex will be set 6.
    if ((firstIndex = string.toString().toLowerCase(locale).indexOf(first,
        start)) == -1)
      break;
    boolean found = true;
    ...
    if (found) {
      // In this line, String.toString() will only have a range of 0 to 5.
      // So here we get a StringIndexOutOfBoundsException.
      result.append(string.toString().substring(copyStart, firstIndex));
      ...
    } else {
      start = firstIndex + 1;
    }
  }
  ...
}

Maintaining the overall structure and fixing bug is very simple.

If we change to the following code, the method works fine.

// Line 135 ~ 136
// BEFORE
if ((firstIndex = string.toString().toLowerCase(locale).indexOf(first, start)) == -1)

// AFTER
if ((firstIndex = string.toString().indexOf(first, start)) == -1)

But I wonder if this is the best way.

How do you think about using String#replace() instead?

// SAMPLE : escapeWhiteChar (escapeChar and escapeQuoted are same)
// BEFORE
private static final CharSequence escapeWhiteChar(CharSequence str,
    Locale locale) {
  ...
  for (int i = 0; i < escapableWhiteChars.length; i++) {
    buffer = replaceIgnoreCase(buffer, escapableWhiteChars[i].toLowerCase(locale),
        "\\", locale);
  }
  ...
}

// AFTER
private static final CharSequence escapeWhiteChar(CharSequence str,
    Locale locale) {
  ...
  for (int i = 0; i < escapableWhiteChars.length; i++) {
    buffer = buffer.toString().replace(escapableWhiteChars[i], "\\" + escapableWhiteChars[i]);
  }
  ...
}

First, I upload the patch using String#replace().
If you give me some feedback, I will check it

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending

LUCENE-8698.patch
18/Feb/19 17:00
8 kB
Namgyu Kim

Issue Links

links to

GitHub Pull Request #585

Activity

People

Assignee:: Unassigned

Reporter:: Namgyu Kim

Votes:: 0 Vote for this issue

Watchers:: 2 Start watching this issue

Dates

Created:: 18/Feb/19 16:56

Updated:: 28/Aug/22 15:42

Time Tracking

Estimated:

Not Specified

Remaining:

Logged:

0.5h