Description
Steps to reproduce:
1. Call the method like that:
int count = StringUtils.countMatches("abaabaababaab", "aba");
Actual result: the value of count variable equals 3
Expected result: the value of count variable equals 4
The substrings are highlighted in red:
abaabaababaab
abaabaababaab
abaabaababaab
abaabaababaab
Method returns incorrect value because of this code:
while ((idx = CharSequenceUtils.indexOf(str, sub, idx)) != INDEX_NOT_FOUND) {
count++;
idx += sub.length();
}
This looks like a greedy algorithm - but increasing the idx variable by the length of substring could lead to the problems like in example:
Let's say that idx = 6, so we try to find a substring in the highlighted suffix:
abaabaababaab
We found the substring, so idx now becomes idx + 3 = 9. So now this suffix will be used for searching substring in it:
abaabaababaab
But because of increasing the value of idx by 3 we won't find the substring (abaabaababaab) which intersects with the already found substring on the last step.
Basically, this method will work incorrectly with any substrings that intersect with each other.
There is also a unit test with incorrect expected value:
assertEquals(4, StringUtils.countMatches("oooooooooooo", "ooo"));
If this behavior (counting substrings that do not intersect) is intended, please update the JavaDoc to reflect it. Right now it looks like that:
Counts how many times the substring appears in the larger string.
Link for the PR: https://github.com/apache/commons-lang/pull/615