Using the RE.match(CharacterIterator,int) function with a "CharacterArrayCharacterIterator", then calling "getParen(int)", often returns a string of the incorrect length or throws an exception.

This is due to the implementation of "substring(int,int)" in the CharacterArrayCharacterIterator class and/or the mis-documentation of the CharacterIterator.substring interface. The confusion is whether the second argument to substring represents the endIndex or the length. The API docs say it's the length, but the RE implementation and the StringCharacterIterator implementation both treat it as the endIndex.

[Note: the standard Java string has java.lang.String.substring(int beginIndex, int endIndex), but the constructor is java.lang.String(char[] src, int off, int len).]

Secondly, there is no check that the requested substring stays within the bounds of the sequence length specified at construction time. An IndexOutOfBoundsException should be thrown in that case.

I think the best solution is to first update the API docs to specify that it is in fact (beginIndex, endIndex), and then to update the CharacterArrayCharacterIterator.substring functions to be something like this:

    public String substring(int beginIndex, int endIndex) {
        if (endIndex > len)
            throw new IndexOutOfBoundsException("endIndex=" + endIndex + "; sequence size=" + len);
        if (beginIndex < 0)
            throw new IndexOutOfBoundsException("beginIndex=" + beginIndex);
        return new String(src, off + beginIndex, endIndex - beginIndex);
    }

    public String substring(int beginIndex) {
        if (beginIndex > len)
            throw new IndexOutOfBoundsException("index=" + beginIndex + "; sequence size=" + len);
        return new String(src, off + beginIndex, len - beginIndex);
    }
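To make the endIndex-versus-length confusion concrete, here is a minimal standalone sketch. It is not the library class: CharArraySlice and substringByLength are hypothetical names used only for illustration, and the field names src, off and len are assumed to mirror the ones used in this report. The sketch treats the second argument as an endIndex, rejects a negative beginIndex in both overloads, and keeps a length-based variant alongside for contrast.

```java
// Hypothetical stand-in for CharacterArrayCharacterIterator (illustration only).
final class CharArraySlice {
    private final char[] src; // backing array
    private final int off;    // offset of the sequence within src
    private final int len;    // length of the sequence

    CharArraySlice(char[] src, int off, int len) {
        this.src = src;
        this.off = off;
        this.len = len;
    }

    /** Returns the characters in [beginIndex, endIndex), relative to off. */
    String substring(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > len || beginIndex > endIndex) {
            throw new IndexOutOfBoundsException("beginIndex=" + beginIndex
                    + "; endIndex=" + endIndex + "; sequence size=" + len);
        }
        // String(char[], int, int) takes a count, so convert endIndex to a length.
        return new String(src, off + beginIndex, endIndex - beginIndex);
    }

    /** Returns the characters from beginIndex to the end of the sequence. */
    String substring(int beginIndex) {
        return substring(beginIndex, len);
    }

    /** The length-based reading that the API docs describe, for contrast. */
    String substringByLength(int beginIndex, int length) {
        return substring(beginIndex, beginIndex + length);
    }
}
```

With the array "xxhello worldyy", off=2 and len=11, substring(6, 11) yields "world". Passing the same arguments (6, 11) to the length-based variant asks for 11 characters starting at index 6, which runs past the end of the sequence and throws IndexOutOfBoundsException. This matches the wrong-length and exception symptoms that appear when a caller and an implementation disagree on the meaning of the second argument.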
Created attachment 8532: Implementation of the fix suggested in this bug.
Changed bug summary from "CharacterArrayCharacterIterator substring function returns incorrect results" to "CharacterArrayCharacterIterator docs and implementation mismatch"
Patch applied; please check and close the bug. Thanks, Vadim