Bug 3273 - CharacterArrayCharacterIterator docs and implementation mismatch
Status: CLOSED FIXED
Alias: None
Product: Regexp
Classification: Unclassified
Component: Other
Version: unspecified
Hardware: All All
Importance: P3 normal
Target Milestone: ---
Assignee: Jakarta Notifications Mailing List
URL: .../api/org/apache/regexp/CharacterAr...
Keywords:
Depends on:
Blocks:
 
Reported: 2001-08-25 23:38 UTC by Tony Robertson
Modified: 2004-11-16 19:05 UTC

Attachments
Implementation of the fix suggested in this bug. (5.86 KB, patch)
2003-10-11 03:27 UTC, Oleg Sukhodolsky

Description Tony Robertson 2001-08-25 23:38:55 UTC
Calling RE.match(CharacterIterator, int) with a
CharacterArrayCharacterIterator and then calling getParen(int)
often returns a string of the wrong length, or throws an exception.

This is due to the implementation of "substring(int,int)" in the
CharacterArrayCharacterIterator class and/or the mis-documentation of
the CharacterIterator.substring interface.

The confusion is whether the second argument to substring represents
the endIndex or the length. The API docs say it is the length, but the
RE implementation and the StringCharacterIterator implementation both
treat it as the endIndex.
[Note: the standard Java String class uses both conventions. The method is
java.lang.String.substring(int beginIndex, int endIndex)
but the constructor is java.lang.String(char[] src, int off, int len)]
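
To make the two readings concrete, here is a minimal sketch in plain Java. The class and helper names (SubstringSemantics, byEndIndex, byLength) are hypothetical and are not part of the Regexp API; they only mirror the two interpretations of the second argument described above.

```java
// Hypothetical demo class; none of these names come from the Regexp library.
class SubstringSemantics {
    // Second argument is an exclusive end index, as in String.substring(int, int).
    static String byEndIndex(char[] src, int off, int beginIndex, int endIndex) {
        return new String(src, off + beginIndex, endIndex - beginIndex);
    }

    // Second argument is a character count, as in new String(char[], int, int).
    static String byLength(char[] src, int off, int beginIndex, int length) {
        return new String(src, off + beginIndex, length);
    }

    public static void main(String[] args) {
        char[] src = "hello world".toCharArray();

        // The caller means "characters 2 up to (not including) 5".
        System.out.println(byEndIndex(src, 0, 2, 5)); // llo
        // A length-based implementation returns 5 characters instead of 3.
        System.out.println(byLength(src, 0, 2, 5));   // llo w

        // With a larger end index the length-based reading runs off the array.
        try {
            byLength(src, 0, 6, 8);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("out of bounds: " + e.getMessage());
        }
    }
}
```

This reproduces both symptoms in the report: a result of the wrong length when the misread count still fits in the array, and an exception when it does not.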

Secondly, there is no check that the requested substring stays within the
bounds of the sequence length specified at construction time.
An IndexOutOfBoundsException should be thrown in that case.

I think the best solution is to first update the API docs to specify
that the arguments are in fact (beginIndex, endIndex), and then to update the
CharacterArrayCharacterIterator.substring functions to be something like this:

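 // The second argument is an end index (exclusive), as in String.substring.
 public String substring(int beginIndex, int endIndex)
 {
   if (beginIndex < 0)
     throw new IndexOutOfBoundsException("beginIndex=" + beginIndex);
   if (endIndex > len)
     throw new IndexOutOfBoundsException("endIndex=" + endIndex +
       "; sequence size=" + len);
   if (beginIndex > endIndex)
     throw new IndexOutOfBoundsException("beginIndex=" + beginIndex +
       "; endIndex=" + endIndex);
   return new String(src, off + beginIndex, endIndex - beginIndex);
 }

 // Returns everything from beginIndex to the end of the sequence.
 public String substring(int beginIndex)
 {
   if (beginIndex < 0 || beginIndex > len)
     throw new IndexOutOfBoundsException("index=" + beginIndex +
       "; sequence size=" + len);
   return new String(src, off + beginIndex, len - beginIndex);
 }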
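
For anyone who wants to try the proposed behaviour outside the library, the following is a self-contained sketch. BoundedCharArrayView is a hypothetical stand-in, not the real CharacterArrayCharacterIterator; it only assumes the same three fields (src, off, len) used in the snippet above.

```java
// Hypothetical stand-in for a char[]-backed iterator; not the Regexp class itself.
final class BoundedCharArrayView {
    private final char[] src; // backing array
    private final int off;    // start of the visible window within src
    private final int len;    // number of visible characters

    BoundedCharArrayView(char[] src, int off, int len) {
        this.src = src;
        this.off = off;
        this.len = len;
    }

    // endIndex is exclusive and must stay inside the window of len characters.
    String substring(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > len || beginIndex > endIndex) {
            throw new IndexOutOfBoundsException("beginIndex=" + beginIndex
                + "; endIndex=" + endIndex + "; sequence size=" + len);
        }
        return new String(src, off + beginIndex, endIndex - beginIndex);
    }

    // Everything from beginIndex to the end of the window.
    String substring(int beginIndex) {
        return substring(beginIndex, len);
    }

    public static void main(String[] args) {
        // The window covers "hello world"; the padding must never be returned.
        BoundedCharArrayView view =
            new BoundedCharArrayView("xxhello worldxx".toCharArray(), 2, 11);
        System.out.println(view.substring(6, 8)); // wo
        System.out.println(view.substring(6));    // world
        try {
            view.substring(6, 12); // one past the window, but still inside src
        } catch (IndexOutOfBoundsException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The last call shows why the explicit check matters: the backing array is longer than the window, so without the check the String constructor would quietly return "worldx" rather than fail.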
Comment 1 Oleg Sukhodolsky 2003-10-11 03:27:47 UTC
Created attachment 8532
Implementation of the fix suggested in this bug.
Comment 2 Vadim Gritsenko 2003-11-18 14:04:21 UTC
Changed bug summary from
  "CharacterArrayCharacterIterator substring function returns incorrect results"
to
  "CharacterArrayCharacterIterator docs and implementation mismatch"
Comment 3 Vadim Gritsenko 2003-11-18 14:10:38 UTC
Patch applied; please check and close the bug.

Thanks,
Vadim