Bug 14954 - A bug caused by '-' in char class def ('[...]')
Summary: A bug caused by '-' in char class def ('[...]')
Status: CLOSED DUPLICATE of bug 19329
Alias: None
Product: Regexp
Classification: Unclassified
Component: Other
Version: unspecified
Hardware: Other other
Importance: P3 normal
Target Milestone: ---
Assignee: Jakarta Notifications Mailing List
URL:
Keywords:
Duplicates: 15381 15455 16214 16434
Depends on:
Blocks:
 
Reported: 2002-11-29 11:05 UTC by Ikuya Morikawa
Modified: 2004-11-16 19:05 UTC (History)



Description Ikuya Morikawa 2002-11-29 11:05:58 UTC
When I put a '-' in a character class definition ('[...]'), a simple character
in the definition is ignored in some cases. In those cases, the instructions in
the compiled REProgram object are not as expected. This may be related to
bugs #2121 and #5212.

For example, '[a-zA]' works fine, but in '[Aa-z]' the 'A' is ignored, and in
'[abcd\-]' the 'd' is ignored. In each case, the ignored character sits two
characters before the '-'.

Near Line 710 in RECompiler.java, we can see:
>                  // If simple character and not start of range, include it
>                 if ((idx + 1) >= len || pattern.charAt(idx + 1) != '-')
>                  {
>                     range.include(simpleChar, include);
>                  }
As I understand it, idx already points to the character after simpleChar at
this point. The simpleChar should be skipped only when the character right
after it (if any) is '-', because in that case the simpleChar becomes the start
of a new range. Therefore, the following condition seems correct:
>                 if (idx >= len || pattern.charAt(idx) != '-')
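
The off-by-one described above can be reproduced with a small stand-alone
sketch. This is a hypothetical, simplified model of a character-class parser,
not the actual RECompiler code: the class name CharClassLookahead, the parse
method, and its lookahead parameter are made up for illustration. Passing
lookahead = 1 models the original check (idx + 1); lookahead = 0 models the
proposed check (idx).

```java
import java.util.TreeSet;

public class CharClassLookahead {
    // Hypothetical model: parses the text between '[' and ']' and returns the
    // set of characters it would include. Not the real RECompiler logic.
    static TreeSet<Character> parse(String body, int lookahead) {
        TreeSet<Character> set = new TreeSet<>();
        int idx = 0, len = body.length();
        boolean definingRange = false;
        char rangeStart = 0, last = 0;
        while (idx < len) {
            char simpleChar = body.charAt(idx++);   // idx now points past simpleChar
            if (simpleChar == '\\' && idx < len) {
                simpleChar = body.charAt(idx++);    // escaped literal, e.g. "\-"
            } else if (simpleChar == '-' && !definingRange) {
                definingRange = true;               // unescaped '-' opens a range
                rangeStart = last;
                continue;
            }
            if (definingRange) {
                // Close the range rangeStart..simpleChar
                for (int ch = rangeStart; ch <= simpleChar; ch++) {
                    set.add((char) ch);
                }
                definingRange = false;
            } else {
                // Include the character unless it is the start of a range
                int peek = idx + lookahead;
                if (peek >= len || body.charAt(peek) != '-') {
                    set.add(simpleChar);
                }
                last = simpleChar;
            }
        }
        return set;
    }

    public static void main(String[] args) {
        for (String body : new String[] {"a-zA", "Aa-z", "abcd\\-"}) {
            System.out.println(body + "  idx+1: " + parse(body, 1)
                    + "  idx: " + parse(body, 0));
        }
    }
}
```

In this model, the idx + 1 variant drops 'A' from "Aa-z" and 'd' from "abcd\-",
while the idx variant keeps both, which matches the behaviour reported above.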

I tried this fix on the CVS source tree last night, together with some new test
cases, and it worked fine. I can't rule out side effects, but at least all the
tests in RETest.txt still pass.

The diff output follows. Does this help?

Ikuya


Index: docs/RETest.txt
===================================================================
RCS file: /home/cvspublic/jakarta-regexp/docs/RETest.txt,v
retrieving revision 1.3
diff -c -r1.3 RETest.txt
*** docs/RETest.txt     27 Feb 2001 08:37:05 -0000      1.3
--- docs/RETest.txt     28 Nov 2002 14:22:25 -0000
***************
*** 1011,1014 ****
--- 1011,1030 ----
  YES
  aaabc

+ #168
+ [a-zA]+
+ JakartaAnt
+ YES
+ akartaAnt
+
+ #169
+ [Aa-z]+
+ JakartaAnt
+ YES
+ akartaAnt
+
+ #170
+ [akrt\-]+
+ Jakarta-Ant
+ YES
+ akarta-
Index: src/java/org/apache/regexp/RECompiler.java
===================================================================
RCS file: /home/cvspublic/jakarta-regexp/src/java/org/apache/regexp/RECompiler.java,v
retrieving revision 1.4
diff -c -r1.4 RECompiler.java
*** src/java/org/apache/regexp/RECompiler.java  27 Feb 2001 08:37:05 -0000      1.4
--- src/java/org/apache/regexp/RECompiler.java  28 Nov 2002 14:22:26 -0000
***************
*** 710,716 ****
              else
              {
                  // If simple character and not start of range, include it
!                 if ((idx + 1) >= len || pattern.charAt(idx + 1) != '-')
                  {
                      range.include(simpleChar, include);
                  }
--- 710,716 ----
              else
              {
                  // If simple character and not start of range, include it
!                 if (idx >= len || pattern.charAt(idx) != '-')
                  {
                      range.include(simpleChar, include);
                  }
Comment 1 Vadim Gritsenko 2003-04-25 12:32:15 UTC
*** Bug 15381 has been marked as a duplicate of this bug. ***
Comment 2 Vadim Gritsenko 2003-04-25 12:34:25 UTC
*** Bug 15455 has been marked as a duplicate of this bug. ***
Comment 3 Vadim Gritsenko 2003-04-25 12:36:32 UTC
*** Bug 16434 has been marked as a duplicate of this bug. ***
Comment 4 Vadim Gritsenko 2003-04-25 13:26:46 UTC
*** Bug 16214 has been marked as a duplicate of this bug. ***
Comment 5 Vadim Gritsenko 2003-04-25 18:26:33 UTC
Fixed by patch in bug #19329

*** This bug has been marked as a duplicate of 19329 ***
Comment 6 Vadim Gritsenko 2003-05-02 01:10:34 UTC
Fixed by Bug #19329