Bug 38331 - ArrayIndexOutOfBoundsException under certain conditions
Summary: ArrayIndexOutOfBoundsException under certain conditions
Status: CLOSED FIXED
Alias: None
Product: Regexp
Classification: Unclassified
Component: Other
Version: unspecified
Hardware: PC Windows XP
Importance: P2 normal
Target Milestone: ---
Assignee: Jakarta Notifications Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-01-20 15:35 UTC by Josh Rodman
Modified: 2007-03-07 16:25 UTC



Description Josh Rodman 2006-01-20 15:35:04 UTC
This code generates an exception when running with jdk1.3.1_17:

RE r123 = new RE("((a|b){1637})");
r123.match("a");

This code works properly:

RE r123 = new RE("((a|b){1638})");
r123.match("a");

This code shows that depending on the number requested, regexp switches between 
working and not working:

boolean lastvalue = true;
for (int i = 1; i < 3650; i += 1) {
    try {
        RE r = new RE("((a|b){" + i + "})");
        r.match("a");
        if (!lastvalue) {
            System.out.println("Switching from NOT to WORKING at " + i + " (" + i + " works) " + lastvalue);
        }
        lastvalue = true;
    } catch (Exception ex) {
        if (lastvalue) {
            System.out.println("Switching from WORKING to NOT at " + i + " (" + i + " doesn't work) " + lastvalue);
        }
        lastvalue = false;
    }
}

If "i" were allowed past 3650, the behavior would switch back and forth a 
couple more times before 10000; I have seen it happen above 7000 (this is as 
far as I let it test). In RE.java, look under the following signature:

protected int matchNodes(int firstNode, int lastNode, int idxStart)

Look for this line:

next   = node + (short)instruction[node + offsetNext];

Change it to say:

next   = node + (int)instruction[node + offsetNext];

After recompiling and testing, this problem appears to go away. However, I 
cannot confirm that the change doesn't break something else. I'm not sure why 
"short" would have been chosen over "int". Maybe there is a hidden reason.
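
For illustration only, here is a minimal standalone sketch of what the two casts do to a value read from a char array. It does not use the Regexp sources; the class and method names (OffsetCastDemo, readAsShort, readAsInt) are made up for this example. It suggests that the "int" version changes how any stored value above Short.MAX_VALUE is read, so if the compiled program ever stores a negative (backward) offset, that offset would presumably be misread after the change.

```java
public class OffsetCastDemo {
    // Reads a stored char the way the original line does: as a signed 16-bit value.
    static int readAsShort(char stored) {
        return (short) stored;
    }

    // Reads a stored char the way the proposed change does: as an unsigned 16-bit value.
    static int readAsInt(char stored) {
        return (int) stored;
    }

    public static void main(String[] args) {
        char small = (char) 100;              // fits in a short either way
        char large = (char) 40000;            // larger than Short.MAX_VALUE
        char negative = (char) (short) -100;  // a negative offset stored in a char

        System.out.println(readAsShort(small) + " vs " + readAsInt(small));       // 100 vs 100
        System.out.println(readAsShort(large) + " vs " + readAsInt(large));       // -25536 vs 40000
        System.out.println(readAsShort(negative) + " vs " + readAsInt(negative)); // -100 vs 65436
    }
}
```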
Comment 1 Vadim Gritsenko 2007-03-07 16:25:20 UTC
instruction is an array of chars, which means it holds two-byte values. The
offset from one instruction to another takes one char in the array, so it must
be within [Short.MIN_VALUE, Short.MAX_VALUE]. Some programs (like a{8192}) are
compiled by the current version into code exceeding this size (more than
Short.MAX_VALUE instructions), so their offsets cannot be expressed correctly.

Added check for this condition to RECompiler.
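
As a rough sketch of what such a range check could look like (this is not the actual RECompiler code; the class OffsetRangeCheck and the helpers encodeOffset and decodeOffset are hypothetical names chosen for this example), an offset can be validated before it is narrowed into a single char:

```java
public class OffsetRangeCheck {
    // Hypothetical helper: store a relative offset in one char,
    // rejecting values that do not fit in a signed 16-bit short.
    static char encodeOffset(int offset) {
        if (offset < Short.MIN_VALUE || offset > Short.MAX_VALUE) {
            throw new IllegalArgumentException(
                "Offset " + offset + " does not fit in a signed 16-bit value");
        }
        return (char) (short) offset;
    }

    // Hypothetical helper: read the offset back as a signed value.
    static int decodeOffset(char stored) {
        return (short) stored;
    }

    public static void main(String[] args) {
        // In-range offsets, forward and backward, survive the round trip.
        System.out.println(decodeOffset(encodeOffset(1234)));
        System.out.println(decodeOffset(encodeOffset(-1234)));

        // An out-of-range offset is rejected instead of silently wrapping.
        try {
            encodeOffset(40000);
        } catch (IllegalArgumentException ex) {
            System.out.println("Rejected: " + ex.getMessage());
        }
    }
}
```

Failing at compile time this way reports the problem where the oversized program is built, rather than letting a wrapped offset surface later as an ArrayIndexOutOfBoundsException during matching.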