Hi, I have found a regular expression and a corresponding string that hang the JVM. RE: "^((\w|\s){1,50})$", string to match: "Controlling &Cheques". It hangs when I call RE re = new RE("^((\\w|\\s){1,50})$"); re.match("Controlling &Cheques"); Thanks for any help. Regards, Thomas
Sorry, it should be: RE re = new RE("^([\\w\\s]{1,50})$"); re.match("Controlling &Cheques"); The problem remains the same.
It seems to me that the processing time grows exponentially with the number of characters in the search string, especially the number of characters before the special character (e.g. "&"). The string "&asfbd" gets through fast, but "asfbd&" takes much longer.
*** Bug 5243 has been marked as a duplicate of this bug. ***
*** Bug 15775 has been marked as a duplicate of this bug. ***
*** Bug 10940 has been marked as a duplicate of this bug. ***
*** Bug 23338 has been marked as a duplicate of this bug. ***
*** Bug 23303 has been marked as a duplicate of this bug. ***
Here is an RE and input combination that hangs forever: pattern: C.{1,2}C.{5,45}[VMFLWIE].C.{1,4}C.{1,4}[WYFVQHLT]H.{2}C.{5,45}[WFLYI].C.{2}C input: MEMKKKINMELKNRAPEEVTELVLDNCLCVNGEIEGLNDTFKELEFLSMANVELSSLARLPSLNKLRKLELSDNIISGGL EVLAEKCPNLTYLNLSGNKIKDLSTVEALQNLKNLKSLDLFNCEITNLEDYRESIFELLQQITYLDGFDQEDNEAPDSEE EDDDDEDGDEDEEDEDEDEAGPPEGYEEEEDDDEDEAGSEVGEGEEEVGLSYLMKDEIQDEEDDDDYVDEGEEEEEEEEE GLRGEKRKRDAEDDGEEDDD Thanks in advance for a patch!
Another RE that hangs in match: ^.{10,115}[DENF][ST][LIVMF][LIVSTEQ]V.[AGP][STANEQPK] This was created with re = new RE("the above pattern"), and either re.match or re.getParenEnd(0) has now been hanging for 15 hours. The environment is j2sdk1.4.1_02 on Linux. To be fair, these patterns bog down the C regex library somewhat too, but nothing like this. Smaller {n,m} ranges match fine.
I think the cause of the problem is the way we compile the <something>{n,m} construct. As far as I can see, the compiled program for it is equivalent to <something>{n}(<something>|)...(<something>|), with m-n optional copies, which has exponential complexity, so during matching we have no way to optimize it. A possible fix would be to add a new operation for this construct so that we can process it faster.
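A minimal sketch of the explanation above, assuming X is a single-character class such as [\w\s]. It is a hypothetical illustration, not jakarta-regexp's actual code: the class BoundedRepeatDemo, its methods, the steps counter and the WORD_OR_SPACE approximation of [\w\s] are all made up here. It matches ^X{min,max}$ (anchored at index 0 only) two ways: as the expanded program X{min}(X|)...(X|), where every optional copy is a separate backtracking choice, and as a counted loop that consumes greedily and then gives characters back one at a time.

```java
import java.util.function.IntPredicate;

// Hypothetical illustration, not jakarta-regexp code: two ways to match
// ^X{min,max}$ (anchored at index 0) where X is a one-character class.
final class BoundedRepeatDemo {
    // Rough work counter for the last call: one unit per attempt of X or of the final $.
    static long steps;

    // Approximation of [\w\s]: ASCII letters, digits, '_' and whitespace.
    static final IntPredicate WORD_OR_SPACE = c ->
            (c < 128 && (Character.isLetterOrDigit(c) || c == '_')) || Character.isWhitespace(c);

    // Expanded program X{min}(X|)...(X|)$ with (max - min) optional copies.
    static boolean matchExpanded(String s, int min, int max) {
        steps = 0;
        int pos = 0;
        for (int i = 0; i < min; i++) {               // mandatory copies X{min}
            steps++;
            if (pos >= s.length() || !WORD_OR_SPACE.test(s.charAt(pos))) {
                return false;
            }
            pos++;
        }
        return optionalCopies(s, pos, max - min);
    }

    private static boolean optionalCopies(String s, int pos, int copiesLeft) {
        steps++;
        if (copiesLeft == 0) {
            return pos == s.length();                 // the trailing $
        }
        // First alternative of (X|): consume one character if X matches it.
        if (pos < s.length() && WORD_OR_SPACE.test(s.charAt(pos))
                && optionalCopies(s, pos + 1, copiesLeft - 1)) {
            return true;
        }
        // Second alternative of (X|): match empty and continue with the next copy.
        // Many different choices reach the same pos, so failures are re-explored.
        return optionalCopies(s, pos, copiesLeft - 1);
    }

    // Counted loop: consume up to max characters greedily, then give them back one at a time.
    static boolean matchCounted(String s, int min, int max) {
        steps = 0;
        int count = 0;
        while (count < max && count < s.length()) {
            steps++;
            if (!WORD_OR_SPACE.test(s.charAt(count))) {
                break;
            }
            count++;
        }
        for (int n = count; n >= min; n--) {          // try $ after count, count-1, ..., min chars
            steps++;
            if (n == s.length()) {
                return true;
            }
        }
        return false;
    }
}
```

With {1,20} and "Controlling &Cheques", matchCounted fails after 25 units of work, while matchExpanded fails only after hundreds of thousands, because it retries every assignment of the 11 available characters to the optional copies; with {1,50} as in the first report, the expanded search is many orders of magnitude larger again. This fits the observation earlier in this thread that the time grows with the number of characters before the "&".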
*** Bug 27670 has been marked as a duplicate of this bug. ***
*** Bug 28926 has been marked as a duplicate of this bug. ***
*** Bug 31719 has been marked as a duplicate of this bug. ***
Here is another combination that hangs it for quite some time: pattern: ^[a-zA-Z0-9 ]{1, 35}$ input: "This is a sample string (remove it)". I observed a lot of recursive activity in the debugger while matching this pattern. Interestingly, the pattern ^[a-zA-Z0-9 ]*$ matches really fast.
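Since ^[a-zA-Z0-9 ]*$ is reported to be fast, one possible workaround until a fix is available is to keep the * form and check the length bounds in Java, reading {1, 35} as {1,35}. This is a sketch under stated assumptions: the class and method names are hypothetical, and it uses java.util.regex.Pattern (whole-string matching via matches()) only so the example runs without jakarta-regexp; the same idea would apply to the * pattern in jakarta-regexp for single-line input.

```java
import java.util.regex.Pattern;

// Hypothetical helper: the {1,35} bound is checked on the length in Java,
// so the regex itself contains only an unbounded character class.
final class BoundedLengthCheck {
    // matcher(s).matches() requires the whole input to match, like ^...$ on one line.
    private static final Pattern ALNUM_OR_SPACE = Pattern.compile("[a-zA-Z0-9 ]*");

    static boolean isValid(String s) {
        // Cheap length test first, then the unbounded character-class match.
        return s.length() >= 1 && s.length() <= 35
                && ALNUM_OR_SPACE.matcher(s).matches();
    }
}
```

For the input from this comment, isValid("This is a sample string (remove it)") is false because of the parentheses, and the character-class match has no optional copies to backtrack through.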
fixed in trunk
Using jakarta-regexp 1.5 and 1.6-dev (rev. 536456) I have the same problem with the following regexp: Pattern: <([^<]*[^<]+)+> Input: <a>remove it</a>
Sorry, wrong input string. The input that goes into an endless loop is: <P align="center"><STRONG><FONT >test test</FONT></STRONG></P><P style=TEXT-ALIGN: left><FONT>
(In reply to comment #16)
> Using jakarta-regexp 1.5 and 1.6-dev (rev. 536456) I have the same problem
It is not the same problem, since your regexp does not use {n,m}.
> with the following regexp:
>
> Pattern: <([^<]*[^<]+)+>
I don't understand what you meant here. Wouldn't it be exactly the same as <([^<]+)+>? And I bet this one will be faster too.
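The equivalence suggested in this reply can be spot-checked on a few inputs. This sketch is only an illustration: it uses java.util.regex (Pattern.matches, whole-string matching) rather than jakarta-regexp, the class name is hypothetical, and the samples are kept short so no engine is pushed into heavy backtracking. It also includes <[^<]+>, since ([^<]+)+ accepts exactly the same strings as [^<]+, just without the nested repetition.

```java
import java.util.regex.Pattern;

// Hypothetical spot check: do the three patterns give the same verdict on an input?
final class PatternEquivalenceCheck {
    static final String[] PATTERNS = {
        "<([^<]*[^<]+)+>",  // pattern from the report
        "<([^<]+)+>",       // simplification suggested in the reply
        "<[^<]+>"           // same strings, no nested repetition
    };

    // True if every pattern accepts, or every pattern rejects, the whole input.
    static boolean sameVerdict(String input) {
        boolean first = Pattern.matches(PATTERNS[0], input);
        for (String p : PATTERNS) {
            if (Pattern.matches(p, input) != first) {
                return false;
            }
        }
        return true;
    }
}
```

Agreement on samples is evidence rather than proof; the general argument is that one iteration of ([^<]*[^<]+) matches any non-empty run of non-'<' characters, so repeating it with + matches the same runs.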
Using jakarta-regexp 1.5, the following regexp (yes, it is not what was intended, but anyway): ^((?:[!_.@?-]+)|(?:\w)){6,60}$ causes an infinite loop in RE.match with this input: 7ftt_.4q?iJoz7I8ky8c5BPwMüTge9D5-kÄtDAHöSLidNMNYbchäcsÖpLPPh Is this a new bug?