Bug 3773 - Problem with parsing greedy match modifiers
Summary: Problem with parsing greedy match modifiers
Status: CLOSED FIXED
Alias: None
Product: Regexp
Classification: Unclassified
Component: Other
Version: unspecified
Hardware: PC Linux
Importance: P3 normal
Target Milestone: ---
Assignee: Jakarta Notifications Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2001-09-21 16:00 UTC by Matthew Kiss
Modified: 2004-11-16 19:05 UTC
CC List: 0 users



Attachments
Suggested fix for the bug (1.52 KB, patch)
2003-10-06 05:40 UTC, Oleg Sukhodolsky

Description Matthew Kiss 2001-09-21 16:00:37 UTC
I am trying to match the string (nothing after the final "6d"):

UUID=3babc217.0007d4e1.74726163.006e616d

with the following expressions:

1) 'UUID=\w{8}\.\w{8}\.\w{8}\.\w{8}'  -> Match succeeds
2) 'UUID=(\w{8}\.){3}\w{8}'  -> Match fails
3) 'UUID=(\w{8}\.){2}\w{8}'  -> Match succeeds

I think there is a parse bug: expressions 1 and 2 should be equivalent. If I have
misinterpreted the expression syntax, sorry for the report. The test trace
output for expressions 1 and 2 is below:

$ ./RETest -i 'UUID=\w{8}\.\w{8}\.\w{8}\.\w{8}'

UUID=\w{8}\.\w{8}\.\w{8}\.\w{8}

0. OP_BRANCH, opdata = 0, next = 119
3. OP_ATOM, opdata = 5, next = 11, "UUID="
11. OP_ESCAPE, opdata = 119, next = 14
14. OP_ESCAPE, opdata = 119, next = 17
17. OP_ESCAPE, opdata = 119, next = 20
20. OP_ESCAPE, opdata = 119, next = 23
23. OP_ESCAPE, opdata = 119, next = 26
26. OP_ESCAPE, opdata = 119, next = 29
29. OP_ESCAPE, opdata = 119, next = 32
32. OP_ESCAPE, opdata = 119, next = 35
35. OP_ATOM, opdata = 1, next = 39, "."
39. OP_ESCAPE, opdata = 119, next = 42
42. OP_ESCAPE, opdata = 119, next = 45
45. OP_ESCAPE, opdata = 119, next = 48
48. OP_ESCAPE, opdata = 119, next = 51
51. OP_ESCAPE, opdata = 119, next = 54
54. OP_ESCAPE, opdata = 119, next = 57
57. OP_ESCAPE, opdata = 119, next = 60
60. OP_ESCAPE, opdata = 119, next = 63
63. OP_ATOM, opdata = 1, next = 67, "."
67. OP_ESCAPE, opdata = 119, next = 70
70. OP_ESCAPE, opdata = 119, next = 73
73. OP_ESCAPE, opdata = 119, next = 76
76. OP_ESCAPE, opdata = 119, next = 79
79. OP_ESCAPE, opdata = 119, next = 82
82. OP_ESCAPE, opdata = 119, next = 85
85. OP_ESCAPE, opdata = 119, next = 88
88. OP_ESCAPE, opdata = 119, next = 91
91. OP_ATOM, opdata = 1, next = 95, "."
95. OP_ESCAPE, opdata = 119, next = 98
98. OP_ESCAPE, opdata = 119, next = 101
101. OP_ESCAPE, opdata = 119, next = 104
104. OP_ESCAPE, opdata = 119, next = 107
107. OP_ESCAPE, opdata = 119, next = 110
110. OP_ESCAPE, opdata = 119, next = 113
113. OP_ESCAPE, opdata = 119, next = 116
116. OP_ESCAPE, opdata = 119, next = 119
119. OP_END, opdata = 0, next = none
> UUID=3babb63b.000402dc.74726163.006e616d
Match successful.
$0 = UUID=3babb63b.000402dc.74726163.006e616d

$ ./RETest -i 'UUID=(\w{8}\.){3}\w{8}'

UUID=(\w{8}\.){3}\w{8}

0. OP_BRANCH, opdata = 0, next = 122
3. OP_ATOM, opdata = 5, next = 11, "UUID="
11. OP_OPEN, opdata = 1, next = 14
14. OP_BRANCH, opdata = 0, next = 45
17. OP_ESCAPE, opdata = 119, next = 20
20. OP_ESCAPE, opdata = 119, next = 23
23. OP_ESCAPE, opdata = 119, next = 26
26. OP_ESCAPE, opdata = 119, next = 29
29. OP_ESCAPE, opdata = 119, next = 32
32. OP_ESCAPE, opdata = 119, next = 35
35. OP_ESCAPE, opdata = 119, next = 38
38. OP_ESCAPE, opdata = 119, next = 41
41. OP_ATOM, opdata = 1, next = 45, "."
45. OP_CLOSE, opdata = 1, next = 48
48. OP_OPEN, opdata = 2, next = 51
51. OP_BRANCH, opdata = 0, next = 79
54. OP_ESCAPE, opdata = 119, next = 57
57. OP_BRANCH, opdata = 0, next = 69
60. OP_ESCAPE, opdata = 119, next = 63
63. OP_BRANCH, opdata = 0, next = 66
66. OP_GOTO, opdata = 0, next = 57
69. OP_BRANCH, opdata = 0, next = 72
72. OP_NOTHING, opdata = 0, next = 75
75. OP_ATOM, opdata = 1, next = 79, "."
79. OP_CLOSE, opdata = 2, next = 82
82. OP_OPEN, opdata = 3, next = 85
85. OP_BRANCH, opdata = 0, next = 95
88. OP_ESCAPE, opdata = 119, next = 91
91. OP_ATOM, opdata = 1, next = 95, "."
95. OP_CLOSE, opdata = 3, next = 98
98. OP_ESCAPE, opdata = 119, next = 101
101. OP_ESCAPE, opdata = 119, next = 104
104. OP_ESCAPE, opdata = 119, next = 107
107. OP_ESCAPE, opdata = 119, next = 110
110. OP_ESCAPE, opdata = 119, next = 113
113. OP_ESCAPE, opdata = 119, next = 116
116. OP_ESCAPE, opdata = 119, next = 119
119. OP_ESCAPE, opdata = 119, next = 122
122. OP_END, opdata = 0, next = none
> UUID=3babb63b.000402dc.74726163.006e616d
Match failed.
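For comparison, the three expressions can be run through java.util.regex, a different engine used here only as a reference point (the class and method names below are illustrative, not part of Jakarta Regexp). Since expression 3 "succeeds" in the report even though it cannot consume the whole string, RETest evidently reports a match found anywhere in the input, so Matcher.find() is the closer analogue; Matcher.matches() is shown as well.

```java
import java.util.regex.Pattern;

public class UuidRegexCheck {
    // Input taken from the RETest traces above.
    static final String INPUT = "UUID=3babb63b.000402dc.74726163.006e616d";

    // find(): is there a match anywhere in the input?
    static boolean found(String regex) {
        return Pattern.compile(regex).matcher(INPUT).find();
    }

    // matches(): does the regex cover the entire input?
    static boolean matchesWhole(String regex) {
        return Pattern.compile(regex).matcher(INPUT).matches();
    }

    public static void main(String[] args) {
        String[] exprs = {
            "UUID=\\w{8}\\.\\w{8}\\.\\w{8}\\.\\w{8}", // 1) written out
            "UUID=(\\w{8}\\.){3}\\w{8}",              // 2) grouped, repeated 3 times
            "UUID=(\\w{8}\\.){2}\\w{8}"               // 3) grouped, repeated 2 times
        };
        for (String e : exprs) {
            System.out.println(e + "  find=" + found(e) + "  matches=" + matchesWhole(e));
        }
    }
}
```

Under java.util.regex, expressions 1 and 2 are both found and both cover the whole input, which supports the reading that they are equivalent; expression 3 is found but stops before the last group of eight characters.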
Comment 1 Oleg Sukhodolsky 2003-10-06 05:38:35 UTC
The minimal regexp to reproduce the problem is (a{2}b){2}.
Here is the RETest output for this regexp:
Z:\src\regexp\jakarta-regexp\build>v.jar org.apache.regexp.RETest -i (a{2}b){2}

(a{2}b){2}

0. OP_BRANCH, opdata = 0, next = 40
3. OP_OPEN, opdata = 1, next = 6
6. OP_BRANCH, opdata = 0, next = 21
9. OP_ATOM, opdata = 1, next = 13, "a"
13. OP_ATOM, opdata = 1, next = 17, "a"
17. OP_ATOM, opdata = 1, next = 21, "b"
21. OP_CLOSE, opdata = 1, next = 24
24. OP_OPEN, opdata = 2, next = 27
27. OP_BRANCH, opdata = 0, next = 37
30. OP_NOTHING, opdata = 0, next = 33
33. OP_ATOM, opdata = 1, next = 37, "b"
37. OP_CLOSE, opdata = 2, next = 40
40. OP_END, opdata = 0, next = none
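Reading the opcode dump above: group 1 compiles to `aab`, but in group 2 the `a{2}` part has been replaced by OP_NOTHING, leaving only `b`. The program therefore behaves like `(aab)(b)` rather than `(aab)(aab)`. A quick whole-string check with java.util.regex, using the hand-translated pattern `(aab)(b)` as a stand-in for the compiled program (this translation is our reading of the trace, not something RETest emits; the class name is illustrative):

```java
import java.util.regex.Pattern;

public class MinimalRepro {
    // What (a{2}b){2} is supposed to mean: "aab" twice.
    static final Pattern INTENDED = Pattern.compile("(a{2}b){2}");
    // Hand translation of the opcode dump: group 1 = "aab", group 2 = "b".
    static final Pattern AS_COMPILED = Pattern.compile("(aab)(b)");

    static boolean intended(String s) { return INTENDED.matcher(s).matches(); }
    static boolean asCompiled(String s) { return AS_COMPILED.matcher(s).matches(); }

    public static void main(String[] args) {
        for (String s : new String[] {"aabaab", "aabb"}) {
            System.out.println(s + ": intended=" + intended(s) + ", as compiled=" + asCompiled(s));
        }
    }
}
```

The intended pattern accepts `aabaab` and rejects `aabb`; the translated program does the opposite.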

The cause of the problem is in the algorithm RECompiler uses to handle the
<regexp>{n,m} construction.
It reduces n, stored in the bracketsMin array, and restarts parsing from the beginning
of the regexp. But it doesn't clear the bracketsXXX arrays for nested constructions.
Thus, the next time it encounters one of the nested brackets, it treats it as {0,m}
and replaces the corresponding atom with OP_NOTHING.
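The mechanism described above can be reproduced with a toy expander. This is a hypothetical model, not RECompiler's code and not the attached patch: the class `BracketExpander`, the `remaining` map (standing in for bracketsMin, keyed by the position of each `{`) and the `resetNested` flag are invented for illustration. It supports only literal characters, `(...)` groups and exact `{n}`, and it re-parses just the repeated atom rather than the whole regexp. With `resetNested` off, nested counters keep their exhausted value between copies, so the second copy of the group loses its `a{2}`, as in the trace; with it on, nested counters are cleared before each re-parse.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of expanding exact {n} quantifiers by re-parsing the atom. */
public class BracketExpander {
    private final String p;
    private final boolean resetNested;  // true = clear nested counters, false = buggy behaviour
    // Remaining copies per quantifier, keyed by the index of its '{' (plays the role of bracketsMin).
    private final Map<Integer, Integer> remaining = new HashMap<>();
    private int end;                    // index just past whatever was parsed last

    BracketExpander(String pattern, boolean resetNested) {
        this.p = pattern;
        this.resetNested = resetNested;
    }

    static String expand(String pattern, boolean resetNested) {
        return new BracketExpander(pattern, resetNested).sequence(0);
    }

    // Expand atoms from i until ')' or end of pattern; leaves end at the stop index.
    private String sequence(int i) {
        StringBuilder out = new StringBuilder();
        while (i < p.length() && p.charAt(i) != ')') {
            out.append(quantified(i));
            i = end;
        }
        end = i;
        return out.toString();
    }

    // Expand one atom plus an optional {n}; leaves end just past the quantifier.
    private String quantified(int start) {
        String first = atom(start);
        int atomEnd = end;
        if (atomEnd >= p.length() || p.charAt(atomEnd) != '{') {
            return first;
        }
        int close = p.indexOf('}', atomEnd);
        int n = Integer.parseInt(p.substring(atomEnd + 1, close));
        // First visit initialises the counter to n; later visits reuse the stored value.
        int left = remaining.getOrDefault(atomEnd, n);
        StringBuilder out = new StringBuilder();
        if (left > 0) {                  // left == 0 yields "" (the OP_NOTHING case)
            out.append(first);
            remaining.put(atomEnd, --left);
        }
        while (left > 0) {
            if (resetNested) {
                // Forget counters of quantifiers nested inside this atom.
                remaining.keySet().removeIf(k -> k > start && k < atomEnd);
            }
            out.append(atom(start));     // re-parse the atom for the next copy
            remaining.put(atomEnd, --left);
        }
        end = close + 1;
        return out.toString();
    }

    // Expand a single character or a parenthesised group; leaves end just past it.
    private String atom(int i) {
        if (p.charAt(i) == '(') {
            String inner = sequence(i + 1);
            end = end + 1;               // skip the closing ')'
            return inner;
        }
        end = i + 1;
        return String.valueOf(p.charAt(i));
    }

    public static void main(String[] args) {
        System.out.println(expand("(a{2}b){2}", false)); // model of the bug
        System.out.println(expand("(a{2}b){2}", true));  // nested counters cleared
    }
}
```

In this model the buggy mode expands `(a{2}b){2}` to `aabb`, the same shape as the compiled program in the trace, while clearing nested counters gives `aabaab`.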
Comment 2 Oleg Sukhodolsky 2003-10-06 05:40:09 UTC
Created attachment 8460 [details]
Suggested fix for the bug
Comment 3 Vadim Gritsenko 2003-12-20 17:52:50 UTC
Patch applied, thanks.