I am trying to match the string (nothing after the final "6d"):

UUID=3babc217.0007d4e1.74726163.006e616d

with the following expressions:

1) 'UUID=\w{8}\.\w{8}\.\w{8}\.\w{8}' -> Match succeeds
2) 'UUID=(\w{8}\.){3}\w{8}' -> Match fails
3) 'UUID=(\w{8}\.){2}\w{8}' -> Match succeeds

I think there is a parse bug, these expressions 1 & 2 seem identical. If I have misinterpreted the expression syntax, sorry for the report. The test trace output for expressions 1 & 2 is below:

$ ./RETest -i 'UUID=\w{8}\.\w{8}\.\w{8}\.\w{8}'
UUID=\w{8}\.\w{8}\.\w{8}\.\w{8}
0. OP_BRANCH, opdata = 0, next = 119
3. OP_ATOM, opdata = 5, next = 11, "UUID="
11. OP_ESCAPE, opdata = 119, next = 14
14. OP_ESCAPE, opdata = 119, next = 17
17. OP_ESCAPE, opdata = 119, next = 20
20. OP_ESCAPE, opdata = 119, next = 23
23. OP_ESCAPE, opdata = 119, next = 26
26. OP_ESCAPE, opdata = 119, next = 29
29. OP_ESCAPE, opdata = 119, next = 32
32. OP_ESCAPE, opdata = 119, next = 35
35. OP_ATOM, opdata = 1, next = 39, "."
39. OP_ESCAPE, opdata = 119, next = 42
42. OP_ESCAPE, opdata = 119, next = 45
45. OP_ESCAPE, opdata = 119, next = 48
48. OP_ESCAPE, opdata = 119, next = 51
51. OP_ESCAPE, opdata = 119, next = 54
54. OP_ESCAPE, opdata = 119, next = 57
57. OP_ESCAPE, opdata = 119, next = 60
60. OP_ESCAPE, opdata = 119, next = 63
63. OP_ATOM, opdata = 1, next = 67, "."
67. OP_ESCAPE, opdata = 119, next = 70
70. OP_ESCAPE, opdata = 119, next = 73
73. OP_ESCAPE, opdata = 119, next = 76
76. OP_ESCAPE, opdata = 119, next = 79
79. OP_ESCAPE, opdata = 119, next = 82
82. OP_ESCAPE, opdata = 119, next = 85
85. OP_ESCAPE, opdata = 119, next = 88
88. OP_ESCAPE, opdata = 119, next = 91
91. OP_ATOM, opdata = 1, next = 95, "."
95. OP_ESCAPE, opdata = 119, next = 98
98. OP_ESCAPE, opdata = 119, next = 101
101. OP_ESCAPE, opdata = 119, next = 104
104. OP_ESCAPE, opdata = 119, next = 107
107. OP_ESCAPE, opdata = 119, next = 110
110. OP_ESCAPE, opdata = 119, next = 113
113. OP_ESCAPE, opdata = 119, next = 116
116. OP_ESCAPE, opdata = 119, next = 119
119. OP_END, opdata = 0, next = none

> UUID=3babb63b.000402dc.74726163.006e616d
Match successful.
$0 = UUID=3babb63b.000402dc.74726163.006e616d

$ ./RETest -i 'UUID=(\w{8}\.){3}\w{8}'
UUID=(\w{8}\.){3}\w{8}
0. OP_BRANCH, opdata = 0, next = 122
3. OP_ATOM, opdata = 5, next = 11, "UUID="
11. OP_OPEN, opdata = 1, next = 14
14. OP_BRANCH, opdata = 0, next = 45
17. OP_ESCAPE, opdata = 119, next = 20
20. OP_ESCAPE, opdata = 119, next = 23
23. OP_ESCAPE, opdata = 119, next = 26
26. OP_ESCAPE, opdata = 119, next = 29
29. OP_ESCAPE, opdata = 119, next = 32
32. OP_ESCAPE, opdata = 119, next = 35
35. OP_ESCAPE, opdata = 119, next = 38
38. OP_ESCAPE, opdata = 119, next = 41
41. OP_ATOM, opdata = 1, next = 45, "."
45. OP_CLOSE, opdata = 1, next = 48
48. OP_OPEN, opdata = 2, next = 51
51. OP_BRANCH, opdata = 0, next = 79
54. OP_ESCAPE, opdata = 119, next = 57
57. OP_BRANCH, opdata = 0, next = 69
60. OP_ESCAPE, opdata = 119, next = 63
63. OP_BRANCH, opdata = 0, next = 66
66. OP_GOTO, opdata = 0, next = 57
69. OP_BRANCH, opdata = 0, next = 72
72. OP_NOTHING, opdata = 0, next = 75
75. OP_ATOM, opdata = 1, next = 79, "."
79. OP_CLOSE, opdata = 2, next = 82
82. OP_OPEN, opdata = 3, next = 85
85. OP_BRANCH, opdata = 0, next = 95
88. OP_ESCAPE, opdata = 119, next = 91
91. OP_ATOM, opdata = 1, next = 95, "."
95. OP_CLOSE, opdata = 3, next = 98
98. OP_ESCAPE, opdata = 119, next = 101
101. OP_ESCAPE, opdata = 119, next = 104
104. OP_ESCAPE, opdata = 119, next = 107
107. OP_ESCAPE, opdata = 119, next = 110
110. OP_ESCAPE, opdata = 119, next = 113
113. OP_ESCAPE, opdata = 119, next = 116
116. OP_ESCAPE, opdata = 119, next = 119
119. OP_ESCAPE, opdata = 119, next = 122
122. OP_END, opdata = 0, next = none

> UUID=3babb63b.000402dc.74726163.006e616d
Match failed.
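As a cross-check of the reporter's reading of the syntax, the three expressions can be run through a different engine. The sketch below uses java.util.regex from the JDK, not Jakarta Regexp, and assumes the two agree on the meaning of \w, {n} and grouping, and that RETest searches for a match anywhere in the input (hence find()). The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jakarta Regexp.
class UuidPatternCheck {
    // True if the regex matches somewhere in the input (search semantics).
    static boolean finds(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "UUID=3babc217.0007d4e1.74726163.006e616d";
        String[] expressions = {
            "UUID=\\w{8}\\.\\w{8}\\.\\w{8}\\.\\w{8}", // expression 1
            "UUID=(\\w{8}\\.){3}\\w{8}",              // expression 2
            "UUID=(\\w{8}\\.){2}\\w{8}"               // expression 3
        };
        for (String expression : expressions) {
            System.out.println(expression + " -> " + finds(expression, input));
        }
    }
}
```

Under java.util.regex all three expressions find a match, which supports the view that expressions 1 and 2 should behave identically.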
The minimal regexp to reproduce the problem is (a{2}b){2}. Here is the RETest output for this regexp:

Z:\src\regexp\jakarta-regexp\build>v.jar org.apache.regexp.RETest -i (a{2}b){2}
(a{2}b){2}
0. OP_BRANCH, opdata = 0, next = 40
3. OP_OPEN, opdata = 1, next = 6
6. OP_BRANCH, opdata = 0, next = 21
9. OP_ATOM, opdata = 1, next = 13, "a"
13. OP_ATOM, opdata = 1, next = 17, "a"
17. OP_ATOM, opdata = 1, next = 21, "b"
21. OP_CLOSE, opdata = 1, next = 24
24. OP_OPEN, opdata = 2, next = 27
27. OP_BRANCH, opdata = 0, next = 37
30. OP_NOTHING, opdata = 0, next = 33
33. OP_ATOM, opdata = 1, next = 37, "b"
37. OP_CLOSE, opdata = 2, next = 40
40. OP_END, opdata = 0, next = none

The cause of the problem is in the algorithm that RECompiler uses to handle the <regexp>{n,m} construction. It decrements the n stored in the bracketsMin array and restarts parsing from the beginning of the <regexp>. But it does not clear the bracketsXXX entries for nested constructions. Thus, the next time it encounters one of the nested brackets, it treats it as {0,m} and replaces the corresponding atom with OP_NOTHING.
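The mechanism described above can be illustrated with a toy model. This is not the RECompiler source and not the attached patch: it is a minimal sketch that expands literals, groups and {n} into a flat string, keeps a per-brace countdown that survives re-parsing, and optionally clears the countdowns of nested braces before each re-parse. All names (BraceExpansionSketch, remaining, clearNested) are invented for the illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only; supports literals, ( ) groups and {n}.
class BraceExpansionSketch {
    // Copies still to emit per '{' position; persists across re-parses.
    private final Map<Integer, Integer> remaining = new HashMap<>();
    private final String pattern;
    private final boolean clearNested;
    private int idx = 0;

    private BraceExpansionSketch(String pattern, boolean clearNested) {
        this.pattern = pattern;
        this.clearNested = clearNested;
    }

    static String expand(String pattern, boolean clearNested) {
        BraceExpansionSketch c = new BraceExpansionSketch(pattern, clearNested);
        StringBuilder out = new StringBuilder();
        while (c.idx < pattern.length()) {
            c.closure(out);
        }
        return out.toString();
    }

    // Parses one terminal followed by an optional {n}.
    private void closure(StringBuilder out) {
        final int start = idx;
        while (true) {
            int mark = out.length();
            terminal(out);
            if (idx >= pattern.length() || pattern.charAt(idx) != '{') {
                return;
            }
            final int brace = idx;
            int close = pattern.indexOf('}', brace);
            int n = Integer.parseInt(pattern.substring(brace + 1, close));
            int left = remaining.getOrDefault(brace, n);
            if (left == 0) {
                // Countdown already exhausted: drop this copy (the OP_NOTHING case).
                out.setLength(mark);
                idx = close + 1;
                return;
            }
            left--;
            remaining.put(brace, left);
            if (left == 0) {
                idx = close + 1;
                return;
            }
            if (clearNested) {
                // Forget countdowns of braces inside the terminal being repeated.
                remaining.keySet().removeIf(k -> k > start && k < brace);
            }
            idx = start; // re-parse the same terminal for the next copy
        }
    }

    // Parses a single literal or a parenthesised group.
    private void terminal(StringBuilder out) {
        char c = pattern.charAt(idx++);
        if (c != '(') {
            out.append(c);
            return;
        }
        while (pattern.charAt(idx) != ')') {
            closure(out);
        }
        idx++; // skip ')'
    }

    public static void main(String[] args) {
        System.out.println(expand("(a{2}b){2}", false)); // prints aabb
        System.out.println(expand("(a{2}b){2}", true));  // prints aabaab
    }
}
```

With the stale countdown left in place, the second copy of the group loses its a{2} and only "b" survives, mirroring the OP_NOTHING at offset 30 in the trace. Clearing the nested countdown before re-parsing restores the expected expansion.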
Created attachment 8460
Suggested fix for the bug
Patch applied, thanks.