[GROOVY-3028] Make Regular Expression handling consistent - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: 1.6-beta-1
Fix Version/s: 1.6-beta-2
Component/s: None
Labels: None

Description

There are four relevant Matcher methods in DGM: array access with a single int, array access with a Collection, each() and iterator(). They are inconsistent with one another.

Array access acts as a 2D array, where the first index is the number of calls to Matcher.find(), and the second index is the group within the match. For example:

Text: "bannana"
Regex: /a(n+)/

m[0] represents the first place the regex matches in the string, i.e. the three characters right after the "b". m[0][0] is the whole matched substring ("ann") and m[0][1] is the first group ("nn"). Similarly, m[1][0] is "an" and m[1][1] is "n".
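The indexing described above can be reproduced with plain java.util.regex, which is what the Groovy Matcher methods wrap. This is only a minimal sketch in Java, not the DGM implementation; the class name MatchGrid and the helper findAll are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchGrid {
    // Hypothetical helper: one row per successful find(), one column per
    // group, where column 0 is the whole match.
    static List<List<String>> findAll(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        List<List<String>> rows = new ArrayList<>();
        while (m.find()) {
            List<String> row = new ArrayList<>();
            for (int g = 0; g <= m.groupCount(); g++) {
                row.add(m.group(g));
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<List<String>> m = findAll("a(n+)", "bannana");
        System.out.println(m.get(0)); // [ann, nn]
        System.out.println(m.get(1)); // [an, n]
    }
}
```

The first index selects the find() call and the second selects the group, matching the m[i][j] notation used in the description.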

That's different from what's described in GINA or the JavaDoc, but I think it's better. The 2D aspect is what Ruby does, at least with String.scan().

If the regex doesn't have groups, then it acts like a 1D array of Strings.

Matcher.getAt(Collection indices) returns a single string that is the concatenation of all the relevant matches. It seems more consistent and more useful to return a List.
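The difference between the two return shapes can be sketched as follows. This is an illustration in Java under stated assumptions, not the DGM code: the class IndexedMatches and the helpers allMatches, concatAt and listAt are hypothetical, and the regex an+ (no groups) is used so that each match is a plain string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IndexedMatches {
    // Collect the whole-match string of every successful find().
    static List<String> allMatches(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // Behaviour described as current: selected matches glued into one string.
    static String concatAt(List<String> matches, List<Integer> indices) {
        StringBuilder sb = new StringBuilder();
        for (int i : indices) {
            sb.append(matches.get(i));
        }
        return sb.toString();
    }

    // Behaviour proposed in the issue: selected matches kept as a List.
    static List<String> listAt(List<String> matches, List<Integer> indices) {
        List<String> out = new ArrayList<>();
        for (int i : indices) {
            out.add(matches.get(i));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> matches = allMatches("an+", "bannana");
        List<Integer> indices = Arrays.asList(0, 1);
        System.out.println(concatAt(matches, indices)); // annan
        System.out.println(listAt(matches, indices));   // [ann, an]
    }
}
```

With the concatenated form, the boundary between the two matches is lost; the List form keeps it.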

Matcher.each() behaves like array access. It calls the closure once for each successful find(), passing the groups as separate arguments, or a single array of all the groups if the closure takes one argument. It passes an array even if the regex has no groups; passing just a string would be more useful in that case.

Matcher.iterator() yields one element for each successful find(), but that element is only the string of the whole match. There's no way to get at the groups.

The semantics of the non-Collection array access seem most natural to me: one element for each find(), and each element is an array of strings (if the regex has groups) or just a string (if it has none).

So it seems the others should be changed to those semantics, and each() and getAt() could even call iterator(Matcher) underneath to ensure consistency.
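The unified semantics proposed here can be sketched as a single iterator that other methods could delegate to. This is a minimal Java sketch under assumptions, not the change that was committed: the class UniformMatchIterator and its factory method of are hypothetical, and Object is used as the element type only because an element is either a String or a List of Strings.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UniformMatchIterator implements Iterator<Object> {
    private final Matcher matcher;
    private boolean pending;   // a successful find() is waiting to be consumed
    private boolean exhausted; // find() has already failed once

    UniformMatchIterator(Matcher matcher) {
        this.matcher = matcher;
    }

    // Hypothetical convenience factory.
    static UniformMatchIterator of(String regex, String text) {
        return new UniformMatchIterator(Pattern.compile(regex).matcher(text));
    }

    @Override
    public boolean hasNext() {
        if (!pending && !exhausted) {
            pending = matcher.find();
            exhausted = !pending;
        }
        return pending;
    }

    @Override
    public Object next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        pending = false;
        // No groups in the regex: yield just the matched string.
        if (matcher.groupCount() == 0) {
            return matcher.group();
        }
        // Otherwise yield the whole match followed by each group.
        List<String> groups = new ArrayList<>();
        for (int g = 0; g <= matcher.groupCount(); g++) {
            groups.add(matcher.group(g));
        }
        return groups;
    }
}
```

Under this sketch, iterating a(n+) over "bannana" yields the lists [ann, nn] and [an, n], while iterating an+ over the same text yields the strings "ann" and "an". Indexed access and each() could then be expressed in terms of this one traversal.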

Discussed on dev@groovy.codehaus.org starting Sept. 2, 2008.

People

Assignee: Martin C. Martin

Reporter: Martin C. Martin

Votes: 0

Watchers: 0

Dates

Created: 04/Sep/08 19:37

Updated: 12/Oct/08 08:49

Resolved: 04/Sep/08 20:03