Details
Type: Bug
Status: Closed
Priority: Minor
Resolution: Fixed
Version: 1.6-beta-1
Description
There are four relevant Matcher methods in DGM (DefaultGroovyMethods): array access with a single int, array access with a Collection, each() and iterator(). They behave inconsistently with one another.
Array access acts as a 2D array, where the first index selects the match (the n-th successful call to Matcher.find(), counting from zero), and the second index is the group within that match. For example:
Text: "bannana"
Regex: /a(n+)/
m[0] represents the first place the regex matches in the string, i.e. the three characters right after the "b". m[0][0] is the whole string that matches ("ann") and m[0][1] is the first group ("nn"). Similarly, m[1][0] is "an" and m[1][1] is "n".
That's different from what's described in GINA (Groovy in Action) or the JavaDoc, but I think it's better. The 2D aspect is what Ruby does, at least with String.scan().
If the regex doesn't have groups, then it acts like a 1D array of Strings.
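To make the indexing concrete, here is a minimal sketch in plain Java on top of java.util.regex. It is not the DGM code; the class name MatchTable and the helper matches() are hypothetical and exist only to reproduce the "bannana" example above, including the 1D case for a regex without groups.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchTable {
    // Hypothetical helper: one entry per successful find().
    // With groups, the entry is a List of [whole match, group 1, ...];
    // without groups, the entry is just the matched String.
    static List<Object> matches(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        List<Object> rows = new ArrayList<>();
        while (m.find()) {
            if (m.groupCount() == 0) {
                rows.add(m.group());
            } else {
                List<String> row = new ArrayList<>();
                for (int g = 0; g <= m.groupCount(); g++) {
                    row.add(m.group(g));
                }
                rows.add(row);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(matches("a(n+)", "bannana")); // [[ann, nn], [an, n]]
        System.out.println(matches("an*", "bannana"));   // [ann, an, a]
    }
}
```

The first index of the result corresponds to m[i] and the second to m[i][j] in the description above.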
Matcher.getAt(Collection indices) returns a single String that is the concatenation of all the selected matches. It seems more consistent and useful to return a List.
Matcher.each() behaves like the single-int array access. It calls the closure once for each successful find(), passing the groups as arguments, or an array of all the groups if the closure has a single argument. It passes an array even if the regex doesn't have any groups; it seems more useful to just pass a string in that case.
Matcher.iterator() yields one element for each successful find(), but each element is just the string of the whole match. There's no way to get any groups.
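The difference between the two return types can be sketched in plain Java. This is an illustration under assumptions, not the DGM implementation: CollectionIndexSketch, wholeMatches(), concatAt() and listAt() are hypothetical names, and the regex /an*/ is chosen here only because it has no groups.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CollectionIndexSketch {
    // Whole-match text of every successful find(), in order.
    static List<String> wholeMatches(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // Behaviour as described in the report: selected matches glued into one String.
    static String concatAt(List<String> found, List<Integer> indices) {
        StringBuilder sb = new StringBuilder();
        for (int i : indices) {
            sb.append(found.get(i));
        }
        return sb.toString();
    }

    // Proposed behaviour: selected matches stay separate list elements.
    static List<String> listAt(List<String> found, List<Integer> indices) {
        List<String> picked = new ArrayList<>();
        for (int i : indices) {
            picked.add(found.get(i));
        }
        return picked;
    }

    public static void main(String[] args) {
        List<String> found = wholeMatches("an*", "bannana"); // [ann, an, a]
        List<Integer> indices = Arrays.asList(0, 2);
        System.out.println(concatAt(found, indices)); // anna
        System.out.println(listAt(found, indices));   // [ann, a]
    }
}
```

With concatenation the boundaries between matches are lost ("anna" could have come from several combinations), whereas the List keeps each match addressable.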
The semantics of the non-Collection array access seem most natural to me: one element for each find(), and each element is an array of strings (if the regex has groups) or just a string (if it has no groups).
So, it seems like the others should be changed to those semantics, and each() and getAt() could even call iterator(Matcher) underneath, to ensure consistency.
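One way the shared-iterator idea might look is sketched below in plain Java. This is only an illustration of the proposal, not the fix that was committed: MatchIterator and its static each() and getAt() helpers are hypothetical, and a Consumer stands in for a Groovy closure.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchIterator implements Iterator<Object> {
    private final Matcher matcher;
    private boolean advanced; // true once find() has run for the pending element
    private boolean found;    // result of that find()

    MatchIterator(Matcher matcher) {
        this.matcher = matcher;
    }

    @Override
    public boolean hasNext() {
        if (!advanced) {
            found = matcher.find();
            advanced = true;
        }
        return found;
    }

    // Yields a List of groups if the regex has groups, else the matched String.
    @Override
    public Object next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        advanced = false;
        if (matcher.groupCount() == 0) {
            return matcher.group();
        }
        List<String> groups = new ArrayList<>();
        for (int g = 0; g <= matcher.groupCount(); g++) {
            groups.add(matcher.group(g));
        }
        return groups;
    }

    // Hypothetical each(): visits every match through the iterator.
    static void each(Matcher m, Consumer<Object> action) {
        m.reset();
        new MatchIterator(m).forEachRemaining(action);
    }

    // Hypothetical getAt(int): walks the same iterator up to the requested match.
    static Object getAt(Matcher m, int index) {
        m.reset();
        Iterator<Object> it = new MatchIterator(m);
        Object current = null;
        for (int i = 0; i <= index; i++) {
            current = it.next(); // NoSuchElementException if index is out of range
        }
        return current;
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("a(n+)").matcher("bannana");
        each(m, System.out::println);    // [ann, nn] then [an, n]
        System.out.println(getAt(m, 1)); // [an, n]
    }
}
```

Because both helpers delegate to the same next(), the element shape (List versus String) cannot drift apart between them, which is the consistency argument made above.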
Discussed on dev@groovy.codehaus.org starting Sept. 2, 2008.