Details
Improvement
Status: Closed
Major
Resolution: Fixed
Description
From the mailing list (http://www.nabble.com/A-revised-proposal-for-REGEX-in-Groovy-to16216991.html):
Hi all,
Currently, we have to escape the slash '/' in a regex, for example /<b>abc<\/b>/, so the code is not very concise.
We also cannot write a regex across multiple lines. The following code was written by Paul:
    str = 'groovy.codehaus.org and www.aboutgroovy.com'
    re = '''(?x)          # to enable whitespace and comments
          (               # capture the hostname in $1
            (?:           # these parens for grouping only
              (?! [-_] )  # lookahead for neither underscore nor dash
              [\\w-] +    # hostname component
              \\.         # and the domain dot
            ) +           # now repeat that whole thing a bunch of times
            [A-Za-z]      # next must be a letter
            [\\w-] +      # now trailing domain part
          )               # end of $1 capture
         '''
    finder = str =~ re
    out = str
    (0..<finder.count).each{
        adr = finder[it][0]
        out = out.replaceAll(adr, "$adr[${InetAddress.getByName(adr).hostAddress}]")
    }
    println out
    // => groovy.codehaus.org [63.246.7.187] and www.aboutgroovy.com [63.246.7.76]
If we could use some syntax like:
    |||<b>abc</b>|||,

    |||
    (?x)              # to enable whitespace and comments
      (               # capture the hostname in $1
        (?:           # these parens for grouping only
          (?! [-_] )  # lookahead for neither underscore nor dash
          [\w-] +     # hostname component
          \.          # and the domain dot
        ) +           # now repeat that whole thing a bunch of times
        [A-Za-z]      # next must be a letter
        [\w-] +       # now trailing domain part
      )               # end of $1 capture
    |||
these problems would be resolved and the code would be much more graceful and concise.
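The escaping trade-off described above can be checked with a minimal sketch that uses only existing Groovy string syntax. The variable names are illustrative and not part of the proposal.

```groovy
// Slashy string: backslashes need no doubling, but '/' itself must be escaped.
def slashy = /<b>abc<\/b>/

// Single-quoted string: '/' needs no escape.
def quoted = '<b>abc</b>'

// Both literals produce the same pattern text.
assert slashy == quoted
assert '<b>abc</b>' ==~ slashy

// The trade-off reverses for backslashes: quoted strings must double them.
def numberSlashy = /\d+\.\d+/
def numberQuoted = '\\d+\\.\\d+'
assert numberSlashy == numberQuoted
assert '3.14' ==~ numberSlashy
```

Neither existing form avoids both kinds of escaping at once, which is the gap the proposed delimiter is meant to close.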
I raised a similar proposal some months ago;
unfortunately, the triple slash is already used in comments (/////// some comment):
==============================================================
Hi all,
I have a proposal for regex: add a triple-slash delimiter.
For example,
    // now
    def s = /<\/script>/
    // proposal
    def s = ///</script>///
It is inspired by the single quotation mark and triple quotation marks.
Best regards,
Daniel.Sun
-----------------------------------------------------------------------------
This one has been on my TODO list for a while. I'll add a Jira issue.
Not only does it allow you to enter slashes in a nice way as per
your example, but it also allows you to write multi-line regexes and store
scripts containing normal regex slashes as Strings.
So, the re variable in this example from PLEAC:
    str = 'groovy.codehaus.org and www.aboutgroovy.com'
    re = '''(?x)          # to enable whitespace and comments
          (               # capture the hostname in $1
            (?:           # these parens for grouping only
              (?! [-_] )  # lookahead for neither underscore nor dash
              [\\w-] +    # hostname component
              \\.         # and the domain dot
            ) +           # now repeat that whole thing a bunch of times
            [A-Za-z]      # next must be a letter
            [\\w-] +      # now trailing domain part
          )               # end of $1 capture
         '''
    finder = str =~ re
    out = str
    (0..<finder.count).each{
        adr = finder[it][0]
        out = out.replaceAll(adr, "$adr[${InetAddress.getByName(adr).hostAddress}]")
    }
    println out
    // => groovy.codehaus.org [63.246.7.187] and www.aboutgroovy.com [63.246.7.76]
Could be written (note no doubling of the backslashes):
    re = ///(?x)          # to enable whitespace and comments
          (               # capture the hostname in $1
            (?:           # these parens for grouping only
              (?! [-_] )  # lookahead for neither underscore nor dash
              [\w-] +     # hostname component
              \.          # and the domain dot
            ) +           # now repeat that whole thing a bunch of times
            [A-Za-z]      # next must be a letter
            [\w-] +       # now trailing domain part
          )               # end of $1 capture
         ///
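For comparison, here is a self-contained sketch of the triple-quoted form that runs without network access. The `fakeAddresses` map is a hypothetical stand-in for `InetAddress.getByName`, and its addresses are placeholders, not real lookups.

```groovy
def text = 'groovy.codehaus.org and www.aboutgroovy.com'

// Triple-single-quoted string: multi-line works, but every regex
// backslash has to be doubled.
def hostPattern = '''(?x)   # to enable whitespace and comments
  (                         # capture the hostname in $1
    (?:                     # these parens for grouping only
      (?! [-_] )            # lookahead for neither underscore nor dash
      [\\w-] +              # hostname component
      \\.                   # and the domain dot
    ) +                     # now repeat that whole thing a bunch of times
    [A-Za-z]                # next must be a letter
    [\\w-] +                # now trailing domain part
  )                         # end of $1 capture
'''

// Collect the full match (index 0) of every occurrence.
def finder = text =~ hostPattern
def hosts = (0..<finder.count).collect { finder[it][0] }
assert hosts == ['groovy.codehaus.org', 'www.aboutgroovy.com']

// Hypothetical lookup table used instead of a real DNS query (assumption).
def fakeAddresses = ['groovy.codehaus.org': '192.0.2.1',
                     'www.aboutgroovy.com': '192.0.2.2']
def out = text
hosts.each { host -> out = out.replace(host, "$host [${fakeAddresses[host]}]") }
println out
```

The pattern text seen by the regex engine is identical in both forms. Only the source-level escaping differs, which is why the proposed delimiter would not need the doubled backslashes.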
And you can have strings like:
    scriptToMakeWordsTitleCased = ///
    src = 'make all words title-cased'
    dst = src
    ('a'..'z').each{ dst = dst.replaceAll(/([^a-zA-Z])/+it+/|\A/+it,/$1/+it.toUpperCase()) }
    assert dst == 'Make All Words Title-Cased'
    ///
Otherwise, with ''' or """, the \ in \A would need to be doubled, and then it wouldn't
be evaluable as a script.
Unfortunately, I still haven't found the time to work out the
right way to convince antlr to work with these. The trick is in
making sure antlr isn't confused with // comments. So /// when
it occurs where a String expression is not allowed just remains
a comment. I think we need this for backward-compatibility (B/C) reasons; some people have
comments such as:
    ////////////////////////////////////////
    //
    // My Comment
    //
    ////////////////////////////////////////
Which, although noisy, should still be valid.
Paul.