It looks like the expression ".*(/.+?)/.*?\1/.*?\1/" in regex-urlfilter.txt isn't compatible with java.util.regex, which is what the regex URL filter actually uses.
Maybe it was overlooked when the regular expression package was changed.
The symptom was that, while reducing the fetch map output, the reducer hung forever: the output format applies the URL filter, and one URL made the match against this expression hang. The stuck task's stack trace:
060315 230823 task_r_3n4zga at java.lang.Character.codePointAt(Character.java:2335)
060315 230823 task_r_3n4zga at java.util.regex.Pattern$Dot.match(Pattern.java:4092)
060315 230823 task_r_3n4zga at java.util.regex.Pattern$Curly.match1(Pattern.java:
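To show where the hang happens, here is a minimal sketch of how a regex URL filter of this kind applies its rules. It assumes the regex-urlfilter.txt convention of one pattern per line prefixed with "+" (accept) or "-" (reject), that the first matching rule decides, that a URL matching no rule is rejected, and that matching uses Matcher.find(). The class SimpleRegexUrlFilter is hypothetical and is not Nutch's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical, simplified regex URL filter; not Nutch's actual class.
public class SimpleRegexUrlFilter {
    // One parsed rule: '+' accepts, '-' rejects.
    private static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    public SimpleRegexUrlFilter(List<String> lines) {
        for (String line : lines) {
            String rule = line.trim();
            // Skip blank lines and comments.
            if (rule.isEmpty() || rule.startsWith("#")) {
                continue;
            }
            char sign = rule.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(rule.substring(1))));
        }
    }

    // Returns the URL if accepted, or null if rejected or if no rule matches
    // (the no-match behaviour is an assumption of this sketch).
    public String filter(String url) {
        for (Rule r : rules) {
            // This find() call is where a pathological pattern can backtrack
            // for a very long time on an unlucky URL.
            if (r.pattern.matcher(url).find()) {
                return r.accept ? url : null;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        SimpleRegexUrlFilter filter = new SimpleRegexUrlFilter(List.of(
                "# reject URLs like /foo/bar/foo/bar/foo/",
                "-.*(/[^/]+)/[^/]+\\1/[^/]+\\1/",
                "+."));
        System.out.println(filter.filter("http://example.com/index.html"));
        System.out.println(filter.filter("http://example.com/foo/bar/foo/bar/foo/"));
    }
}
```

Running main prints the first URL unchanged and "null" for the second, which the reject rule catches before the catch-all "+." rule is reached.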
I changed the regular expression to ".*(/[^/]+)/[^/]+\1/[^/]+\1/" and now the fetch job completes. (Thanks to Grant and Chris B. for helping to find the new regex.)
However, people may want to review it and suggest improvements, because the two expressions are not equivalent. The old regex matches "abcd/foo/bar/foo/bar/foo/", and so does the new one. But the old regex also matches "abcd/foo/bar/xyz/foo/bar/foo/", which the new regex does not: [^/]+ allows exactly one path segment between the repeated segments, whereas .*? allows any number of them.
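A small sketch to check the difference described above by running both expressions through java.util.regex on the two example strings. It assumes the filter uses Matcher.find(); the class and method names are illustrative only. The comments on the old pattern describe a likely cause of the slowness, not a measured one.

```java
import java.util.regex.Pattern;

public class LoopRegexComparison {
    // Old rule: .+? and .*? can span any number of path segments, and the
    // nested lazy quantifiers combined with back-references can backtrack
    // heavily on long URLs that do not match.
    static final Pattern OLD = Pattern.compile(".*(/.+?)/.*?\\1/.*?\\1/");

    // New rule: [^/]+ confines each gap to exactly one path segment.
    static final Pattern NEW = Pattern.compile(".*(/[^/]+)/[^/]+\\1/[^/]+\\1/");

    static boolean matches(Pattern p, String s) {
        return p.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "abcd/foo/bar/foo/bar/foo/",
            "abcd/foo/bar/xyz/foo/bar/foo/"
        };
        for (String s : inputs) {
            System.out.printf("%s old=%b new=%b%n", s, matches(OLD, s), matches(NEW, s));
        }
    }
}
```

Expected output:

```
abcd/foo/bar/foo/bar/foo/ old=true new=true
abcd/foo/bar/xyz/foo/bar/foo/ old=true new=false
```

In the second string "/foo" does repeat three times, but "bar/xyz" sits between the first two occurrences, which is two segments, so the new pattern rejects that alignment.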