Yes, ORO is superior in terms of raw speed: on average it is ~17% faster. This was measured with a CrawlDB of roughly 2.2 million URLs, and the generator was not limited with -topN.
Java regex averages 310 seconds of run time, whereas ORO averages 263 seconds. This was on a dedicated machine without Hadoop.
More interesting, in my opinion, is the reduced memory consumption of util.regex. ORO uses roughly 2.6 times as much heap space: the same generate cycles show about 12.4% for ORO, whereas util.regex never went higher than 4.8%.
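For anyone who wants to check the arithmetic behind these figures, here is a minimal sketch. It only uses the averages quoted above (310 s, 263 s, 12.4%, 4.8%); the class and method names are made up for illustration and are not part of Nutch. Depending on which run you take as the baseline, the speed difference comes out at roughly 15% (ORO takes less time than util.regex) or roughly 18% (util.regex takes more time than ORO), and the heap ratio is roughly 2.6.

```java
// Hypothetical helper, not part of Nutch: recomputes the comparison
// from the averages quoted in this message.
final class RegexComparison {

    // Percentage by which the faster run undercuts the slower run's time.
    static double percentLessTime(double slowerSeconds, double fasterSeconds) {
        return (slowerSeconds - fasterSeconds) / slowerSeconds * 100.0;
    }

    // Percentage by which the slower run exceeds the faster run's time.
    static double percentMoreTime(double slowerSeconds, double fasterSeconds) {
        return (slowerSeconds - fasterSeconds) / fasterSeconds * 100.0;
    }

    // Plain ratio, used here for the heap usage figures.
    static double ratio(double larger, double smaller) {
        return larger / smaller;
    }

    public static void main(String[] args) {
        double javaRegexSeconds = 310.0;   // average generate run time, util.regex
        double oroSeconds = 263.0;         // average generate run time, ORO
        double oroHeapPercent = 12.4;      // heap usage observed with ORO
        double javaRegexHeapPercent = 4.8; // peak heap usage observed with util.regex

        System.out.printf("ORO takes %.1f%% less time than util.regex%n",
                percentLessTime(javaRegexSeconds, oroSeconds));
        System.out.printf("util.regex takes %.1f%% more time than ORO%n",
                percentMoreTime(javaRegexSeconds, oroSeconds));
        System.out.printf("ORO uses %.1fx the heap of util.regex%n",
                ratio(oroHeapPercent, javaRegexHeapPercent));
    }
}
```

So the trade-off is roughly a sixth of the run time against well over half of the heap usage.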
Is the performance penalty considered a blocker?