Description
The unit tests of urlfilter-regex and urlfilter-automaton include a benchmark. After experimenting with and benchmarking several modifications, the following changes seem to improve performance significantly:
- do not extract host and domain name from the URL if not needed (no host/domain-specific rules used, cf. NUTCH-1838)
- use non-capturing groups if possible
- use (?i) to make the patterns case insensitive and remove uppercase variants
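To illustrate the two regex-related points, here is a minimal sketch using plain java.util.regex. The class name, the suffix pattern and the example URLs are made up for illustration and are not taken from the shipped filter rules. Note that the (?i) variant also matches mixed-case suffixes such as ".Gif", which the spelled-out variant does not.

```java
import java.util.regex.Pattern;

// Hypothetical sketch, not the plugin's actual code or rule set.
public class UrlFilterPatternSketch {

    // Before: capturing group, with lower- and uppercase variants spelled out.
    static final Pattern VERBOSE =
        Pattern.compile("\\.(gif|GIF|jpg|JPG|png|PNG)$");

    // After: non-capturing group (?:...) and the inline case-insensitive flag (?i).
    static final Pattern COMPACT =
        Pattern.compile("(?i)\\.(?:gif|jpg|png)$");

    // True if the pattern is found anywhere in the URL.
    static boolean rejects(Pattern pattern, String url) {
        return pattern.matcher(url).find();
    }

    // Number of capturing groups the regex engine has to track for a pattern.
    static int capturingGroups(Pattern pattern) {
        return pattern.matcher("").groupCount();
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://example.com/logo.PNG",
            "http://example.com/logo.png",
            "http://example.com/index.html"
        };
        for (String url : urls) {
            System.out.println(url
                + " verbose=" + rejects(VERBOSE, url)
                + " compact=" + rejects(COMPACT, url));
        }
        System.out.println("capturing groups: verbose=" + capturingGroups(VERBOSE)
            + " compact=" + capturingGroups(COMPACT));
    }
}
```

Any actual speed-up depends on the rule set and the input URLs, so it should be confirmed with the benchmark in the unit tests.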