Details
- Type: Task
- Status: Closed
- Priority: Major
- Resolution: Fixed
Description
The MAIL_REGEX in UrlCharSequenceNormalizer uses unbounded quantifiers and requires backtracking. In rare cases, this causes a surprisingly large performance cost (OPENNLP-1350 describes it as quadratic complexity for certain input).
I also tested the regexes in the other normalizers. I could be wrong, but they don't appear to require backtracking, and I saw no surprising performance costs.
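To illustrate the kind of cost described above, here is a minimal sketch in Java. Both patterns are hypothetical stand-ins, not the actual MAIL_REGEX from UrlCharSequenceNormalizer, and the class name, the `normalize` helper, and the 64/255 limits are assumptions chosen for the example. With an unbounded `+`, each start position in a long run of address-like characters without an `@` can be scanned to the end of the run and then backtracked, so the total work grows roughly quadratically. Capping the repetition limits the work per start position.

```java
import java.util.regex.Pattern;

final class MailRegexSketch {

    // Hypothetical unbounded pattern (an assumption, not copied from OpenNLP).
    // The leading "+" may consume an arbitrarily long run before failing on '@'.
    static final Pattern UNBOUNDED_MAIL = Pattern.compile(
        "[-_.0-9A-Za-z]+@[-_0-9A-Za-z]+[-_.0-9A-Za-z]+");

    // Same shape with capped repetition (limits are illustrative assumptions).
    // Each match attempt now inspects at most a fixed number of characters.
    // Note: for a local part longer than 64 characters, only its tail matches.
    static final Pattern BOUNDED_MAIL = Pattern.compile(
        "[-_.0-9A-Za-z]{1,64}@[-_0-9A-Za-z]{1,255}[-_.0-9A-Za-z]{1,255}");

    // Replaces every match of the given pattern with a single space.
    static String normalize(Pattern mailPattern, CharSequence text) {
        return mailPattern.matcher(text).replaceAll(" ");
    }

    // Runs one normalization and returns the elapsed time in milliseconds.
    static long timeMillis(Pattern mailPattern, CharSequence text) {
        long start = System.nanoTime();
        normalize(mailPattern, text);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Ordinary input: both patterns remove the address.
        String sample = "mail bob@example.com now";
        System.out.println(normalize(UNBOUNDED_MAIL, sample));
        System.out.println(normalize(BOUNDED_MAIL, sample));

        // Pathological input: a long run of valid characters with no '@'.
        String noAt = "a".repeat(10_000);
        System.out.println("unbounded ms: " + timeMillis(UNBOUNDED_MAIL, noAt));
        System.out.println("bounded ms:   " + timeMillis(BOUNDED_MAIL, noAt));
    }
}
```

Timings depend on the JVM and machine, so the sketch prints them rather than asserting a value. Increasing the length of `noAt` should make the gap between the two patterns more visible.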
Issue Links
- is fixed by
  OPENNLP-1350 MAIL_REGEX in UrlCharSequenceNormalizer causes quadratic complexity for certain input, and is also a bit imprecise (Resolved)
- is related to
  OPENNLP-1265 Improve speed of lang detect (Closed)