OpenNLP / OPENNLP-1266

Limit normalization regexes in UrlCharSequenceNormalizer


Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.9.4
    • Component/s: None
    • Labels: None

    Description

      The MAIL_REGEX in UrlCharSequenceNormalizer uses unbounded quantifiers and requires backtracking. In rare cases, this can cause eye-opening performance costs.

      I tested the regexes in the other normalizers. I could be wrong, but they don't appear to require backtracking, and I saw no surprising performance costs.
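
The MAIL_REGEX concern above can be illustrated with a minimal Java sketch. Both patterns below are assumptions written for illustration; they are not copied from OpenNLP, and the class name, the `normalize` helper, and the bounds of 64 and 255 are hypothetical. The sketch only shows the general idea of replacing `+` with a bounded quantifier so that the work attempted from each start position is capped.

```java
import java.util.regex.Pattern;

final class BoundedMailRegexSketch {

    // Illustrative pattern only; NOT the actual OpenNLP MAIL_REGEX.
    // Each '+' can consume an arbitrarily long run, so on a long run of
    // local-part characters with no '@' the engine retries from every
    // start offset and backtracks through the whole remaining run.
    static final Pattern UNBOUNDED_MAIL =
            Pattern.compile("[-_.0-9A-Za-z]+@[-_0-9A-Za-z]+[-_.0-9A-Za-z]+");

    // Hypothetical bounded variant: the {1,N} quantifiers cap how far a
    // single match attempt can run. The limits 64 and 255 are assumptions.
    static final Pattern BOUNDED_MAIL =
            Pattern.compile("[-_.0-9A-Za-z]{1,64}@[-_0-9A-Za-z]{1,255}[-_.0-9A-Za-z]{1,255}");

    // Hypothetical helper: replaces every match of the given pattern with a space.
    static String normalize(Pattern mailPattern, CharSequence text) {
        return mailPattern.matcher(text).replaceAll(" ");
    }

    public static void main(String[] args) {
        String text = "Contact jane.doe@example.org now";
        // On ordinary input both patterns match the same address.
        System.out.println(normalize(UNBOUNDED_MAIL, text));
        System.out.println(normalize(BOUNDED_MAIL, text));
    }
}
```

One trade-off of this sketch: a local part longer than the bound is only partially matched (the trailing 64 characters before the `@`), so the bounds should be chosen with the expected input in mind.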

            People

              Assignee: Tim Allison (tallison)
              Reporter: Tim Allison (tallison)