  1. OpenNLP
  2. OPENNLP-1214

use hash to avoid linear search in DefaultEndOfSentenceScanner

    Details

    • Type: Improvement
    • Status: Reopened
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 1.9.0
    • Fix Version/s: 1.9.1
    • Component/s: None
    • Labels:
      None

      Description

      When DefaultEndOfSentenceScanner scans a sentence, it uses a linear search to check whether each character in the sentence is one of the end-of-sentence (eos) characters. I think we should keep eosCharacters in a HashSet instead of a char[], so that each check becomes a hash lookup.
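A minimal sketch of the idea, not the actual OpenNLP code: the class name `HashedEosScannerSketch` and its `getPositions(CharSequence)` signature are assumptions made for illustration. It only shows the per-character check done as a `HashSet` lookup rather than a loop over a `char[]`.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for illustration; not the real DefaultEndOfSentenceScanner.
class HashedEosScannerSketch {

  private final Set<Character> eosCharacters = new HashSet<>();

  HashedEosScannerSketch(char[] eosChars) {
    for (char c : eosChars) {
      eosCharacters.add(c); // autoboxed to Character
    }
  }

  // Returns the indices of all end-of-sentence characters in s.
  List<Integer> getPositions(CharSequence s) {
    List<Integer> positions = new ArrayList<>();
    for (int i = 0; i < s.length(); i++) {
      // Expected O(1) hash lookup instead of scanning a char[] for every character.
      if (eosCharacters.contains(s.charAt(i))) {
        positions.add(i);
      }
    }
    return positions;
  }
}
```

With only a handful of eos characters the gain per lookup is probably small, and each lookup boxes a `char`, so it would be worth measuring before relying on the speedup.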

      In accordance with this replacement, I'd like to deprecate getEndOfSentenceCharacters(), because it returns char[] and nothing in OpenNLP calls it at present, and add an equivalent method that returns the eos chars as a Set<Character>. A Set cannot preserve the order of the eos chars, but I don't think that will be a problem.
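The API change could look roughly like the sketch below. Only `getEndOfSentenceCharacters()` comes from the description above; the class name `EosAccessorsSketch` and the new accessor name `getEosCharacterSet()` are hypothetical, chosen only to illustrate keeping the old method as a deprecated wrapper.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for illustration; not the real OpenNLP class.
class EosAccessorsSketch {

  private final Set<Character> eosCharacters;

  EosAccessorsSketch(char[] eosChars) {
    Set<Character> set = new HashSet<>();
    for (char c : eosChars) {
      set.add(c);
    }
    // Expose a read-only view so callers cannot change the scanner's state.
    this.eosCharacters = Collections.unmodifiableSet(set);
  }

  /**
   * @deprecated use {@link #getEosCharacterSet()} (assumed name) instead;
   *     the order of the returned array is not guaranteed.
   */
  @Deprecated
  char[] getEndOfSentenceCharacters() {
    char[] out = new char[eosCharacters.size()];
    int i = 0;
    for (char c : eosCharacters) {
      out[i++] = c;
    }
    return out;
  }

  // New accessor returning the eos characters as a set (name is an assumption).
  Set<Character> getEosCharacterSet() {
    return eosCharacters;
  }
}
```

Keeping the deprecated method as a thin wrapper means existing external callers still compile, at the cost of an array whose element order may differ from the one originally supplied.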

            People

            • Assignee:
              koji Koji Sekiguchi
              Reporter:
              koji Koji Sekiguchi
            • Votes:
              0
              Watchers:
              3

              Dates

              • Created:
                Updated: