Details
Improvement
Status: Reopened
Minor
Resolution: Unresolved
1.9.0
None
None
None
Description
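When DefaultEndOfSentenceScanner scans a sentence, it uses a linear search to check whether each character in the sentence is one of the EOS characters. I think we should keep eosCharacters in a HashSet instead of a char[], so that each check becomes an expected constant-time hash lookup instead of a scan over the whole array.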
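The sketch below shows the difference between the two lookups. It is not OpenNLP's code: the class name `EosLookupSketch` and its methods `isEosLinear`, `toSet`, and `getPositions` are hypothetical and only illustrate the idea. The set is built once, for example in the scanner's constructor, and each character is then checked with `Set.contains`.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch (not OpenNLP's implementation): contrasts a linear scan
// over a char[] of EOS characters with a HashSet<Character> lookup.
class EosLookupSketch {

    // Current style: linear search, O(k) per character for k EOS characters.
    static boolean isEosLinear(char[] eosCharacters, char c) {
        for (char eos : eosCharacters) {
            if (eos == c) {
                return true;
            }
        }
        return false;
    }

    // Proposed style: build the set once (e.g. in the scanner's constructor).
    static Set<Character> toSet(char[] eosCharacters) {
        Set<Character> set = new HashSet<>();
        for (char c : eosCharacters) {
            set.add(c); // char is autoboxed to Character
        }
        return set;
    }

    // Returns the indices of all EOS characters using expected O(1) lookups.
    static List<Integer> getPositions(Set<Character> eosCharacters, String s) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < s.length(); i++) {
            if (eosCharacters.contains(s.charAt(i))) {
                positions.add(i);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        char[] eos = {'.', '?', '!'};
        Set<Character> eosSet = toSet(eos);
        System.out.println(getPositions(eosSet, "Hi. Ok!")); // prints [2, 6]
        System.out.println(isEosLinear(eos, '!'));           // prints true
    }
}
```

With only a few EOS characters the arrays are tiny, so the practical speedup is worth confirming with a benchmark; the set mainly keeps the per-character cost independent of how many EOS characters are configured.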
Along with this change, I'd like to deprecate getEndOfSentenceCharacters(), because it returns a char[] and nothing in OpenNLP currently calls it, and add an equivalent method that returns the EOS characters as a Set<Character>. A Set does not preserve the order of the EOS characters, but I don't think that will be a problem.
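A minimal sketch of how the proposed API change might look, assuming the characters are stored in an unmodifiable set. Only `getEndOfSentenceCharacters()` comes from the description above; the class name `SetBackedEosScanner` and the accessor name `getEosCharacterSet()` are hypothetical placeholders, not the names OpenNLP actually uses. The deprecated method still works, but the order of the returned array is unspecified because it is copied out of a `HashSet`.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the proposed API; names other than
// getEndOfSentenceCharacters() are illustrative, not OpenNLP's.
class SetBackedEosScanner {

    private final Set<Character> eosCharacters;

    SetBackedEosScanner(char[] eosCharacters) {
        Set<Character> set = new HashSet<>();
        for (char c : eosCharacters) {
            set.add(c);
        }
        // Expose a read-only view so callers cannot change the EOS set.
        this.eosCharacters = Collections.unmodifiableSet(set);
    }

    /**
     * @deprecated the order of the returned characters is unspecified once
     * they are stored in a Set; use {@link #getEosCharacterSet()} instead.
     */
    @Deprecated
    char[] getEndOfSentenceCharacters() {
        char[] result = new char[eosCharacters.size()];
        int i = 0;
        for (Character c : eosCharacters) {
            result[i++] = c; // auto-unboxing; iteration order is unspecified
        }
        return result;
    }

    // Hypothetical replacement accessor returning the EOS characters as a Set.
    Set<Character> getEosCharacterSet() {
        return eosCharacters;
    }

    public static void main(String[] args) {
        SetBackedEosScanner scanner = new SetBackedEosScanner(new char[] {'.', '?', '!'});
        System.out.println(scanner.getEosCharacterSet().contains('?'));   // prints true
        System.out.println(scanner.getEndOfSentenceCharacters().length);  // prints 3
    }
}
```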