Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Not A Problem
Description
I am testing OpenNLP and found a significant tokenization issue involving punctuation. Example inputs:
Thank you Costco!
i love costco!
I love Costco!!
FUCK IKEA.
In all of these cases, the trailing punctuation is not split off, so "Costco!" and "IKEA." are each treated as a single token. This looks like a systematic problem.
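For illustration: the behaviour described above is what a tokenizer that splits only on whitespace would produce. The report does not say which tokenizer was used, so treating that as the cause is an assumption. The sketch below is plain Java and does not call OpenNLP; the class and method names are hypothetical. It contrasts whitespace-only splitting with splitting wherever the character class changes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not OpenNLP code.
class TokenizerSketch {

    // Split on runs of whitespace only, so punctuation stays attached to words.
    static List<String> splitOnWhitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Start a new token whenever the character class changes
    // (0 = whitespace, 1 = letter or digit, 2 = anything else).
    static List<String> splitOnCharClass(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int previousClass = -1;
        for (char c : text.toCharArray()) {
            int cls = Character.isWhitespace(c) ? 0
                    : Character.isLetterOrDigit(c) ? 1 : 2;
            if (cls != previousClass && current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
            if (cls != 0) {
                current.append(c); // whitespace is dropped, never emitted
            }
            previousClass = cls;
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "Thank you Costco!";
        System.out.println(splitOnWhitespace(input));
        System.out.println(splitOnCharClass(input));
    }
}
```

With the first example input, the whitespace-only variant keeps "Costco!" as one token, while the character-class variant emits "Costco" and "!" separately. Checking which tokenizer the test actually used would be a reasonable first step.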