Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.6.0
Fix Version/s: None
Component/s: None
Description
Hello,
I do not understand why RegexNameFinder joins the tokens with whitespace. It seems pointless to me.
When we call `find(String[] token)`, you rebuild the string by appending a whitespace after each token. Why?
I ask because the original string may have been tokenized by SimpleTokenizer which, as you know, splits (for example) a word from a following period, so the rebuilt string contains a whitespace between them that was not in the original. Example:
Original:
I am visiting Rome.
Tokenized:
I am visiting Rome[SPLIT].
Regex is applied to:
I am visiting Rome .
(instead of the original)
This version introduces a find() method that accepts a String instead of a String[], but in that case the caller passes the original string, not the rebuilt one, so the two methods return different results for the same sentence.
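A minimal Java sketch of the mismatch, written without OpenNLP: `rebuild` imitates the join described in this report (each token followed by one space), the token array is assumed to be what a SimpleTokenizer-style tokenizer produces for the example sentence, and the pattern `[A-Z][a-z]+\.` is a hypothetical rule for a capitalized word directly followed by a period. `RebuiltStringDemo`, `rebuild` and `matches` are illustrative names, not OpenNLP code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RebuiltStringDemo {

    // Imitates the rebuild described in this report: every token followed by one space.
    static String rebuild(String[] tokens) {
        StringBuilder sb = new StringBuilder();
        for (String token : tokens) {
            sb.append(token).append(' ');
        }
        return sb.toString();
    }

    // Collects every substring of text that the pattern matches.
    static List<String> matches(Pattern pattern, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String original = "I am visiting Rome.";
        // Assumed SimpleTokenizer-style output for the sentence above (hard-coded, not computed).
        String[] tokens = {"I", "am", "visiting", "Rome", "."};
        // Hypothetical rule: a capitalized word directly followed by a period.
        Pattern wordThenPeriod = Pattern.compile("[A-Z][a-z]+\\.");

        System.out.println(matches(wordThenPeriod, original));         // [Rome.]
        System.out.println(matches(wordThenPeriod, rebuild(tokens)));  // []
    }
}
```

The same pattern matches the original string but nothing in the rebuilt one. Patterns that only cover whole words separated by single spaces behave the same on both strings; the difference shows up as soon as a pattern depends on the original spacing around punctuation.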
Why not apply a detokenize method that performs the EXACT inverse of the tokenization, so that the regex runs on the original string instead of a modified one?
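One way to get an exact inverse is sketched below, assuming each token keeps its start offset in the original string (OpenNLP tokenizers can report such offsets through `Tokenizer.tokenizePos`, which returns `Span` objects). Note that this swaps rule-based detokenization for offset bookkeeping: `detokenize` puts every token back at its original offset, so the rebuilt text has the same length and offsets as the original (only the kind of whitespace may differ, because gaps are filled with spaces), and `findTokenRanges` maps each regex match back to token indices, keeping only matches that start and end on token boundaries. `OffsetDetokenizer`, `Token`, `detokenize` and `findTokenRanges` are hypothetical names, not OpenNLP API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OffsetDetokenizer {

    // Hypothetical token type: the token text plus its start offset in the original string.
    static final class Token {
        final String text;
        final int start;

        Token(String text, int start) {
            this.text = text;
            this.start = start;
        }

        int end() {
            return start + text.length();
        }
    }

    // Puts every token back at its original offset; gaps between tokens are filled with spaces.
    // Assumes tokens are sorted by start offset and do not overlap.
    static String detokenize(List<Token> tokens) {
        StringBuilder sb = new StringBuilder();
        for (Token t : tokens) {
            while (sb.length() < t.start) {
                sb.append(' ');
            }
            sb.append(t.text);
        }
        return sb.toString();
    }

    // Runs the regex on the detokenized text and returns {firstToken, lastToken} for each match
    // that starts and ends exactly on token boundaries; other matches are dropped.
    static List<int[]> findTokenRanges(Pattern pattern, List<Token> tokens) {
        String text = detokenize(tokens);
        List<int[]> ranges = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            int first = -1;
            int last = -1;
            for (int i = 0; i < tokens.size(); i++) {
                if (tokens.get(i).start == m.start()) {
                    first = i;
                }
                if (tokens.get(i).end() == m.end()) {
                    last = i;
                }
            }
            if (first >= 0 && last >= first) {
                ranges.add(new int[] {first, last});
            }
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Offsets of the tokens of "I am visiting Rome." in the original string.
        List<Token> tokens = List.of(
                new Token("I", 0), new Token("am", 2), new Token("visiting", 5),
                new Token("Rome", 14), new Token(".", 18));

        System.out.println(detokenize(tokens)); // I am visiting Rome.
        for (int[] r : findTokenRanges(Pattern.compile("[A-Z][a-z]+\\."), tokens)) {
            System.out.println("tokens " + r[0] + " to " + r[1]); // tokens 3 to 4
        }
    }
}
```

Because the offsets are preserved, a pattern that does not depend on the kind of whitespace would produce the same matches whether it is run on the original String or on the detokenized tokens, which would make the two find() methods consistent.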
Thanks.