It would be nice if the ParserTool used a real tokenizer. In addition to being the "right" thing to do, it would obviate issues like OPENNLP-240 when using the parser tool.
While I realize that java.util.StringTokenizer effectively does the same work as WhitespaceTokenizer, it seems odd to use the former when the latter exists.
To this end, I'm attaching a patch that adds a new method:
public static Parse parseLine(String line, Parser parser, Tokenizer tokenizer, int numParses)
I've kept the existing method
public static Parse parseLine(String line, Parser parser, int numParses)
for convenience and backwards compatibility. It simply calls the new method with WhitespaceTokenizer.INSTANCE.
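A minimal, self-contained sketch of the overload pattern described above. It uses hypothetical stand-ins rather than the OpenNLP classes: Tokenizer, WhitespaceTokenizerSketch and tokensForParser are illustrative names, and the sketch stops at producing the token array that the real parseLine would turn into a Parse and hand to the parser.

```java
import java.util.StringTokenizer;

// Hypothetical sketch, not the OpenNLP API. Tokenizer, WhitespaceTokenizerSketch and
// tokensForParser are illustrative names; the real parseLine would go on to build a
// Parse from these tokens and run the parser.
public class ParseLineSketch {

    // Assumed minimal tokenizer contract (OpenNLP's Tokenizer interface also declares tokenizePos).
    interface Tokenizer {
        String[] tokenize(String line);
    }

    // Whitespace tokenizer as a shared singleton, mirroring the WhitespaceTokenizer.INSTANCE idiom.
    enum WhitespaceTokenizerSketch implements Tokenizer {
        INSTANCE;

        @Override
        public String[] tokenize(String line) {
            // StringTokenizer's default delimiters are space, tab, newline, carriage return and form feed.
            StringTokenizer st = new StringTokenizer(line);
            String[] tokens = new String[st.countTokens()];
            for (int i = 0; i < tokens.length; i++) {
                tokens[i] = st.nextToken();
            }
            return tokens;
        }
    }

    // New overload: the caller decides how the line is tokenized.
    static String[] tokensForParser(String line, Tokenizer tokenizer) {
        return tokenizer.tokenize(line);
    }

    // Old overload, kept for convenience and backwards compatibility:
    // it simply delegates with the whitespace tokenizer.
    static String[] tokensForParser(String line) {
        return tokensForParser(line, WhitespaceTokenizerSketch.INSTANCE);
    }

    public static void main(String[] args) {
        // Toy tokenizer that also splits off a sentence-final period,
        // standing in for a model-based tokenizer.
        Tokenizer splitsFinalPeriod = s -> s.trim().replaceAll("\\.$", " .").split("\\s+");
        System.out.println(String.join("|", tokensForParser("The dog barked.")));                    // The|dog|barked.
        System.out.println(String.join("|", tokensForParser("The dog barked.", splitsFinalPeriod))); // The|dog|barked|.
    }
}
```

Keeping the old signature as a thin wrapper means existing callers compile unchanged, while new callers can pass any tokenizer that splits "barked." into "barked" and ".".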
For good measure, I've added a new command-line argument -tk, which takes the name of a tokenizer model. If none is specified, it will fall back on the current behavior of using the whitespace tokenizer.
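The -tk fallback can be sketched the same way. This is an illustration only: the argument scanning, the chooseTokenizer name and the loadModelTokenizer hook are hypothetical, and tokenizers are modeled as plain Function<String, String[]> values so the example runs without OpenNLP on the classpath. With OpenNLP, the hook would presumably wrap new TokenizerME(new TokenizerModel(in)).

```java
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch of the -tk option: use a model-based tokenizer when a model
// is named, otherwise fall back on whitespace tokenization.
public class TokenizerOptionSketch {

    // Returns the argument following "-tk", if any.
    // A trailing "-tk" with no value is treated as absent.
    static Optional<String> tokenizerModelPath(String[] args) {
        for (int i = 0; i < args.length - 1; i++) {
            if ("-tk".equals(args[i])) {
                return Optional.of(args[i + 1]);
            }
        }
        return Optional.empty();
    }

    // loadModelTokenizer is a hypothetical hook that turns a model path into a tokenizer.
    static Function<String, String[]> chooseTokenizer(
            String[] args, Function<String, Function<String, String[]>> loadModelTokenizer) {
        return tokenizerModelPath(args)
                .map(loadModelTokenizer)
                // Fallback: the current behavior, splitting on runs of whitespace.
                .orElse(line -> line.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        // Demo stand-in for loading a model: just reports which path it was given.
        Function<String, Function<String, String[]>> demoLoader =
                path -> line -> new String[] {"<tokens from " + path + ">"};

        String[] viaModel = chooseTokenizer(new String[] {"-tk", "tok.bin", "parser.bin"}, demoLoader)
                .apply("The dog barked.");
        String[] viaDefault = chooseTokenizer(new String[] {"parser.bin"}, demoLoader)
                .apply("The dog barked.");

        System.out.println(viaModel[0]);                   // <tokens from tok.bin>
        System.out.println(String.join("|", viaDefault)); // The|dog|barked.
    }
}
```

Loading the model only when -tk is present keeps the default path cheap and identical to today's behavior.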