Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
tools-1.5.2-incubating
None
Description
The DoccatTool should read input in the default doccat format from stdin. That format is whitespace tokenized, but the DoccatTool currently tokenizes the input text with the Simple Tokenizer, so the input may be split differently than the format intends.
To fix this issue, use the Whitespace Tokenizer instead of the Simple Tokenizer.
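
The difference between the two tokenization strategies can be illustrated with a small, self-contained sketch. This is not the OpenNLP code and not the actual patch: the class `TokenizerComparison` and both methods are hypothetical stand-ins written with the Java standard library only, and `characterClassTokenize` only approximates the way a character-class tokenizer splits text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; none of these names come from OpenNLP.
public class TokenizerComparison {

    private static final int WHITESPACE = 0;
    private static final int LETTER = 1;
    private static final int DIGIT = 2;
    private static final int OTHER = 3;

    // Splits on runs of whitespace only, so punctuation stays attached to words.
    static String[] whitespaceTokenize(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        return trimmed.split("\\s+");
    }

    // Rough approximation of character-class tokenization: a new token starts
    // whenever the character class (letter, digit, other) changes.
    static String[] characterClassTokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int previousClass = WHITESPACE;
        for (char c : text.toCharArray()) {
            int cls = classify(c);
            // Close the pending token when the character class changes.
            if (cls != previousClass && current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
            if (cls != WHITESPACE) {
                current.append(c);
            }
            previousClass = cls;
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens.toArray(new String[0]);
    }

    private static int classify(char c) {
        if (Character.isWhitespace(c)) return WHITESPACE;
        if (Character.isLetter(c)) return LETTER;
        if (Character.isDigit(c)) return DIGIT;
        return OTHER;
    }

    public static void main(String[] args) {
        String line = "Don't stop.";
        System.out.println(Arrays.toString(whitespaceTokenize(line)));
        System.out.println(Arrays.toString(characterClassTokenize(line)));
    }
}
```

For the input `Don't stop.`, the whitespace variant yields two tokens (`Don't`, `stop.`), while the character-class variant yields five (`Don`, `'`, `t`, `stop`, `.`). Input that is already whitespace tokenized therefore gets split a second time if the wrong tokenizer is applied.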