Details
Task
Status: Closed
Major
Resolution: Fixed
1.9.4, 2.0.0
None
Description
In the tokenizer documentation in the user guide, the usage of the tool shows a cutoff option:
-cutoff num
    minimal number of times a feature must be seen, ignored if -params is used.
However, this option is not present in the usage when running the CLI:
Arguments description:
-factory factoryName
    A sub-class of TokenizerFactory where to get implementation and resources.
-abbDict path
    abbreviation dictionary in XML format.
-alphaNumOpt isAlphaNumOpt
    Optimization flag to skip alpha numeric tokens for further tokenization
-params paramsFile
    training parameters file.
-lang language
    language which is being processed.
-model modelFile
    output model file.
-data sampleData
    data to be used, usually a file name.
-encoding charsetName
    encoding for reading and writing text, if absent the system default is used.
The CLI does not recognize -cutoff as an option, so the documentation is likely incorrect. A review of the code should be done first to confirm this.
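As a quick cross-check before reviewing the code, the two usage texts quoted above can be compared mechanically. The following is only a minimal sketch: the helper names (`option_names`, `stale_options`) are hypothetical and not part of OpenNLP, and the documented text is limited to the single option quoted in this report rather than the full user guide listing.

```python
import re

# Option quoted from the user guide in this report. Only the disputed
# option is quoted here, so this is NOT the guide's complete list.
DOCUMENTED = """\
-cutoff num
    minimal number of times a feature must be seen, ignored if -params is used.
"""

# Usage text printed by the CLI, as quoted in this report.
CLI_USAGE = """\
Arguments description:
-factory factoryName
    A sub-class of TokenizerFactory where to get implementation and resources.
-abbDict path
    abbreviation dictionary in XML format.
-alphaNumOpt isAlphaNumOpt
    Optimization flag to skip alpha numeric tokens for further tokenization
-params paramsFile
    training parameters file.
-lang language
    language which is being processed.
-model modelFile
    output model file.
-data sampleData
    data to be used, usually a file name.
-encoding charsetName
    encoding for reading and writing text, if absent the system default is used.
"""

# An option line starts (after optional indentation) with "-name".
# Anchoring at line start skips mentions such as "-params" inside a description.
OPTION_LINE = re.compile(r"^[ \t]*-(\w+)\b", re.MULTILINE)


def option_names(usage_text):
    """Return the set of option names declared in a usage listing."""
    return set(OPTION_LINE.findall(usage_text))


def stale_options(documented, actual):
    """Return documented options that the CLI usage does not list."""
    return sorted(option_names(documented) - option_names(actual))


print(stale_options(DOCUMENTED, CLI_USAGE))  # prints ['cutoff']
```

This only compares the printed usage text. It does not show whether the code still parses a cutoff argument somewhere, so the code review is still needed.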