Description
When training the DocumentCategorizerME, it is possible to set the Tokenizer that the categorizer should use.
For example: doccatFactory.setTokenizer(SemicolonTokenizer.INSTANCE);
However, the tokenizer is hardcoded to WhitespaceTokenizer in the DocumentSampleStream class.
As a result, the default tokenizing behaviour cannot be changed, even after setting a different tokenizer in the doccatFactory.
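
To illustrate why this matters, here is a minimal sketch of how the two tokenizing strategies would split the same document text. It does not call the OpenNLP classes: whitespaceTokens and semicolonTokens are hypothetical stand-ins built on plain string splitting, and the sample text is made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class TokenizerMismatchSketch {

    // Hypothetical stand-in for whitespace tokenization
    // (the behaviour reported as hardcoded in the sample stream).
    static List<String> whitespaceTokens(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    // Hypothetical stand-in for a semicolon-based tokenizer
    // like the one set on the factory in this report.
    static List<String> semicolonTokens(String text) {
        return Arrays.asList(text.trim().split("\\s*;\\s*"));
    }

    public static void main(String[] args) {
        String text = "red apple;green pear";
        // Splits on spaces, so "apple;green" stays glued together.
        System.out.println(whitespaceTokens(text));
        // Splits on semicolons, keeping each phrase as one token.
        System.out.println(semicolonTokens(text));
    }
}
```

If the training samples are split one way and the categorizer is expected to split input the other way, the tokens seen during training would not match the ones intended by the configured tokenizer.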