Description
The following error message might be confusing:

java.lang.IllegalArgumentException: The maxent model is not compatible with the tokenizer!
	at opennlp.tools.util.model.BaseModel.checkArtifactMap(BaseModel.java:275)
	at opennlp.tools.tokenize.TokenizerModel.<init>(TokenizerModel.java:73)
	at opennlp.tools.tokenize.TokenizerME.train(TokenizerME.java:267)
	at opennlp.tools.tokenize.TokenizerME.train(TokenizerME.java:231)
	at opennlp.tools.tokenize.TokenizerME.train(TokenizerME.java:293)
	at opennlp.tools.tokenize.TokenizerTestUtil.createMaxentTokenModel(TokenizerTestUtil.java:67)
	at opennlp.tools.tokenize.TokenizerMETest.testTokenizer(TokenizerMETest.java:54)
	... cut
Due to an error in my conversion tool, I tried to train a tokenizer model on data without <SPLIT> markers, which resulted in a model with only one outcome. This model did not pass validation in ModelUtil.validateOutcomes(), which is correct; however, the error message is a bit confusing, and it took me some time to understand what was going on.
I agree that a model with outcomes different from the expected ones is incompatible with the tool, but what about a model with fewer outcomes? Is a model with fewer outcomes than expected really incompatible? For example, with the POS tagger I have corpora and models that use only a subset of the PTB tagset.
However, in the case of the tokenizer this incompatibility makes sense (a model with one outcome does not work), and here the message could be improved to indicate the cause better. Something like: "The maxent model is not compatible with the tokenizer: outcome XXX is not found".
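A check along those lines could name the first missing outcome explicitly. Below is a minimal, self-contained Java sketch of the idea; the method name validateOutcomes and the outcome labels SPLIT/NO_SPLIT are illustrative assumptions, not OpenNLP's actual internals:

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch: validate that a trained model contains every
// outcome the tokenizer expects, and report the missing one by name
// instead of a generic "not compatible" message.
public class OutcomeValidation {

    static void validateOutcomes(Set<String> modelOutcomes, List<String> expected) {
        for (String outcome : expected) {
            if (!modelOutcomes.contains(outcome)) {
                throw new IllegalArgumentException(
                        "The maxent model is not compatible with the tokenizer: "
                                + "outcome '" + outcome + "' is not found");
            }
        }
    }

    public static void main(String[] args) {
        // A model trained on data without <SPLIT>s ends up with one outcome.
        Set<String> singleOutcomeModel = Set.of("NO_SPLIT");
        try {
            validateOutcomes(singleOutcomeModel, List.of("SPLIT", "NO_SPLIT"));
        } catch (IllegalArgumentException e) {
            // prints: The maxent model is not compatible with the tokenizer:
            //         outcome 'SPLIT' is not found
            System.out.println(e.getMessage());
        }
    }
}
```

With a message like this, the one-outcome model produced by broken training data would immediately point at the missing SPLIT outcome rather than a generic incompatibility.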
Please advise. Thank you!