Description
The following error message might be confusing:

java.lang.IllegalArgumentException: The maxent model is not compatible with the tokenizer!
	at opennlp.tools.util.model.BaseModel.checkArtifactMap(BaseModel.java:275)
	at opennlp.tools.tokenize.TokenizerModel.<init>(TokenizerModel.java:73)
	at opennlp.tools.tokenize.TokenizerME.train(TokenizerME.java:267)
	at opennlp.tools.tokenize.TokenizerME.train(TokenizerME.java:231)
	at opennlp.tools.tokenize.TokenizerME.train(TokenizerME.java:293)
	at opennlp.tools.tokenize.TokenizerTestUtil.createMaxentTokenModel(TokenizerTestUtil.java:67)
	at opennlp.tools.tokenize.TokenizerMETest.testTokenizer(TokenizerMETest.java:54)
	... cut
Due to an error in my conversion tool, I tried to train a tokenizer model on data without <SPLIT> markers, which resulted in a model with only one outcome. This model did not pass validation in ModelUtil.validateOutcomes(), which is correct; however, the error message is a bit confusing, and it took me some time to understand what was going on.
I agree that a model with outcomes different from the expected ones is incompatible with the tool, but what about a model with fewer outcomes? Is a model with fewer outcomes than expected really incompatible? For example, with the POS tagger I have corpora and models that use only a subset of the PTB tagset.
However, in the case of the tokenizer this incompatibility makes sense (a model with one outcome does not work), and here the message could be improved to indicate the cause better. Something like: "The maxent model is not compatible with the tokenizer: outcome XXX is not found".
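A check along those lines could name the first missing outcome explicitly. Below is a minimal, self-contained Java sketch of the idea; the method name validateOutcomes and the outcome labels SPLIT/NO_SPLIT are illustrative assumptions, not OpenNLP's actual internals:

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch: validate that a trained model contains every
// outcome the tokenizer expects, and report the missing one by name
// instead of a generic "not compatible" message.
public class OutcomeValidation {

    static void validateOutcomes(Set<String> modelOutcomes, List<String> expected) {
        for (String outcome : expected) {
            if (!modelOutcomes.contains(outcome)) {
                throw new IllegalArgumentException(
                        "The maxent model is not compatible with the tokenizer: "
                                + "outcome '" + outcome + "' is not found");
            }
        }
    }

    public static void main(String[] args) {
        // A model trained on data without <SPLIT>s ends up with one outcome.
        Set<String> singleOutcomeModel = Set.of("NO_SPLIT");
        try {
            validateOutcomes(singleOutcomeModel, List.of("SPLIT", "NO_SPLIT"));
        } catch (IllegalArgumentException e) {
            // prints: The maxent model is not compatible with the tokenizer:
            //         outcome 'SPLIT' is not found
            System.out.println(e.getMessage());
        }
    }
}
```

With a message like this, the one-outcome model produced by broken training data would immediately point at the missing SPLIT outcome rather than a generic incompatibility.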
Please advise. Thank you!