OpenNLP / OPENNLP-371

Confusing error message in tokenizer training

Details

    Description

      The following error message

      java.lang.IllegalArgumentException: The maxent model is not compatible with the tokenizer!
      at opennlp.tools.util.model.BaseModel.checkArtifactMap(BaseModel.java:275)
      at opennlp.tools.tokenize.TokenizerModel.<init>(TokenizerModel.java:73)
      at opennlp.tools.tokenize.TokenizerME.train(TokenizerME.java:267)
      at opennlp.tools.tokenize.TokenizerME.train(TokenizerME.java:231)
      at opennlp.tools.tokenize.TokenizerME.train(TokenizerME.java:293)
      at opennlp.tools.tokenize.TokenizerTestUtil.createMaxentTokenModel(TokenizerTestUtil.java:67)
      at opennlp.tools.tokenize.TokenizerMETest.testTokenizer(TokenizerMETest.java:54)
      ... cut

      might be confusing.

      Due to an error in my conversion tool, I tried to train a tokenizer model on data without <SPLIT>s, which resulted in a model with only one outcome. This model did not pass validation in ModelUtil.validateOutcomes(), which is correct. However, the error message is a bit confusing, and it took some time to understand what was going on.
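A quick pre-check on the training data would have caught this situation before training. The following is only a minimal sketch, not part of OpenNLP: the class name `SplitMarkerCheck` and the method `countLinesWithSplit` are hypothetical, and it assumes the training data is available as a list of lines that use the `<SPLIT>` marker mentioned above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not an OpenNLP class.
public class SplitMarkerCheck {

    // Marker used in the tokenizer training data described in this issue.
    static final String SPLIT_MARKER = "<SPLIT>";

    /** Counts the lines that contain at least one split marker. */
    static long countLinesWithSplit(List<String> lines) {
        return lines.stream()
                .filter(line -> line.contains(SPLIT_MARKER))
                .count();
    }

    public static void main(String[] args) {
        // Made-up sample lines, for illustration only.
        List<String> sample = Arrays.asList(
                "Hello<SPLIT>, world<SPLIT>!",
                "No markers in this line .");

        long withSplit = countLinesWithSplit(sample);
        if (withSplit == 0) {
            // Without any marker, training can only ever see one outcome.
            System.err.println("Training data contains no " + SPLIT_MARKER
                    + " markers; the trained model would have a single outcome.");
        } else {
            System.out.println(withSplit + " of " + sample.size()
                    + " lines contain " + SPLIT_MARKER);
        }
    }
}
```

Such a check only warns about the input; it does not replace the validation of the trained model.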

      I would agree that a model with different outcomes than expected is incompatible with the tool, but what about one with fewer outcomes? Is a model with fewer outcomes than expected really incompatible? For example, for the POS tagger I have corpora and models that use a subset of the PTB tagset.

      However, in the case of the tokenizer this incompatibility makes sense (a model with one outcome does not work), and here the message might be improved to indicate the cause better. Something like: "The maxent model is not compatible with the tokenizer: outcome XXX is not found".
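The suggested message could be produced by a check along the following lines. This is a sketch under stated assumptions, not the actual ModelUtil.validateOutcomes() implementation: the class `OutcomeCheck` and the method `describeMissingOutcomes` are hypothetical, and the outcome names "T" and "F" are assumed stand-ins for the tokenizer's split and no-split outcomes.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not an OpenNLP class.
public class OutcomeCheck {

    /**
     * Returns null when every expected outcome is present in the model,
     * otherwise a message that names the missing outcomes.
     */
    static String describeMissingOutcomes(String[] expected, String[] actual) {
        // Start from the expected outcomes and drop those the model has.
        Set<String> missing = new LinkedHashSet<>(Arrays.asList(expected));
        missing.removeAll(Arrays.asList(actual));
        if (missing.isEmpty()) {
            return null;
        }
        return "The maxent model is not compatible with the tokenizer: outcome(s) "
                + missing + " not found";
    }

    public static void main(String[] args) {
        String[] expected = {"T", "F"}; // assumed outcome names
        String[] actual = {"F"};        // model trained on data without <SPLIT>

        String problem = describeMissingOutcomes(expected, actual);
        System.out.println(problem == null ? "Model outcomes are complete." : problem);
    }
}
```

A tool that can work with a subset of outcomes, such as the POS tagger case above, would presumably check the opposite direction instead: that the model contains no outcome outside the expected set.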

      Please advise. Thank you!

            People

              joern Jörn Kottmann
              autayeu Aliaksandr Autayeu