OpenNLP / OPENNLP-316

Evaluator and CrossValidator programs of the main analyzers throw exceptions


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: tools-1.5.2-incubating
    • Fix Version/s: tools-1.5.2-incubating
    • Labels:
      None
    • Environment:

    Description

      Evaluator and CrossValidator programs of the main analyzers throw an exception when run

      (tested on the 1.5.3 distribution via the command line)

      It seems that the Evaluator programs of the SentenceDetector, Tokenizer,
      PosTagger and Chunker (at least) throw a java.lang.NullPointerException
      when the -misclassified parameter is set to false or omitted.
      The Evaluator programs work (they produce a result) when the
      -misclassified parameter is set to true.
      The CrossValidator programs do not work at all.
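The Evaluator traces point at Evaluator.evaluateSample, and the failure depends only on whether -misclassified is enabled. One plausible pattern (an assumption, not confirmed against the OpenNLP source) is that the tool passes a null listener into a varargs parameter when -misclassified is off, so the evaluator receives an array containing one null element and later calls a method on it. The Java sketch below reproduces that pattern with hypothetical names (MisclassifiedListener, SketchEvaluator, NullListenerSketch), which are not the OpenNLP API, and shows a defensive fix that drops null listeners at construction time.

```java
import java.util.ArrayList;
import java.util.List;

public class NullListenerSketch {

    // Hypothetical listener standing in for a "print misclassified samples" monitor.
    interface MisclassifiedListener {
        void misclassified(String reference, String predicted);
    }

    // Hypothetical evaluator; NOT the OpenNLP Evaluator class.
    static class SketchEvaluator {
        private final List<MisclassifiedListener> listeners;
        private int total;
        private int correct;

        SketchEvaluator(MisclassifiedListener... listeners) {
            // Defensive fix: tolerate a null array and skip null elements,
            // so "no listener" behaves like "listener not requested".
            List<MisclassifiedListener> copy = new ArrayList<>();
            if (listeners != null) {
                for (MisclassifiedListener l : listeners) {
                    if (l != null) {
                        copy.add(l);
                    }
                }
            }
            this.listeners = copy;
        }

        void evaluateSample(String reference, String predicted) {
            total++;
            if (reference.equals(predicted)) {
                correct++;
            } else {
                // Without the filtering above, a null element would throw
                // NullPointerException here.
                for (MisclassifiedListener l : listeners) {
                    l.misclassified(reference, predicted);
                }
            }
        }

        double accuracy() {
            return total == 0 ? 0.0 : (double) correct / total;
        }
    }

    public static void main(String[] args) {
        boolean printMisclassified = false; // as if -misclassified were false or absent
        MisclassifiedListener listener = null;
        if (printMisclassified) {
            listener = (ref, pred) -> System.err.println("Expected: " + ref + " Got: " + pred);
        }
        // A single null argument becomes an array holding one null element.
        SketchEvaluator evaluator = new SketchEvaluator(listener);
        evaluator.evaluateSample("a", "a");
        evaluator.evaluateSample("b", "c");
        System.out.println(evaluator.accuracy());
    }
}
```

Filtering null once in the constructor keeps the per-sample loop free of checks; an alternative in the calling tool would be to build the listener list only when -misclassified is enabled.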

      I have not tested the other OpenNLP programs.

      Some example runs are shown below.
      I tested with the examples from the documentation and also with my own data.
      For the SentenceDetector, I tested with 1,000 and 1,000,000 sentences (one sentence per line).
      Let me know if you need more details.

      $ opennlp SentenceDetectorEvaluator -encoding UTF-8 -model data/model/fr-sent.bin -data data/test/fr-sent.test
      Loading Sentence Detector model ... done (0,013s)
      Evaluating ... Exception in thread "main" java.lang.NullPointerException
      at opennlp.tools.util.eval.Evaluator.evaluateSample(Evaluator.java:80)
      at opennlp.tools.util.eval.Evaluator.evaluate(Evaluator.java:98)
      at opennlp.tools.cmdline.sentdetect.SentenceDetectorEvaluatorTool.run(SentenceDetectorEvaluatorTool.java:80)
      at opennlp.tools.cmdline.CLI.main(CLI.java:191)

      $ opennlp SentenceDetectorCrossValidator -encoding UTF-8 -lang fr -data data/train/fr-sent.train -misclassified true
      Indexing events using cutoff of 5

      Computing event counts... done. 0 events
      Indexing... done.
      Sorting and merging events... Done indexing.
      Incorporating indexed data for training...
      Exception in thread "main" java.lang.NullPointerException
      at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
      at opennlp.maxent.GIS.trainModel(GIS.java:256)
      at opennlp.model.TrainUtil.train(TrainUtil.java:182)
      at opennlp.tools.sentdetect.SentenceDetectorME.train(SentenceDetectorME.java:283)
      at opennlp.tools.sentdetect.SDCrossValidator.evaluate(SDCrossValidator.java:104)
      at opennlp.tools.cmdline.sentdetect.SentenceDetectorCrossValidatorTool.run(SentenceDetectorCrossValidatorTool.java:98)
      at opennlp.tools.cmdline.CLI.main(CLI.java:191)
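In the SentenceDetectorCrossValidator run, the trainer reports "Computing event counts... done. 0 events" right before the NullPointerException in GISTrainer.trainModel. One possible reading (not confirmed here) is that a fold was trained on an empty event set. A cross-validator can detect that case and fail with a clear message instead. The Java sketch below uses hypothetical names (KFoldSketch, Fold, split) and is not OpenNLP's SDCrossValidator; it only illustrates assigning samples to k folds round-robin and rejecting a fold whose training partition is empty.

```java
import java.util.ArrayList;
import java.util.List;

public class KFoldSketch {

    // Hypothetical train/test pair; not an OpenNLP type.
    static final class Fold<T> {
        final List<T> train;
        final List<T> test;

        Fold(List<T> train, List<T> test) {
            this.train = train;
            this.test = test;
        }
    }

    // Sample i goes to the test set of fold (i % k) and to the training set of every other fold.
    static <T> List<Fold<T>> split(List<T> samples, int k) {
        if (k < 2) {
            throw new IllegalArgumentException("k must be at least 2, got " + k);
        }
        List<Fold<T>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            List<T> train = new ArrayList<>();
            List<T> test = new ArrayList<>();
            for (int i = 0; i < samples.size(); i++) {
                if (i % k == f) {
                    test.add(samples.get(i));
                } else {
                    train.add(samples.get(i));
                }
            }
            // Fail fast with a descriptive message instead of letting a
            // trainer run on zero events and fail later.
            if (train.isEmpty()) {
                throw new IllegalStateException("fold " + f + " has no training samples ("
                        + samples.size() + " samples read); check the -data file and its format");
            }
            folds.add(new Fold<>(train, test));
        }
        return folds;
    }

    public static void main(String[] args) {
        List<String> sentences = List.of("s1", "s2", "s3", "s4", "s5");
        for (Fold<String> fold : split(sentences, 5)) {
            System.out.println(fold.train.size() + " train / " + fold.test.size() + " test");
        }
    }
}
```

Round-robin assignment keeps fold sizes within one sample of each other; a real cross-validator may shuffle or stream samples instead.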

      $ opennlp TokenizerMEEvaluator -encoding UTF-8 -model data/model/fr-token.bin -data data/test/fr-token.test
      Loading Tokenizer model ... done (0,428s)
      Evaluating ... Exception in thread "main" java.lang.NullPointerException
      at opennlp.tools.util.eval.Evaluator.evaluateSample(Evaluator.java:76)
      at opennlp.tools.util.eval.Evaluator.evaluate(Evaluator.java:98)
      at opennlp.tools.cmdline.tokenizer.TokenizerMEEvaluatorTool.run(TokenizerMEEvaluatorTool.java:81)
      at opennlp.tools.cmdline.CLI.main(CLI.java:191)

      $ opennlp TokenizerCrossValidator -encoding UTF-8 -lang fr -data data/train/fr-token.train
      Indexing events using cutoff of 5
      Computing event counts... done. 100333 events
      Indexing... done.
      Sorting and merging events... done. Reduced 100333 events to 30168.
      Done indexing.
      Incorporating indexed data for training...
      done.
      Number of Event Tokens: 30168
      Number of Outcomes: 2
      Number of Predicates: 8287
      ...done.
      Computing model parameters ...
      Performing 100 iterations.
      1: ... loglikelihood=-69545.53606709359 0.9337805108987073
      2: ... loglikelihood=-18987.123809719425 0.9497872085953774
      ...
      98: ... loglikelihood=-607.4216932752298 0.9989534848952987
      99: ... loglikelihood=-603.2346954947699 0.9989734185163406
      100: ... loglikelihood=-599.1235213848983 0.9989833853268616
      Exception in thread "main" java.lang.NullPointerException
      at opennlp.tools.util.eval.Evaluator.evaluateSample(Evaluator.java:76)
      at opennlp.tools.util.eval.Evaluator.evaluate(Evaluator.java:98)
      at opennlp.tools.tokenize.TokenizerCrossValidator.evaluate(TokenizerCrossValidator.java:98)
      at opennlp.tools.cmdline.tokenizer.TokenizerCrossValidatorTool.run(TokenizerCrossValidatorTool.java:94)

    People

    • Assignee: William Colen (colen)
    • Reporter: Nicolas Hernandez (nicolas.hernandez)