OpenNLP / OPENNLP-488

Doccat training tool throws NullPointer error

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Doccat
    • Labels: None
    • Environment: Using cygwin on Windows
      java version "1.6.0_27"
      Java(TM) SE Runtime Environment (build 1.6.0_27-b07)
      Java HotSpot(TM) Client VM (build 20.2-b06, mixed mode)
      apache-opennlp-1.5.2

    Description

      When following the example in the OpenNLP 1.5.2 documentation I get a NullPointerException.

      http://opennlp.apache.org/documentation/1.5.2-incubating/manual/opennlp.html#tools.doccat.training.tool

      $ bin/opennlp DoccatTrainer -encoding UTF-8 -lang en -data en-doccat.train -model en-doccat.bin
      Indexing events using cutoff of 5

      Computing event counts... done. 2 events
      Indexing... Dropped event GMDecrease:[bow=Major, bow=acquisitions, bow=that, bow=have, bow=a, bow=lower, bow=gross, bow=margin, bow=than, bow=the, bow=existing, bow=network, bow=also, bow=had, bow=a, bow=negative, bow=impact, bow=on, bow=the, bow=overall, bow=gross, bow=margin,, bow=but, bow=it, bow=should, bow=improve, bow=following, bow=the, bow=implementation, bow=of, bow=its, bow=integration, bow=strategies, bow=.]
      Dropped event GMIncrease:[bow=The, bow=upward, bow=movement, bow=of, bow=gross, bow=margin, bow=resulted, bow=from, bow=amounts, bow=pursuant, bow=to, bow=adjustments, bow=to, bow=obligations, bow=towards, bow=dealers, bow=.]
      done.
      Sorting and merging events... Done indexing.
      Incorporating indexed data for training...
      Exception in thread "main" java.lang.NullPointerException
      at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
      at opennlp.maxent.GIS.trainModel(GIS.java:256)
      at opennlp.model.TrainUtil.train(TrainUtil.java:182)
      at opennlp.tools.doccat.DocumentCategorizerME.train(DocumentCategorizerME.java:154)
      at opennlp.tools.doccat.DocumentCategorizerME.train(DocumentCategorizerME.java:176)
      at opennlp.tools.doccat.DocumentCategorizerME.train(DocumentCategorizerME.java:192)
      at opennlp.tools.cmdline.doccat.DoccatTrainerTool.run(DoccatTrainerTool.java:91)
      at opennlp.tools.cmdline.CLI.main(CLI.java:191)

      The file "en-doccat.train" is UTF-8 encoded in UNIX format and looks like this:

      GMDecrease Major acquisitions that have a lower gross margin than the existing network also had a negative impact on the overall gross margin, but it should improve following the implementation of its integration strategies .
      GMIncrease The upward movement of gross margin resulted from amounts pursuant to adjustments to obligations towards dealers .
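
One plausible reading of the log above is that every bag-of-words feature falls below the cutoff of 5, so both events are dropped and the trainer is left with nothing to train on. The sketch below is not OpenNLP code. It is a self-contained illustration, under the assumption that a feature is kept only if it occurs at least `cutoff` times across all samples and that a sample is dropped once none of its features survive. The class and method names (`CutoffSketch`, `countFeatures`, `survivingEvents`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CutoffSketch {

    /** Counts each bag-of-words feature over all lines; the first token of a line is the category. */
    static Map<String, Integer> countFeatures(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            String[] tokens = line.trim().split("\\s+");
            for (int i = 1; i < tokens.length; i++) {
                counts.merge("bow=" + tokens[i], 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Returns the categories of the samples that keep at least one feature with count >= cutoff. */
    static List<String> survivingEvents(List<String> lines, int cutoff) {
        Map<String, Integer> counts = countFeatures(lines);
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            String[] tokens = line.trim().split("\\s+");
            boolean hasFeature = false;
            for (int i = 1; i < tokens.length; i++) {
                if (counts.get("bow=" + tokens[i]) >= cutoff) {
                    hasFeature = true;
                    break;
                }
            }
            if (hasFeature) {
                kept.add(tokens[0]);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // The two lines of en-doccat.train as quoted in this report.
        List<String> lines = Arrays.asList(
            "GMDecrease Major acquisitions that have a lower gross margin than the existing network "
                + "also had a negative impact on the overall gross margin, but it should improve "
                + "following the implementation of its integration strategies .",
            "GMIncrease The upward movement of gross margin resulted from amounts pursuant to "
                + "adjustments to obligations towards dealers .");

        System.out.println(survivingEvents(lines, 5)); // no feature occurs 5 times
        System.out.println(survivingEvents(lines, 1)); // every feature is kept
    }
}
```

With the two samples from this report, the most frequent features ("gross" and "the") occur only three times, so under the stated assumption a cutoff of 5 leaves no events, while a cutoff of 1 keeps both. This would be consistent with the two "Dropped event" lines printed before the exception.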

      Attachments

        1. en-doccat.train
          0.3 kB
          Erik Andersson
        2. OPENNLP-488.patch
          2 kB
          Jeff Zemerick

            People

              Assignee: Tommaso Teofili (teofili)
              Reporter: Erik Andersson (ejjick)
              Votes: 0
              Watchers: 5

                Time Tracking

                  Original Estimate: 0.5h
                  Remaining Estimate: 0.5h
                  Time Spent: Not Specified