OpenNLP / OPENNLP-488

Doccat training tool throws NullPointer error

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Doccat
    • Labels:
      None
    • Environment:
      Using cygwin on Windows
      java version "1.6.0_27"
      Java(TM) SE Runtime Environment (build 1.6.0_27-b07)
      Java HotSpot(TM) Client VM (build 20.2-b06, mixed mode)
      apache-opennlp-1.5.2

      Description

      When following the Doccat training example in the OpenNLP 1.5.2 documentation, I get a NullPointerException.

      http://opennlp.apache.org/documentation/1.5.2-incubating/manual/opennlp.html#tools.doccat.training.tool

      $ bin/opennlp DoccatTrainer -encoding UTF-8 -lang en -data en-doccat.train -model en-doccat.bin
      Indexing events using cutoff of 5

      Computing event counts... done. 2 events
      Indexing... Dropped event GMDecrease:[bow=Major, bow=acquisitions, bow=that, bow=have, bow=a, bow=lower, bow=gross, bow=margin, bow=than, bow=the, bow=existing, bow=network, bow=also, bow=had, bow=a, bow=negative, bow=impact, bow=on, bow=the, bow=overall, bow=gross, bow=margin,, bow=but, bow=it, bow=should, bow=improve, bow=following, bow=the, bow=implementation, bow=of, bow=its, bow=integration, bow=strategies, bow=.]
      Dropped event GMIncrease:[bow=The, bow=upward, bow=movement, bow=of, bow=gross, bow=margin, bow=resulted, bow=from, bow=amounts, bow=pursuant, bow=to, bow=adjustments, bow=to, bow=obligations, bow=towards, bow=dealers, bow=.]
      done.
      Sorting and merging events... Done indexing.
      Incorporating indexed data for training...
      Exception in thread "main" java.lang.NullPointerException
      at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
      at opennlp.maxent.GIS.trainModel(GIS.java:256)
      at opennlp.model.TrainUtil.train(TrainUtil.java:182)
      at opennlp.tools.doccat.DocumentCategorizerME.train(DocumentCategorizerME.java:154)
      at opennlp.tools.doccat.DocumentCategorizerME.train(DocumentCategorizerME.java:176)
      at opennlp.tools.doccat.DocumentCategorizerME.train(DocumentCategorizerME.java:192)
      at opennlp.tools.cmdline.doccat.DoccatTrainerTool.run(DoccatTrainerTool.java:91)
      at opennlp.tools.cmdline.CLI.main(CLI.java:191)

      The file "en-doccat.train" is UTF-8 encoded, uses UNIX line endings, and looks like this:

      GMDecrease Major acquisitions that have a lower gross margin than the existing network also had a negative impact on the overall gross margin, but it should improve following the implementation of its integration strategies .
      GMIncrease The upward movement of gross margin resulted from amounts pursuant to adjustments to obligations towards dealers .
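
A possible reading of the log, offered as an assumption and not as a confirmed diagnosis: the indexer reports a cutoff of 5, and both events are listed as "Dropped event", so no training data may be left by the time GISTrainer runs. The sketch below is a simplified stand-in for cutoff-based indexing, not OpenNLP's code. It assumes that a feature is kept only if it occurs at least `cutoff` times across all events and that an event with no surviving features is dropped. The names `parse` and `index_events` are hypothetical.

```python
from collections import Counter

# The two training lines quoted in this report:
# a category label followed by whitespace-separated tokens.
TRAINING_LINES = [
    "GMDecrease Major acquisitions that have a lower gross margin than the "
    "existing network also had a negative impact on the overall gross margin, "
    "but it should improve following the implementation of its integration "
    "strategies .",
    "GMIncrease The upward movement of gross margin resulted from amounts "
    "pursuant to adjustments to obligations towards dealers .",
]


def parse(line):
    """Split one training line into (category, bag-of-words features)."""
    category, *tokens = line.split()
    return category, ["bow=" + token for token in tokens]


def index_events(lines, cutoff):
    """Assumed cutoff rule: drop features seen fewer than `cutoff` times,
    then drop every event that has no features left."""
    events = [parse(line) for line in lines]
    counts = Counter(f for _, features in events for f in features)
    kept, dropped = [], []
    for category, features in events:
        surviving = [f for f in features if counts[f] >= cutoff]
        (kept if surviving else dropped).append(category)
    return counts, kept, dropped


counts, kept, dropped = index_events(TRAINING_LINES, cutoff=5)
print("highest feature count:", max(counts.values()))
print("kept:", kept)
print("dropped:", dropped)
```

Under these assumptions no feature in the two sample lines reaches a count of 5, so both events are dropped, which matches the two "Dropped event" lines in the output above. Whether the empty event set is what triggers the NullPointerException at GISTrainer.java:263 is not confirmed by this report.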

      1. en-doccat.train
        0.3 kB
        Erik Andersson

          Activity

          Joern Kottmann made changes:
          Link: This issue duplicates OPENNLP-122
          Erik Andersson made changes:
          Field: Attachment; Original Value: (none); New Value: en-doccat.train [ 12520801 ]
          Erik Andersson created issue.

            People

            • Assignee:
              Unassigned
              Reporter:
              Erik Andersson
            • Votes:
              0
              Watchers:
              1

                Time Tracking

                Estimated: 0.5h
                Remaining: 0.5h
                Logged: Not Specified
