Project: OpenNLP, issue OPENNLP-1166

TwoPassDataIndexer fails if features contain \n


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.8.3
    • Fix Version/s: 1.8.4
    • Component/s: Machine Learning
    • Labels: None

    Description

      Training a model on data that contains newline tokens causes TwoPassDataIndexer to throw an exception:

      Exception in thread "main" java.util.NoSuchElementException
      at java.util.StringTokenizer.nextToken(StringTokenizer.java:349)
      at opennlp.tools.ml.model.FileEventStream.read(FileEventStream.java:71)
      at opennlp.tools.ml.model.FileEventStream.read(FileEventStream.java:35)
      at opennlp.tools.ml.model.AbstractDataIndexer.index(AbstractDataIndexer.java:168)
      at opennlp.tools.ml.model.TwoPassDataIndexer.index(TwoPassDataIndexer.java:72)
      at opennlp.tools.ml.AbstractEventTrainer.getDataIndexer(AbstractEventTrainer.java:68)
      at opennlp.tools.ml.AbstractEventTrainer.train(AbstractEventTrainer.java:90)
      at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:244)
      at opennlp.tools.cmdline.namefind.TokenNameFinderTrainerTool.run(TokenNameFinderTrainerTool.java:169)
      at opennlp.tools.cmdline.CLI.main(CLI.java:256)
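The trace suggests that FileEventStream.read parses each cached event as one line of whitespace-separated tokens using a StringTokenizer. If so, a feature that contains \n splits a single event across several lines, and a whitespace-only line has no outcome token for nextToken() to return. The sketch that follows reproduces that failure mode outside OpenNLP. It assumes a simple one-event-per-line text format ("outcome feature1 feature2 ..."). The class and method names (NewlineFeatureDemo, toLine, parseLine, textRoundTripFails, toBinary, fromBinary) are hypothetical and are not OpenNLP API. The sketch also shows one newline-safe alternative, a length-prefixed binary encoding via DataOutputStream.writeUTF. That encoding is an illustration only, not necessarily the change made for 1.8.4.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.NoSuchElementException;
import java.util.StringTokenizer;

public class NewlineFeatureDemo {

    // Hypothetical text format (an assumption): "outcome feature1 feature2 ...\n", one event per line.
    static String toLine(String outcome, String[] context) {
        StringBuilder sb = new StringBuilder(outcome);
        for (String feature : context) {
            sb.append(' ').append(feature);
        }
        return sb.append('\n').toString();
    }

    // Parses one line with StringTokenizer: first token is the outcome, the rest are features.
    // On a blank or whitespace-only line, nextToken() throws NoSuchElementException.
    static String[] parseLine(String line) {
        StringTokenizer st = new StringTokenizer(line);
        String outcome = st.nextToken();
        String[] event = new String[st.countTokens() + 1];
        event[0] = outcome;
        for (int i = 1; i < event.length; i++) {
            event[i] = st.nextToken();
        }
        return event;
    }

    // Writes one event as text, then reads it back line by line as a file-based reader would.
    // Returns true if any line fails to parse.
    static boolean textRoundTripFails(String outcome, String[] context) {
        try (BufferedReader reader = new BufferedReader(new StringReader(toLine(outcome, context)))) {
            String line;
            while ((line = reader.readLine()) != null) {
                parseLine(line);
            }
            return false;
        } catch (NoSuchElementException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Newline-safe alternative: every string is length-prefixed, so '\n' is ordinary data.
    static byte[] toBinary(String outcome, String[] context) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(outcome);
            out.writeInt(context.length);
            for (String feature : context) {
                out.writeUTF(feature);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads an event written by toBinary; returns {outcome, feature1, feature2, ...}.
    static String[] fromBinary(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String outcome = in.readUTF();
            int count = in.readInt();
            String[] event = new String[count + 1];
            event[0] = outcome;
            for (int i = 1; i <= count; i++) {
                event[i] = in.readUTF();
            }
            return event;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] context = {"w=\n", "\n", "pd=x"};
        // Prints "text format fails: true": the second line of the text is a single space.
        System.out.println("text format fails: " + textRoundTripFails("O", context));
        // Prints "binary round trip intact: true": the newlines survive unchanged.
        String[] decoded = fromBinary(toBinary("O", context));
        System.out.println("binary round trip intact: "
            + Arrays.equals(decoded, new String[] {"O", "w=\n", "\n", "pd=x"}));
    }
}
```

In this sketch format, a single embedded newline does not always throw. With the context {"w=\n", "pd=x"}, the second line parses as a separate event whose outcome is "pd=x". A line-oriented text format can therefore also corrupt training data without raising an error.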


    People

      Assignee: Peter Thygesen (thygesen)
      Reporter: Peter Thygesen (thygesen)
      Votes: 0
      Watchers: 2
