OpenNLP / OPENNLP-1132

Fail with exception if not enough lines in leipzig parser

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.8.2
    • Fix Version/s: 1.8.4
    • Component/s: Language Detector
    • Labels: None

    Description

      Exception in thread "main" java.lang.IndexOutOfBoundsException: toIndex = 100000
      at java.util.ArrayList.subListRangeCheck(ArrayList.java:1004)
      at java.util.ArrayList.subList(ArrayList.java:996)
      at opennlp.tools.formats.leipzig.LeipzigLanguageSampleStream$LeipzigSentencesStream.<init>(LeipzigLanguageSampleStream.java:65)
      at opennlp.tools.formats.leipzig.LeipzigLanguageSampleStream.read(LeipzigLanguageSampleStream.java:157)
      at opennlp.tools.formats.leipzig.LeipzigLanguageSampleStream.read(LeipzigLanguageSampleStream.java:42)
      at opennlp.tools.formats.leipzig.SampleShuffleStream.<init>(SampleShuffleStream.java:38)
      at opennlp.tools.formats.leipzig.LeipzigLanguageSampleStreamFactory.create(LeipzigLanguageSampleStreamFactory.java:76)
      at opennlp.tools.cmdline.AbstractConverterTool.run(AbstractConverterTool.java:106)
      at opennlp.tools.cmdline.CLI.main(CLI.java:256)

      line 65:
      Set<Integer> selectedLines = new HashSet<>(
      indexes.subList(0, sentencesPerSample * numberOfSamples));

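      This fails when sentencesPerSample * numberOfSamples is larger than the size of indexes, i.e. the number of lines in the source file, because subList is then asked for a range that extends past the end of the list.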
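
      The check the title asks for can be sketched as follows. This is a minimal illustration only, not the patch that was committed to OpenNLP: the class LineSelectionSketch, the method selectLines, and the choice of IllegalArgumentException are all assumptions made for this example. It validates the requested count before calling subList, so the caller gets a descriptive message instead of a bare IndexOutOfBoundsException.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not OpenNLP code.
class LineSelectionSketch {

  // Returns the first sentencesPerSample * numberOfSamples entries of indexes,
  // or fails with a descriptive exception if the source has too few lines.
  static Set<Integer> selectLines(List<Integer> indexes,
                                  int sentencesPerSample, int numberOfSamples) {
    // Use long arithmetic so a large product cannot overflow int.
    long required = (long) sentencesPerSample * numberOfSamples;
    if (required > indexes.size()) {
      // Assumed exception type; the real fix may use a different one.
      throw new IllegalArgumentException("Not enough lines: need " + required
          + " but the source has only " + indexes.size());
    }
    return new HashSet<>(indexes.subList(0, (int) required));
  }

  public static void main(String[] args) {
    List<Integer> indexes = new ArrayList<>();
    for (int i = 0; i < 10; i++) {
      indexes.add(i);
    }

    // 2 * 3 = 6 lines requested out of 10: succeeds.
    System.out.println(selectLines(indexes, 2, 3).size());

    // 5 * 3 = 15 lines requested out of 10: rejected with a clear message.
    try {
      selectLines(indexes, 5, 3);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

      An alternative would be to clamp the upper bound with Math.min(required, indexes.size()) and silently return fewer lines, but that would hide the shortfall from the user, which is the opposite of what this issue requests.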

            People

              Assignee: Peter Thygesen
              Reporter: Peter Thygesen