OpenNLP / OPENNLP-1330

Parser "top k parses" option doesn't show the top (highest-probability) parses.


Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.9.3
    • Fix Version/s: 1.9.4
    • Component/s: Parser
    • Patch

    Description

       With the default "top n parses" value of 1, the parser returns the most likely parse, which has a log probability of about -2.9.

      Passing any value greater than 1 returns the least probable parses instead.

      ➜ bin git:(master) echo "Eric is testing." | ./opennlp Parser -k 1 ~/src/prhyme/resources/models/en-parser-chunking.bin
      Loading Parser model ... done (2.515s)
      0 -2.913239374955309 (TOP (S (NP (NNP Eric)) (VP (VBZ is)) (. testing.)))
      
      Average: 41.7 sent/s
      Total: 1 sent
      Runtime: 0.024s
      Execution time: 2.585 seconds
      ➜ bin git:(master) echo "Eric is testing." | ./opennlp Parser -k 2 ~/src/prhyme/resources/models/en-parser-chunking.bin
      Loading Parser model ... done (2.578s)
      0 -14.968552218782305 (TOP (S (NP (NNP Eric)) (SBAR (S (ADVP (VBZ is)) (VBG testing.)))))
      1 -13.957766393572408 (TOP (S (SBAR (S (NP (NNP Eric)) (ADVP (VBZ is)))) (VBG testing.)))
      
      Average: 95.2 sent/s
      Total: 2 sent
      Runtime: 0.021s
      Execution time: 2.640 seconds
      
      

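       The fix is simple: take parses from the front of the TreeSet rather than from the end. The set is ordered best-first: the numParses == 1 branch returns completeParses.first(), and -k 1 prints the most probable parse. The loop for numParses > 1, however, calls completeParses.last() and so collects the least probable parses.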

      
      else if (numParses == 1) {
        return new Parse[] {completeParses.first()};
      }
      else {
        List<Parse> topParses = new ArrayList<>(numParses);
        while (!completeParses.isEmpty() && topParses.size() < numParses) {
          // completeParses is ordered best-first, so take from the front.
          // The original code called completeParses.last() here, which
          // returned the least probable parses.
          Parse tp = completeParses.first();
          completeParses.remove(tp);
          topParses.add(tp);
        }
        return topParses.toArray(new Parse[topParses.size()]);
      }
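      A minimal, self-contained sketch of the ordering argument, not OpenNLP's code. ScoredParse is a hypothetical stand-in for opennlp.tools.parser.Parse, and its best-first compareTo is an assumption chosen to match the numParses == 1 branch returning completeParses.first(). The four log probabilities are the rounded values from the runs in this report, treated for illustration as if they were the only complete parses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

public class TopKParses {

  // Hypothetical stand-in for opennlp.tools.parser.Parse; only the log probability matters here.
  public record ScoredParse(String label, double logProb) implements Comparable<ScoredParse> {
    // Assumed best-first ordering: a higher log probability sorts earlier,
    // so first() is the most probable parse and last() the least probable.
    @Override
    public int compareTo(ScoredParse other) {
      return Double.compare(other.logProb(), this.logProb());
    }
  }

  // Removes and returns up to k parses from the set.
  // fromFront = true mirrors the patched loop (first()); false mirrors the original loop (last()).
  public static List<ScoredParse> take(SortedSet<ScoredParse> parses, int k, boolean fromFront) {
    List<ScoredParse> taken = new ArrayList<>(k);
    while (!parses.isEmpty() && taken.size() < k) {
      ScoredParse next = fromFront ? parses.first() : parses.last();
      parses.remove(next);
      taken.add(next);
    }
    return taken;
  }

  // Rounded log probabilities from this report, treated as the only complete parses.
  public static SortedSet<ScoredParse> sample() {
    SortedSet<ScoredParse> parses = new TreeSet<>();
    parses.add(new ScoredParse("A", -2.91));
    parses.add(new ScoredParse("B", -3.11));
    parses.add(new ScoredParse("C", -13.96));
    parses.add(new ScoredParse("D", -14.97));
    return parses;
  }

  public static void main(String[] args) {
    System.out.println("patched (first): " + take(sample(), 2, true));
    System.out.println("original (last): " + take(sample(), 2, false));
  }
}
```

      With these values, take(sample(), 2, true) returns the -2.91 and -3.11 entries, while take(sample(), 2, false) returns -14.97 followed by -13.96, the same order the unpatched -k 2 run printed. Note that in this sketch, as in any TreeSet, two entries that compare as equal are treated as duplicates, so parses with identical scores would collapse into one.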

      After the patch, the results are as expected: -k 2 returns the two most probable parses, led by the same -2.91 parse that -k 1 returns.

       

      ➜ bin git:(master) ✗ echo "Eric is testing." | ./opennlp Parser -k 1 ~/src/prhyme/resources/models/en-parser-chunking.bin
      Loading Parser model ... done (2.517s)
      0 -2.913239374955309 (TOP (S (NP (NNP Eric)) (VP (VBZ is)) (. testing.)))
      
      Average: 45.5 sent/s
      Total: 1 sent
      Runtime: 0.022s
      Execution time: 2.580 seconds
      ➜ bin git:(master) ✗ echo "Eric is testing." | ./opennlp Parser -k 2 ~/src/prhyme/resources/models/en-parser-chunking.bin
      Loading Parser model ... done (2.530s)
      0 -2.913239374955309 (TOP (S (NP (NNP Eric)) (VP (VBZ is)) (. testing.)))
      1 -3.1132674983145825 (TOP (S (NP (NNP Eric)) (VP (VBZ is) (NP (NN testing.)))))
      
      Average: 90.9 sent/s
      Total: 2 sent
      Runtime: 0.022s
      Execution time: 2.596 seconds
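      The expected behavior can also be written down as a check on the printed log probabilities: the -k 2 list should begin with the same parse that -k 1 returns and should never increase in log probability. The sketch below is a hypothetical regression test (TopKCheck and its methods are illustrative names, not part of OpenNLP) applied to the numbers from the runs in this report.

```java
public class TopKCheck {

  // True when the log probabilities never increase, i.e. the best parse comes first.
  public static boolean isBestFirst(double[] logProbs) {
    for (int i = 1; i < logProbs.length; i++) {
      if (logProbs[i] > logProbs[i - 1]) {
        return false;
      }
    }
    return true;
  }

  // Hypothetical regression check: a top-k result should start with the -k 1 parse
  // and be ordered best-first. Exact equality is used because the same parse
  // should carry the same score; a tolerance may be safer in a real test.
  public static boolean consistentWithTop1(double[] topK, double top1LogProb) {
    return topK.length > 0 && topK[0] == top1LogProb && isBestFirst(topK);
  }

  public static void main(String[] args) {
    double top1 = -2.913239374955309;
    double[] unpatched = {-14.968552218782305, -13.957766393572408};
    double[] patched = {-2.913239374955309, -3.1132674983145825};
    System.out.println("unpatched -k 2 passes: " + consistentWithTop1(unpatched, top1));
    System.out.println("patched -k 2 passes: " + consistentWithTop1(patched, top1));
  }
}
```

      Applied to those numbers, the unpatched -k 2 output fails the check (its first entry is -14.97 and its scores increase) and the patched output passes.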
      

       


People

    • Assignee: Unassigned
    • Reporter: Eric Ihli (eihli)
    • Votes: 0
    • Watchers: 3
