Description
With the default "top n parses" value of 1 (-k 1), the parser returns the most likely parse, here one with a log prob of about -2.9.
Passing in anything greater than 1 returns the least probable parses instead of the most probable ones.
➜ bin git:(master) echo "Eric is testing." | ./opennlp Parser -k 1 ~/src/prhyme/resources/models/en-parser-chunking.bin
Loading Parser model ... done (2.515s)
0 -2.913239374955309 (TOP (S (NP (NNP Eric)) (VP (VBZ is)) (. testing.)))
Average: 41.7 sent/s
Total: 1 sent
Runtime: 0.024s
Execution time: 2.585 seconds

➜ bin git:(master) echo "Eric is testing." | ./opennlp Parser -k 2 ~/src/prhyme/resources/models/en-parser-chunking.bin
Loading Parser model ... done (2.578s)
0 -14.968552218782305 (TOP (S (NP (NNP Eric)) (SBAR (S (ADVP (VBZ is)) (VBG testing.)))))
1 -13.957766393572408 (TOP (S (SBAR (S (NP (NNP Eric)) (ADVP (VBZ is)))) (VBG testing.)))
Average: 95.2 sent/s
Total: 2 sent
Runtime: 0.021s
Execution time: 2.640 seconds
The fix is simple: take parses from the front of the TreeSet (first()) rather than from the end (last()). The single-parse branch already uses first(), which is why -k 1 is unaffected.
else if (numParses == 1) {
  return new Parse[] {completeParses.first()};
}
else {
  List<Parse> topParses = new ArrayList<>(numParses);
  while (!completeParses.isEmpty() && topParses.size() < numParses) {
    Parse tp = completeParses.last(); //// <--- Change to .first()
    completeParses.remove(tp);
    topParses.add(tp);
    //parses.remove(tp);
  }
  return topParses.toArray(new Parse[topParses.size()]);
}
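To illustrate why the end of the set matters, here is a minimal, standalone sketch. It does not use OpenNLP classes: ScoredParse, MOST_PROBABLE_FIRST and topK are hypothetical names made up for this example, and the ordering (most probable parse sorted first) is an assumption inferred from the behavior reported above. It uses pollFirst(), which removes and returns the first element in one call, in place of the first()/remove() pair.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.Comparator;

public class TopParsesSketch {

    // Hypothetical stand-in for a parse: a label plus its log probability.
    static final class ScoredParse {
        final String label;
        final double logProb;

        ScoredParse(String label, double logProb) {
            this.label = label;
            this.logProb = logProb;
        }
    }

    // Assumed ordering: highest log probability first.
    // The label breaks ties so equally scored parses are not dropped by the set.
    static final Comparator<ScoredParse> MOST_PROBABLE_FIRST = (a, b) -> {
        int byScore = Double.compare(b.logProb, a.logProb);
        return byScore != 0 ? byScore : a.label.compareTo(b.label);
    };

    // Takes up to k parses from the FRONT of the set, i.e. the most probable ones.
    // Note: this drains the taken elements from the set, like the loop above.
    static List<ScoredParse> topK(TreeSet<ScoredParse> completeParses, int k) {
        List<ScoredParse> top = new ArrayList<>(k);
        while (!completeParses.isEmpty() && top.size() < k) {
            top.add(completeParses.pollFirst());
        }
        return top;
    }

    public static void main(String[] args) {
        TreeSet<ScoredParse> parses = new TreeSet<>(MOST_PROBABLE_FIRST);
        parses.add(new ScoredParse("C", -13.9));
        parses.add(new ScoredParse("A", -2.9));
        parses.add(new ScoredParse("D", -14.9));
        parses.add(new ScoredParse("B", -3.1));

        for (ScoredParse p : topK(parses, 2)) {
            System.out.println(p.label + " " + p.logProb);
        }
    }
}
```

With this ordering, taking from the front yields the two best-scoring entries (A, then B). Swapping pollFirst() for pollLast() would yield D, then C, which mirrors the symptom in the -k 2 output above.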
After the patch, the results are what I expect: -k 2 returns the two most probable parses.
➜ bin git:(master) ✗ echo "Eric is testing." | ./opennlp Parser -k 1 ~/src/prhyme/resources/models/en-parser-chunking.bin
Loading Parser model ... done (2.517s)
0 -2.913239374955309 (TOP (S (NP (NNP Eric)) (VP (VBZ is)) (. testing.)))
Average: 45.5 sent/s
Total: 1 sent
Runtime: 0.022s
Execution time: 2.580 seconds

➜ bin git:(master) ✗ echo "Eric is testing." | ./opennlp Parser -k 2 ~/src/prhyme/resources/models/en-parser-chunking.bin
Loading Parser model ... done (2.530s)
0 -2.913239374955309 (TOP (S (NP (NNP Eric)) (VP (VBZ is)) (. testing.)))
1 -3.1132674983145825 (TOP (S (NP (NNP Eric)) (VP (VBZ is) (NP (NN testing.)))))
Average: 90.9 sent/s
Total: 2 sent
Runtime: 0.022s
Execution time: 2.596 seconds