[OPENNLP-1330] Parser top k parses doesn't show "top" (highest probability) parses. - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: 1.9.3
Fix Version/s: 1.9.4
Component/s: Parser
Labels: Parser
Flags: Patch

Description

With the default "top n parses" value of 1 (-k 1), the parser shows the most likely parse, which has a log probability of about -2.9.

Passing any value greater than 1 returns the least probable parses instead of the most probable ones.

➜ bin git:(master) echo "Eric is testing." | ./opennlp Parser -k 1 ~/src/prhyme/resources/models/en-parser-chunking.bin
Loading Parser model ... done (2.515s)
0 -2.913239374955309 (TOP (S (NP (NNP Eric)) (VP (VBZ is)) (. testing.)))

Average: 41.7 sent/s
Total: 1 sent
Runtime: 0.024s
Execution time: 2.585 seconds
➜ bin git:(master) echo "Eric is testing." | ./opennlp Parser -k 2 ~/src/prhyme/resources/models/en-parser-chunking.bin
Loading Parser model ... done (2.578s)
0 -14.968552218782305 (TOP (S (NP (NNP Eric)) (SBAR (S (ADVP (VBZ is)) (VBG testing.)))))
1 -13.957766393572408 (TOP (S (SBAR (S (NP (NNP Eric)) (ADVP (VBZ is)))) (VBG testing.)))

Average: 95.2 sent/s
Total: 2 sent
Runtime: 0.021s
Execution time: 2.640 seconds

The fix is simple. The single-parse branch already returns completeParses.first(), so the multi-parse loop should also take from the first element of the TreeSet rather than from the last.


    else if (numParses == 1) {
      return new Parse[] {completeParses.first()};
    }
    else {
      List<Parse> topParses = new ArrayList<>(numParses);
      while (!completeParses.isEmpty() && topParses.size() < numParses) {
        Parse tp = completeParses.last();      // <--- Change to .first()
        completeParses.remove(tp);
        topParses.add(tp);
        //parses.remove(tp);
      }
      return topParses.toArray(new Parse[topParses.size()]);
    }
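
The difference between last() and first() can be reproduced outside the parser. The following is only a minimal sketch, not OpenNLP code: the class TopKSketch and its methods take and candidates are hypothetical, the Parse objects are replaced by their log probabilities (the four values printed in this report), and it assumes the set is ordered so that the most probable entry sorts first, which is consistent with the single-parse branch returning first().

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for the parser's candidate handling; not OpenNLP code.
class TopKSketch {

    // Removes up to k entries from the sorted set, either from the front
    // (the proposed fix) or from the back (the behaviour reported as a bug).
    static List<Double> take(TreeSet<Double> completeParses, int k, boolean fromFirst) {
        List<Double> top = new ArrayList<>(k);
        while (!completeParses.isEmpty() && top.size() < k) {
            Double tp = fromFirst ? completeParses.pollFirst() : completeParses.pollLast();
            top.add(tp);
        }
        return top;
    }

    // Log probabilities taken from the command output in this report.
    // Assumption: higher log probability sorts first, hence the reversed order.
    static TreeSet<Double> candidates() {
        TreeSet<Double> set = new TreeSet<>(Comparator.<Double>reverseOrder());
        set.add(-13.957766393572408);
        set.add(-2.913239374955309);
        set.add(-14.968552218782305);
        set.add(-3.1132674983145825);
        return set;
    }

    public static void main(String[] args) {
        // Taking from the end yields the two least probable candidates.
        System.out.println("last():  " + take(candidates(), 2, false));
        // Taking from the front yields the two most probable candidates.
        System.out.println("first(): " + take(candidates(), 2, true));
    }
}
```

Under that ordering assumption, taking from the end returns the -14.97 and -13.96 candidates in the same order as the -k 2 output above, while taking from the front returns the -2.91 and -3.11 candidates.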

After applying the patch, the results are what I expect.

➜ bin git:(master) ✗ echo "Eric is testing." | ./opennlp Parser -k 1 ~/src/prhyme/resources/models/en-parser-chunking.bin
Loading Parser model ... done (2.517s)
0 -2.913239374955309 (TOP (S (NP (NNP Eric)) (VP (VBZ is)) (. testing.)))

Average: 45.5 sent/s
Total: 1 sent
Runtime: 0.022s
Execution time: 2.580 seconds
➜ bin git:(master) ✗ echo "Eric is testing." | ./opennlp Parser -k 2 ~/src/prhyme/resources/models/en-parser-chunking.bin
Loading Parser model ... done (2.530s)
0 -2.913239374955309 (TOP (S (NP (NNP Eric)) (VP (VBZ is)) (. testing.)))
1 -3.1132674983145825 (TOP (S (NP (NNP Eric)) (VP (VBZ is) (NP (NN testing.)))))

Average: 90.9 sent/s
Total: 2 sent
Runtime: 0.022s
Execution time: 2.596 seconds

Attachments

FixParserTopK.patch, 1 kB, 10/Jun/21 05:09, Eric Ihli

People

Assignee: Unassigned
Reporter: Eric Ihli
Votes: 0
Watchers: 3

Dates

Created: 10/Jun/21 05:10
Updated: 23/Jun/21 12:37
Resolved: 23/Jun/21 12:36