OpenNLP / OPENNLP-629

Third person singular verbs are wrongly tagged as NNS instead of VBG

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: tools-1.5.3
    • Fix Version/s: None
    • Component/s: Parser
    • Labels: None
    • Environment: Windows, Java 8

      Description

      Hi team

      In many cases, third-person singular verbs are wrongly tagged as NNS instead of VBG.

      For example, for "the dog barks", we get the following parsing results:

      -0.4873670715270621 = DT/0.9543650543068872 NN/0.9934635416261295 NNS/0.6478473815054814
      -2.3176263333647076 = DT/0.9543650543068872 NN/0.9934635416261295 ./0.10389656993335769
      -2.5438814602384756 = DT/0.9543650543068872 NN/0.9934635416261295 POS/0.08285903227408052
      -3.1472424852917578 = DT/0.9543650543068872 NN/0.9934635416261295 VBG/0.045321418371414506
      -3.3093737662787484 = DT/0.9543650543068872 NN/0.9934635416261295 RB/0.03853814197383135
      -3.785492750117388 = DT/0.9543650543068872 NN/0.9934635416261295 IN/0.023939491699927738
      -4.419574088556415 = DT/0.9543650543068872 NN/0.9934635416261295 NN/0.0126980460743554
      -4.641227787202645 = DT/0.9543650543068872 NN/0.9934635416261295 WDT/0.010173582713485872
      -4.645517470925252 = DT/0.9543650543068872 NN/0.9934635416261295 :/0.010130034731632277
      -5.319832699567059 = DT/0.9543650543068872 NN/0.9934635416261295 ''/0.005161305328825757
      (TOP (NP (DT the) (NN dog) (NNS barks)))
      2.6064504449697834
      (TOP (S (NP (DT the) (NN dog)) (NNS barks)))
      1.9485980564427359

      For the third token, NNS has by far the highest probability, about 0.65.
      In comparison, VBG gets a probability of only about 0.045.
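
      For reference, the score at the start of each line in the listing above appears to be the sum of the natural logs of the three per-tag probabilities on that line. The sketch below recomputes it for the NNS and VBG sequences. The class and method names are hypothetical helpers written for this report, not part of the OpenNLP API, and the log-sum interpretation is an assumption inferred from the numbers rather than from the OpenNLP source.

```java
import java.util.Locale;

// Hypothetical helper, not part of OpenNLP: recomputes the sequence scores
// shown in the listing from the per-tag probabilities on each line.
final class SequenceScore {

    // Assumed scoring rule: sum of natural logs of the per-tag probabilities.
    static double logScore(double... tagProbs) {
        double sum = 0.0;
        for (double p : tagProbs) {
            sum += Math.log(p);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Probabilities copied from the listing for "the dog barks".
        double dt = 0.9543650543068872;
        double nn = 0.9934635416261295;
        double nns = 0.6478473815054814;
        double vbg = 0.045321418371414506;

        System.out.printf(Locale.ROOT, "DT NN NNS: %.6f%n", logScore(dt, nn, nns));
        System.out.printf(Locale.ROOT, "DT NN VBG: %.6f%n", logScore(dt, nn, vbg));
    }
}
```

      If the assumption holds, the two printed values should agree with the first and fourth scores in the listing up to floating-point rounding, so the gap between the NNS and VBG sequences comes entirely from the third tag.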

      This parsing error occurs consistently for most third-person singular verbs, regardless of the context.

      Am I missing something?
      Maybe there is some supplementary configuration that controls this?

      Can this be fixed only through code, or do we need to patch our training data set?

      Thank you so much.

      BR,
      Ioan

            People

            • Assignee: Unassigned
            • Reporter: Ioan Barbulescu (ibarbulescu)