  1. Stanbol (Retired)
  2. STANBOL-685

Improve POS tag handling of the KeywordLinkingEngine

Details

    Description

      The KeywordLinkingEngine can make use of POS tags to decide whether a Token (word) needs to be processed or can be skipped. If no POS tags are available, or the POS tag probability is too low (currently the default is 0.8), then the minimum token length (default is 3) is used as a fall-back.
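
      The current behaviour described above can be sketched roughly as follows. This is only an illustration, not the actual Stanbol code: the class name, method name and parameters are hypothetical, and it assumes that a token counts as processable when its tag is a noun and that both thresholds are inclusive.

```java
// Hypothetical sketch of the current decision logic; not the Stanbol implementation.
final class FallbackTokenFilter {
    static final double MIN_POS_TAG_PROBABILITY = 0.8; // default named in this issue
    static final int MIN_TOKEN_LENGTH = 3;             // default named in this issue

    /**
     * @param token       the word to check
     * @param isNoun      TRUE/FALSE if a POS tag is available, null if there is none
     * @param probability probability reported for the POS tag (ignored if isNoun is null)
     */
    static boolean isProcessable(String token, Boolean isNoun, double probability) {
        if (isNoun != null && probability >= MIN_POS_TAG_PROBABILITY) {
            // Trusted POS tag: process nouns, skip everything else (assumption).
            return isNoun;
        }
        // No tag, or tag not trusted: fall back to the minimum token length.
        return token.length() >= MIN_TOKEN_LENGTH;
    }
}
```

      In this sketch a non-noun tag with a probability of 0.5 is not trusted, so a token such as "running" is still processed because of its length.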

      Analysis of POS tagging results has shown that non-noun tags often fall below the 0.8 limit. For those tokens the fall-back was used, and in most cases this resulted in the KeywordLinkingEngine processing them.

      However, it can also be observed that while some of those POS tags were incorrect, the confusion was usually between two tags that are both non-noun tags. Because of that, decreasing the minimum probability for accepting a non-noun POS tag can improve both results and processing time.

      The algorithm will therefore be adjusted as follows:

      Introduce two tag probabilities:

      1. "minPosTypeProb" for accepting POS tags that represent nouns and
      2. "minPosTypeProb/2" for rejecting POS tags that are not nouns

      Assuming that <code>minPosTypeProb=0.667</code> (so that the rejection threshold is 0.3335), a

      • noun with a probability of 0.8 would return <code>true</code>
      • noun with a probability of 0.5 would return <code>null</code>
      • verb with a probability of 0.4 would return <code>false</code>
      • verb with a probability of 0.3 would return <code>null</code>

      NOTE: <code>null</code> indicates that no POS tag is available or that the POS tag has a low probability.
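
      The proposed two-threshold rule could look roughly like the sketch below. The class and method names are hypothetical and are not taken from the Stanbol code base; whether the comparisons are inclusive is not stated in this issue, so the use of ">=" is an assumption.

```java
// Hypothetical sketch of the proposed rule; not the Stanbol implementation.
final class PosTagDecision {

    /**
     * @return TRUE  if the tag is a noun and meets minPosTypeProb,
     *         FALSE if the tag is not a noun and meets minPosTypeProb / 2,
     *         null  if the tag is not trusted (the caller then applies its fall-back)
     */
    static Boolean classify(boolean isNoun, double probability, double minPosTypeProb) {
        if (isNoun) {
            // Accepting a noun requires the full threshold.
            return probability >= minPosTypeProb ? Boolean.TRUE : null;
        }
        // Rejecting a non-noun only requires half of the threshold.
        return probability >= minPosTypeProb / 2 ? Boolean.FALSE : null;
    }

    public static void main(String[] args) {
        double minPosTypeProb = 0.667;
        System.out.println(classify(true, 0.8, minPosTypeProb));  // true
        System.out.println(classify(true, 0.5, minPosTypeProb));  // null
        System.out.println(classify(false, 0.4, minPosTypeProb)); // false
        System.out.println(classify(false, 0.3, minPosTypeProb)); // null
    }
}
```

      Returning a nullable <code>Boolean</code> keeps the "no decision" case separate from an explicit rejection, so the minimum token length fall-back only applies when the result is <code>null</code>.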

      These changes will need to be applied to the "OpenNlpAnalysedContentFactory#processPOS(..)" and the "EntityLinker#isProcessableToken(..)" methods.

          People

            rwesten Rupert Westenthaler