  1. Stanbol (Retired)
  2. STANBOL-1252

Add support for MIN_FOUND_TOKENS to the Lucene FST Linking Engine

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • 0.12.0
    • 0.12.0
    • None
    • None

    Description

      The FST linking engine already allows configuring, as a percentage, how much of a processable chunk (typically a noun phrase) needs to match for a suggestion to be accepted. This is done with the "enhancer.engines.linking.minChunkMatchScore" property. The default is > 50%.

      While this kind of configuration works well for chunks created by NamedEntityAnnotations, it is not always well suited to detected noun phrases, as those may span larger sections of a sentence. E.g. "goalie Mathias Lange (Iserlohn Roosters)" will not match any Entity in a vocabulary: the chunk contains 5 matchable tokens, but the player "Mathias Lange" and the team name "Iserlohn Roosters" each cover only two of them (40%, below the > 50% default).
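
      The arithmetic behind this example can be sketched as follows. This is only an illustration, not the engine's code: the function name `chunk_match_score` is hypothetical, the token list is written by hand, and the score is assumed to be the number of matched tokens divided by the number of matchable tokens in the chunk.

```python
def chunk_match_score(matched_tokens: int, matchable_tokens: int) -> float:
    """Fraction of a chunk's matchable tokens covered by a suggestion.

    Assumed definition for illustration; the engine may compute it differently.
    """
    if matchable_tokens <= 0:
        raise ValueError("a chunk needs at least one matchable token")
    return matched_tokens / matchable_tokens


# Hand-written tokenization of the chunk from the example; the brackets
# are assumed not to count as matchable tokens.
chunk = ["goalie", "Mathias", "Lange", "Iserlohn", "Roosters"]
player = ["Mathias", "Lange"]

score = chunk_match_score(len(player), len(chunk))
print(score)        # 0.4
print(score > 0.5)  # False: the suggestion is rejected by the percentage rule
```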

      In such cases, configuring a fixed lower limit on the number of (matchable) tokens that need to match within a chunk can be preferable.

      For this configuration the FST linking engine will use the "Min Matched Tokens (enhancer.engines.linking.minFoundTokens)" property of the EntityLinker configuration. The default will be "2".

      The FST linking engine will accept matches that conform to either "enhancer.engines.linking.minChunkMatchScore" or "enhancer.engines.linking.minFoundTokens".

      NOTE: these settings only apply to tokens within a processable chunk (typically a noun phrase).
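
      The combined acceptance rule described above might look like the following sketch. It is not taken from the Stanbol sources: `LinkingConfig` and `accept` are hypothetical names, the defaults mirror the values given in this issue, and treating the token limit as "at least N" (>=) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkingConfig:
    # Hypothetical container; values mirror the defaults named in the issue.
    min_chunk_match_score: float = 0.5  # "> 50%", hence the strict comparison below
    min_found_tokens: int = 2           # assumed to mean "at least 2 matched tokens"


def accept(matched: int, matchable: int, cfg: LinkingConfig = LinkingConfig()) -> bool:
    """Accept a match within a chunk if EITHER limit is satisfied."""
    if matchable <= 0:
        return False
    score_ok = matched / matchable > cfg.min_chunk_match_score
    tokens_ok = matched >= cfg.min_found_tokens
    return score_ok or tokens_ok


# "Mathias Lange" inside the 5-token chunk: 40% fails the score rule,
# but 2 matched tokens satisfy the minFoundTokens rule.
print(accept(matched=2, matchable=5))  # True
# A single matched token in the same chunk satisfies neither rule.
print(accept(matched=1, matchable=5))  # False
```

      With only the percentage rule (for example `LinkingConfig(min_found_tokens=10**9)` in this sketch), the first call would return False, which is the situation the issue sets out to fix.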

          People

            rwesten Rupert Westenthaler