Stanbol (Retired) / STANBOL-1128

Implement a Lucene FST based Entity Linking Engine (based on OpenSextant / SolrTextTagger)

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Component/s: Enhancement Engines

    Description

      This issue will implement an in-memory EntityLinking EnhancementEngine based on Lucene's FST (Finite State Transducer) technology.

      This engine could make direct use of the classes contained in the OpenSextant / SolrTextTagger code [1], which uses a two-layered FST:

      (1) to represent words and
      (2) to map those words to phrases

      This makes it possible to hold large vocabularies in memory efficiently (> 300 MByte for geonames.org). See the presentation at [2] for more details.
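
      To illustrate the two-layer idea, here is a minimal sketch in plain Java. It is not the SolrTextTagger code and it does not use Lucene's FST classes: the class name TwoLayerTagger, its methods, and the urn:example: identifiers are hypothetical, and ordinary hash maps plus a trie stand in for the two FSTs. Layer 1 maps each word to a compact integer id, layer 2 maps sequences of word ids to phrases (entities), and tagging picks the longest matching phrase at each position.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical stand-in for a two-layered dictionary (hash maps and a trie, not a Lucene FST). */
class TwoLayerTagger {
    /** Layer 1: every distinct word gets a compact integer id. */
    private final Map<String, Integer> wordIds = new HashMap<>();

    /** Layer 2: trie over word-id sequences; a non-null entity marks the end of a phrase. */
    private static final class Node {
        final Map<Integer, Node> next = new HashMap<>();
        String entity;
    }

    private final Node root = new Node();

    /** Registers a phrase (label) for the given entity identifier. */
    void addPhrase(String phrase, String entity) {
        Node node = root;
        for (String word : tokenize(phrase)) {
            Integer id = wordIds.get(word);
            if (id == null) {
                id = wordIds.size();
                wordIds.put(word, id);
            }
            node = node.next.computeIfAbsent(id, k -> new Node());
        }
        node.entity = entity;
    }

    /** Returns the entities of the longest phrase matched at each position, left to right. */
    List<String> tag(String text) {
        String[] words = tokenize(text);
        List<String> matches = new ArrayList<>();
        int i = 0;
        while (i < words.length) {
            Node node = root;
            String best = null;
            int bestEnd = i;
            for (int j = i; j < words.length; j++) {
                Integer id = wordIds.get(words[j]);
                if (id == null) break;          // word is not in the vocabulary at all
                node = node.next.get(id);
                if (node == null) break;        // no phrase continues with this word
                if (node.entity != null) {      // remember the longest complete phrase so far
                    best = node.entity;
                    bestEnd = j + 1;
                }
            }
            if (best != null) {
                matches.add(best);
                i = bestEnd;                    // continue after the matched phrase
            } else {
                i++;
            }
        }
        return matches;
    }

    /** Lower-cases the input and splits it on anything that is not a letter or digit. */
    private static String[] tokenize(String s) {
        return s.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
    }

    public static void main(String[] args) {
        TwoLayerTagger tagger = new TwoLayerTagger();
        tagger.addPhrase("New York", "urn:example:new-york");
        tagger.addPhrase("New York City", "urn:example:nyc");
        tagger.addPhrase("York", "urn:example:york");
        // prints [urn:example:nyc, urn:example:york]
        System.out.println(tagger.tag("Flights from New York City to York."));
    }
}
```

      In this sketch each word is stored once and phrases refer to words only by id. An actual FST additionally shares common prefixes and suffixes in a compact byte representation, which this sketch does not attempt to reproduce.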

      While the license (ASL 2.0) is fully compatible, the library is currently not available on Maven Central. We need to contact the author regarding this.

      [1] https://github.com/OpenSextant/SolrTextTagger
      [2] http://www.lucenerevolution.org/2013/Text-Tagging-with-Finite-State-Transducers

            People

              Assignee: Rupert Westenthaler (rwesten)
              Reporter: Rupert Westenthaler (rwesten)