OpenNLP / OPENNLP-579

Framework to dynamically link N-best matches from external data to named entities by type (EntityLinker framework)

    Details

    • Type: Wish
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.6.0
    • Fix Version/s: 1.6.0
    • Component/s: Entity Linker
    • Labels:
    • Environment:
      Any

      Description

      A framework for integrating/linking external data to named entities. For instance, geocoding or georeferencing location entities against GeoNames gazetteers can be implemented as an EntityLinker. This ticket was initially created to solve the georeferencing/geolocating/geotagging problem specifically, but the framework should allow linkage of any external data to any entity type. Commercial applications that do this are expensive, and there are many free gazetteers one could use to build solutions with OpenNLP.
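The idea of linking "any external data to any entity type" can be pictured as a registry that dispatches each mention to a linker chosen by entity type and returns the N best matches. The following is only a sketch under stated assumptions: `LinkerRegistrySketch`, its nested `Linker` interface, and the toy in-memory lookup are hypothetical names invented for illustration and are not the OpenNLP EntityLinker API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LinkerRegistrySketch {

    /** Hypothetical contract: return candidate external records for a mention, best first. */
    public interface Linker {
        List<String> link(String mention, int maxResults);
    }

    // Maps an entity type such as "location" to the linker responsible for it.
    private final Map<String, Linker> linkersByType = new HashMap<>();

    public void register(String entityType, Linker linker) {
        linkersByType.put(entityType, linker);
    }

    /** Returns at most maxResults matches, or an empty list when no linker handles the type. */
    public List<String> link(String entityType, String mention, int maxResults) {
        Linker linker = linkersByType.get(entityType);
        if (linker == null) {
            return Collections.emptyList();
        }
        List<String> matches = linker.link(mention, maxResults);
        // Enforce the N-best cap even if a linker returns more than requested.
        return new ArrayList<>(matches.subList(0, Math.min(maxResults, matches.size())));
    }

    public static void main(String[] args) {
        LinkerRegistrySketch registry = new LinkerRegistrySketch();
        // Toy in-memory stand-in for a gazetteer; the entries are illustrative only.
        registry.register("location", (mention, n) -> {
            List<String> hits = new ArrayList<>();
            if (mention.equalsIgnoreCase("Springfield")) {
                hits.add("Springfield (toy entry A)");
                hits.add("Springfield (toy entry B)");
            }
            return hits;
        });
        System.out.println(registry.link("location", "Springfield", 1));
        System.out.println(registry.link("person", "Springfield", 1));
    }
}
```

Keeping the type-to-linker mapping outside the linkers themselves means a new data source can be added for a new entity type without touching existing linkers.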
      UPDATE: The current implementation of the GeoEntityLinker uses Lucene to store the gazetteers and provides utilities for indexing them. The implementation returns latitude and longitude (and other gazetteer fields) for toponyms extracted with NER.
      All extracted toponyms are scored in four ways: fuzzy string matching, binning by location, context modeling, and country-mention proximity. These scores provide a basis for deciding which gazetteer entries are worth keeping.
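To illustrate how one of these scores could be used to prune gazetteer candidates, here is a minimal sketch of the fuzzy string matching step only. It assumes a normalized Levenshtein similarity and a caller-chosen threshold; the ticket does not say which string metric or cutoff the GeoEntityLinker actually uses, and the class and method names (`GazetteerScoringSketch`, `fuzzyScore`, `keep`) are hypothetical. The other three scores are not modeled here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class GazetteerScoringSketch {

    /** Levenshtein edit distance, computed with two rolling rows. */
    public static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Case-insensitive similarity in [0, 1]; 1.0 means the names are identical. */
    public static double fuzzyScore(String toponym, String gazetteerName) {
        String a = toponym.toLowerCase(Locale.ROOT);
        String b = gazetteerName.toLowerCase(Locale.ROOT);
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 1.0;
        }
        return 1.0 - (double) editDistance(a, b) / maxLen;
    }

    /** Keeps only candidates whose similarity to the toponym meets the assumed threshold. */
    public static List<String> keep(String toponym, List<String> candidates, double threshold) {
        List<String> kept = new ArrayList<>();
        for (String candidate : candidates) {
            if (fuzzyScore(toponym, candidate) >= threshold) {
                kept.add(candidate);
            }
        }
        return kept;
    }
}
```

For example, `keep("Berlin", List.of("Berlin", "Berlyn", "Boston"), 0.8)` retains the first two names and drops "Boston". In practice the remaining scores would be combined with this one before deciding what to keep.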

        Attachments

        1. entitylinker.properties
          0.5 kB
          Mark Giaconia
        2. opennlp.geoentitylinker.countrycontext.txt
          16 kB
          Mark Giaconia

            People

            • Assignee: Joern Kottmann (joern)
            • Reporter: Mark Giaconia (giaconia_mark)
            • Votes: 0
            • Watchers: 3

              Dates

              • Created:
                Updated:
                Resolved:

                Time Tracking

                Estimated: 1,082h
                Remaining: 1,082h
                Logged: Not Specified