[STANBOL-1156] Freebase Entity Disambiguation - ASF JIRA

Details

Type: Story
Status: Closed
Priority: Major
Resolution: Done
Affects Version/s: None
Fix Version/s: None
Component/s: Data, Enhancement Engines, Enhancer, Entityhub
Labels:

Description

Since STANBOL-1014, it is possible to generate an EntityHub site for the Freebase Knowledge Base. As part of the Google Summer of Code 2013 call, a project on Freebase Entity Disambiguation has been proposed. Proposal details can be found at the following link: http://www.google-melange.com/gsoc/project/google/gsoc2013/adperezmorales/10001. The disambiguation process for Freebase should also follow the workflow and architecture established in STANBOL-1037.

The project development has been divided into three global tasks:

1. Integration of resources for local disambiguation: Wikilinks (http://www.iesl.cs.umass.edu/data/wiki-links) is a dataset that provides URLs of webpages, along with the anchor text of their links and the Wikipedia and Freebase pages they link to. As provided, this dataset can be used to get all the surface strings that refer to a Wikipedia page. Beyond that, it can be used to download the webpages and extract the context around the links. These contexts can be used for local disambiguation against the mention contexts of Content Items.

2. Integration of resources for global disambiguation: Freebase is an enormous graph of related entities and concepts. The structure of this graph can be used to compute groups of entities that are semantically related in a document. For example, we can use the relationship between Michael Jordan and NBA to disambiguate Michael Jordan in a text. The goal of this task is to store the Freebase graph structure in a Neo4j database and provide an API to use it for disambiguation purposes.

3. Disambiguation algorithm: Finally, it is necessary to write an algorithm that takes into account the local and global disambiguation scores in order to refine the confidence values of the EntityAnnotations in the Enhancement Structure.
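
To make the three tasks concrete, here is a minimal sketch of how a local and a global score could be combined to refine a candidate's confidence. Everything in it is an assumption for illustration: the `Candidate` class, the function names, the Jaccard overlap used as the local score, the direct-neighbour ratio used as the global score, and the linear weights. It does not use the Stanbol or Neo4j APIs; a plain dictionary stands in for the Freebase graph.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical stand-in for one suggested entity of a mention."""
    entity_id: str
    confidence: float  # confidence assigned by entity linking, in [0, 1]


def local_score(mention_context: set[str], entity_context: set[str]) -> float:
    """Assumed local score: Jaccard overlap of context tokens."""
    if not mention_context or not entity_context:
        return 0.0
    shared = mention_context & entity_context
    return len(shared) / len(mention_context | entity_context)


def global_score(entity_id: str, other_entities: set[str],
                 graph: dict[str, set[str]]) -> float:
    """Assumed global score: share of the document's other entities
    that are direct neighbours of entity_id in the graph."""
    others = other_entities - {entity_id}
    if not others:
        return 0.0
    neighbours = graph.get(entity_id, set())
    return len(neighbours & others) / len(others)


def refine_confidence(candidate: Candidate,
                      mention_context: set[str],
                      entity_contexts: dict[str, set[str]],
                      other_entities: set[str],
                      graph: dict[str, set[str]],
                      w_local: float = 0.3,
                      w_global: float = 0.3) -> float:
    """Linear mix of original confidence, local score and global score.
    The weights are arbitrary placeholders, not tuned values."""
    local = local_score(mention_context,
                        entity_contexts.get(candidate.entity_id, set()))
    glob = global_score(candidate.entity_id, other_entities, graph)
    w_orig = 1.0 - w_local - w_global
    return w_orig * candidate.confidence + w_local * local + w_global * glob


# Toy data for the Michael Jordan example (identifiers are made up).
graph = {"jordan_player": {"nba"}, "jordan_prof": {"uc_berkeley"}}
entity_contexts = {
    "jordan_player": {"basketball", "bulls", "nba"},
    "jordan_prof": {"machine", "learning", "statistics"},
}
mention_context = {"bulls", "nba", "finals"}
other_entities = {"nba"}

player = refine_confidence(Candidate("jordan_player", 0.5), mention_context,
                           entity_contexts, other_entities, graph)
prof = refine_confidence(Candidate("jordan_prof", 0.5), mention_context,
                         entity_contexts, other_entities, graph)
```

With this toy data both candidates start at the same confidence, and the basketball player ends up ranked above the professor because his context overlaps with the mention and he is linked to NBA in the graph. A real implementation would take contexts from the Wikilinks data, query the graph store for relatedness, and write the refined values back to the EntityAnnotations.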