  Stanbol / STANBOL-1156

Freebase Entity Disambiguation

    Details

      Description

      Since STANBOL-1014, it has been possible to generate an EntityHub site for the Freebase knowledge base. As part of the Google Summer of Code 2013 call, a project for Freebase Entity Disambiguation has been proposed. Proposal details are available at: http://www.google-melange.com/gsoc/project/google/gsoc2013/adperezmorales/10001. The disambiguation process for Freebase should also follow the workflow and architecture established in STANBOL-1037.

      The project development has been divided into three global tasks:

      1. Integration of resources for local disambiguation: Wikilinks (http://www.iesl.cs.umass.edu/data/wiki-links) is a dataset that provides the URLs of web pages, along with the anchor text of their links and the Wikipedia and Freebase pages they link to. As provided, this dataset can be used to collect all the surface strings that refer to a Wikipedia page; beyond that, it can be used to download the web pages and extract the context around each link. These contexts can then be used for local disambiguation by comparing them against the contexts of mentions in Content Items.

      2. Integration of resources for global disambiguation: Freebase is an enormous graph of related entities and concepts. The structure of this graph can be used to compute groups of entities that are semantically related within a document. For example, we can use the relationship between Michael Jordan and the NBA to disambiguate Michael Jordan in a text. The goal of this task is to store the Freebase graph structure in a Neo4j database and provide an API to use it for disambiguation purposes.

      3. Disambiguation algorithm: finally, it is necessary to write an algorithm that takes into account the local and global disambiguation scores in order to refine the confidence values of the EntityAnnotations in the Enhancement Structure.
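
The three tasks above could fit together roughly as in the following sketch. It is illustrative only and not the project's implementation: the function names, the weights, the cosine-similarity local score, the neighbour-overlap global score, and the toy data are all assumptions. A plain in-memory dictionary stands in for the Neo4j graph store, and bag-of-words strings stand in for contexts extracted from Wikilinks.

```python
import math
from collections import Counter


def local_score(mention_context, entity_context):
    """Task 1 (assumed measure): cosine similarity of two bag-of-words contexts."""
    a = Counter(mention_context.lower().split())
    b = Counter(entity_context.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def global_score(candidate, other_entities, graph):
    """Task 2 (assumed measure): fraction of the document's other entities
    that are direct neighbours of the candidate in the graph."""
    if not other_entities:
        return 0.0
    neighbours = graph.get(candidate, set())
    return sum(1 for e in other_entities if e in neighbours) / len(other_entities)


def refine_confidence(original, local, global_, w_local=0.4, w_global=0.4):
    """Task 3 (assumed rule): linear blend; the remaining weight is kept
    for the original confidence of the annotation."""
    w_original = 1.0 - w_local - w_global
    return w_original * original + w_local * local + w_global * global_


# Hypothetical toy data: identifiers and contexts are made up for illustration.
graph = {
    "michael_jordan_basketball": {"nba", "chicago_bulls"},
    "michael_jordan_scientist": {"uc_berkeley", "machine_learning"},
}
entity_contexts = {
    "michael_jordan_basketball": "basketball player chicago bulls nba champion",
    "michael_jordan_scientist": "professor machine learning statistics berkeley",
}
mention_context = "Michael Jordan won six NBA titles with the Chicago Bulls"
other_entities = ["nba", "chicago_bulls"]  # entities already linked in the document

scores = {
    candidate: refine_confidence(
        0.5,  # original confidence, assumed equal for both candidates
        local_score(mention_context, entity_contexts[candidate]),
        global_score(candidate, other_entities, graph),
    )
    for candidate in graph
}
best = max(scores, key=scores.get)
print(best, scores)
```

In this toy setup the basketball candidate shares context words with the mention and is linked to both of the other entities, so its refined confidence ends up higher than the scientist's. A real implementation would query the graph database rather than a dictionary and would likely need a more robust relatedness measure than direct adjacency.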

              People

              • Assignee: Unassigned
              • Reporter: Rafa Haro (rafaharo)
              • Votes: 2
              • Watchers: 3

                  Time Tracking

                  Estimated: 1,344h
                  Remaining: 1,344h
                  Logged: Not Specified