Details

    • New Feature
    • Status: Open
    • Major
    • Resolution: Unresolved

    Description

      Module providing an API for implementing disambiguation enhancement engines.

          Activity

            rwesten Rupert Westenthaler added a comment - - edited
            1. Entity Disambiguation Context

            Describes the disambiguation context known for an Entity. An `EntityContext` may have several `ContextField<VT>`. Those fields may represent different types of contexts (e.g. spatial, social, ...). An `EntityContext` defines values of type `<VT>` for a given `ContextField`. Optionally those values can be `Weighted`.

                EntityContext
                    uri :: String // the URI of the entity
                    fields :: Set<ContextField<VT>> //fields present in the context
                    values(ContextField<VT> field) :: Set<VT>
            
                ContextField<VT>
                    uri :: String // the property of the field
                    valueType :: Class<VT> // the generic type of values
                    weighted :: boolean  //if VT implements Weighted
                    <<additional metadata>>
            
                Weighted /* used for values with weight */
                    weight :: double //the weight
            

            Every value MUST be available as a String via the "toString() :: String" method of the type given by `ContextField#valueType`. This allows valueType agnostic implementations. For simple implementations `java.lang.String` can be used as `valueType`.
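            The model above could be rendered in Java roughly as follows. This is only a sketch under assumptions: the getter names, the map-backed `SimpleEntityContext` and the `WeightedLabel` value type are hypothetical and not part of the proposal.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Values that carry a weight (Java rendering of `Weighted`). */
interface Weighted {
    double getWeight();
}

/** A typed field of an entity context, identified by a property URI. */
final class ContextField<VT> {
    private final String uri;
    private final Class<VT> valueType;

    ContextField(String uri, Class<VT> valueType) {
        this.uri = uri;
        this.valueType = valueType;
    }

    String getUri() { return uri; }
    Class<VT> getValueType() { return valueType; }

    /** True if the value type implements Weighted. */
    boolean isWeighted() { return Weighted.class.isAssignableFrom(valueType); }
}

/** Hypothetical weighted value; toString() yields the plain value. */
final class WeightedLabel implements Weighted {
    private final String label;
    private final double weight;

    WeightedLabel(String label, double weight) {
        this.label = label;
        this.weight = weight;
    }

    @Override public double getWeight() { return weight; }
    @Override public String toString() { return label; }
}

/** Map-backed context, an assumption made for illustration only. */
final class SimpleEntityContext {
    private final String uri;
    private final Map<ContextField<?>, Set<Object>> values = new LinkedHashMap<>();

    SimpleEntityContext(String uri) { this.uri = uri; }

    String getUri() { return uri; }

    <VT> void add(ContextField<VT> field, VT value) {
        values.computeIfAbsent(field, f -> new LinkedHashSet<>()).add(value);
    }

    Set<ContextField<?>> getFields() {
        return Collections.unmodifiableSet(values.keySet());
    }

    /** Typed access to the values of a field. */
    <VT> Set<VT> getValues(ContextField<VT> field) {
        Set<VT> result = new LinkedHashSet<>();
        for (Object v : values.getOrDefault(field, Collections.emptySet())) {
            result.add(field.getValueType().cast(v));
        }
        return result;
    }

    /** Value-type agnostic access that relies only on toString(). */
    Set<String> getStringValues(ContextField<?> field) {
        Set<String> result = new LinkedHashSet<>();
        for (Object v : values.getOrDefault(field, Collections.emptySet())) {
            result.add(v.toString());
        }
        return result;
    }
}
```

            An engine that does not care about the concrete value type would only call `getStringValues(..)`, while a weight-aware engine could check `isWeighted()` first and then read `getWeight()` from the typed values.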

            1. EntityContextProvider

            This service will provide `EntityContext` information for Entities

                EntityContextProvider
                    boolean knowsEntity(Suggestion suggestion)
                    EntityContext getContext(Suggestion suggestion)
            

            An `EntityContextProvider` will also need some additional OSGi properties so that engines can filter for them (e.g. a name). Properties might also be implementation specific (e.g. the Entityhub Site name ...).
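            As a rough illustration, the provider could look like this in Java. The `getName()` accessor, the in-memory implementation and the choice to return `null` for unknown entities are assumptions; in an OSGi environment the name would more likely be a service property than a getter. `Suggestion` and `EntityContext` are reduced to minimal stand-ins here.

```java
import java.util.HashMap;
import java.util.Map;

/** Minimal stand-in for the Suggestion type described in this issue. */
final class Suggestion {
    final String uri;
    final String site; // optional, may be null

    Suggestion(String uri, String site) {
        this.uri = uri;
        this.site = site;
    }
}

/** Minimal stand-in for EntityContext; only the entity URI is modelled. */
interface EntityContext {
    String getUri();
}

interface EntityContextProvider {
    /** Hypothetical accessor for the name an engine could filter on. */
    String getName();

    boolean knowsEntity(Suggestion suggestion);

    /** Returns the context, or null if the entity is unknown (an assumption). */
    EntityContext getContext(Suggestion suggestion);
}

/** In-memory provider used for illustration only. */
final class InMemoryContextProvider implements EntityContextProvider {
    private final String name;
    private final Map<String, EntityContext> contexts = new HashMap<>();

    InMemoryContextProvider(String name) { this.name = name; }

    void register(EntityContext context) {
        contexts.put(context.getUri(), context);
    }

    @Override public String getName() { return name; }

    @Override public boolean knowsEntity(Suggestion suggestion) {
        return contexts.containsKey(suggestion.uri);
    }

    @Override public EntityContext getContext(Suggestion suggestion) {
        return contexts.get(suggestion.uri);
    }
}
```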

            1. Possible Implementations

            1. Entityhub based

            • Entityhub Site based implementation of the `EntityContextProvider`
            • Entityhub Representation based implementation of the `EntityContext`
            • No support for `Weighted` possible
            • The Entityhub Indexing Tool can be used to generate contexts. LDPath in particular can be very useful for creating them.
            • Yards backed by RDF triple stores could also be used (Clerezza, Sesame)

            2. Lucene/Solr based

            • `EntityContextProvider` based on simple get requests. Provider needs to be configured with the ContextField#uri to Solr field name mappings.
            • SolrDocument based implementation of `EntityContext`
            • No support for `Weighted` possible

            3. Clerezza based

            • `EntityContextProvider` based on TripleCollection
            • `EntityContext` based on GraphNode

            4. Sesame based
            5. Jena based

            6. Blueprint/Tinkerpop based

            • as this uses a property graph it could even support `Weighted`
            • adperezmorales also used Tinkerpop for his Disambiguation engine (STANBOL-1156)
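
            For the Lucene/Solr based option, the configured mapping from `ContextField#uri` to Solr field names might be applied as sketched below. The class name `SolrFieldMapping` is hypothetical, and a plain `Map` stands in for the Solr document so that the example has no dependency on SolrJ.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper that applies a ContextField URI to Solr field name mapping. */
final class SolrFieldMapping {
    private final Map<String, String> fieldUriToSolrField;

    SolrFieldMapping(Map<String, String> fieldUriToSolrField) {
        this.fieldUriToSolrField = fieldUriToSolrField;
    }

    /** Reads the values of a context field from a document given as a plain map. */
    Set<String> values(String contextFieldUri, Map<String, Collection<Object>> document) {
        String solrField = fieldUriToSolrField.get(contextFieldUri);
        if (solrField == null) {
            return Collections.emptySet(); // field not configured
        }
        Set<String> result = new LinkedHashSet<>();
        for (Object value : document.getOrDefault(solrField, Collections.emptyList())) {
            result.add(String.valueOf(value)); // values are exposed via their String form
        }
        return result;
    }
}
```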
            rwesten Rupert Westenthaler added a comment - - edited
            1. Disambiguation Data

            Provides information extracted from the parsed content in a form that makes it easily consumable by a Disambiguation engine. While the `DisambiguationData` class will provide information about the whole content item, there will also be a `DisambiguationContext` class that acts as a filter over that data based on a given location within the content (see next section).

                DisambiguationData
                    extractedEntities :: Set<ExtractedEntity>
                    disambiguations :: Set<Disambiguation>
            
              1. Extracted Entity

            This wraps a `fise:TextAnnotation` with several suggested `fise:EntityAnnotation`s.

                ExtractedEntity
                    mention :: String //fise:selected-text
                    start :: int //fise:start
                    end :: int //fise:end
                    suggestions :: Set<Suggestion>
            
                Suggestion
                    uri :: String //fise:entity-reference
                    types :: Set<String> //fise:entity-type
                    label :: String //fise:entity-label
                    site :: String //entityhub:site (optional)
                    confidence :: double //fise:confidence
            
              1. Disambiguation

            Represents a disambiguation result for a `Suggestion` of an `ExtractedEntity`.

                Disambiguation
                    confidence :: double
                    extractedEntity :: ExtractedEntity
                    suggestion :: Suggestion
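
            A possible Java rendering of these data classes is sketched below. Field names follow the outline above; the constructors and the `best(..)` helper are assumptions added for illustration.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class Suggestion {
    final String uri;         // fise:entity-reference
    final Set<String> types;  // fise:entity-type
    final String label;       // fise:entity-label
    final String site;        // entityhub:site (optional, may be null)
    final double confidence;  // fise:confidence

    Suggestion(String uri, Set<String> types, String label, String site, double confidence) {
        this.uri = uri;
        this.types = types;
        this.label = label;
        this.site = site;
        this.confidence = confidence;
    }
}

final class ExtractedEntity {
    final String mention;     // fise:selected-text
    final int start;          // fise:start
    final int end;            // fise:end
    final Set<Suggestion> suggestions = new LinkedHashSet<>();

    ExtractedEntity(String mention, int start, int end) {
        this.mention = mention;
        this.start = start;
        this.end = end;
    }
}

/** A disambiguation result for one Suggestion of an ExtractedEntity. */
final class Disambiguation {
    final double confidence;
    final ExtractedEntity extractedEntity;
    final Suggestion suggestion;

    Disambiguation(double confidence, ExtractedEntity extractedEntity, Suggestion suggestion) {
        this.confidence = confidence;
        this.extractedEntity = extractedEntity;
        this.suggestion = suggestion;
    }
}

final class DisambiguationData {
    final Set<ExtractedEntity> extractedEntities = new LinkedHashSet<>();
    final Set<Disambiguation> disambiguations = new LinkedHashSet<>();

    /** Hypothetical helper: highest-confidence result for an entity, or null. */
    Disambiguation best(ExtractedEntity entity) {
        Disambiguation top = null;
        for (Disambiguation d : disambiguations) {
            if (d.extractedEntity == entity && (top == null || d.confidence > top.confidence)) {
                top = d;
            }
        }
        return top;
    }
}
```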
            
            rwesten Rupert Westenthaler added a comment - - edited
            1. DisambiguationContext

            The Context used for the disambiguation of the content item - or a part of the content item. The `DisambiguationContext` will be created by a factory based on a `DisambiguationData` object.

              1. DisambiguationContextFactory

            This factory is responsible for creating a disambiguation context for the passed arguments. Arguments could be a Paragraph (span), a Sentence or simply a position within the content item.

                DisambiguationContextFactory
                    createContext(??) :: DisambiguationContext
            
              1. DisambiguationContext Implementations

            Different factory implementations will provide different kinds of contexts:

            • full document context
            • sliding window context
            • section context: everything within the same paragraph; within three sentences ...
            • global concept context: provide selected annotations independent of the current active section (e.g. already disambiguated entities; categorizations; user provided tags ...)
            • union context: provides a union view over two other contexts (e.g. sliding window and global concepts contexts)

            The idea of the `DisambiguationContext` is to be a view over the `DisambiguationData`. Therefore it should use the same API (at the very least, both should share a common super interface).
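
            The "view" idea can be sketched as follows, using a sliding window and a union context as examples. The `EntitySource` super interface, the character-offset window and all class names are assumptions; the `createContext(??)` signature is deliberately left open in the proposal and is not decided here.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Minimal stand-in for ExtractedEntity; only the fields needed here. */
final class ExtractedEntity {
    final String mention;
    final int start;
    final int end;

    ExtractedEntity(String mention, int start, int end) {
        this.mention = mention;
        this.start = start;
        this.end = end;
    }
}

/** Hypothetical common super interface of DisambiguationData and DisambiguationContext. */
interface EntitySource {
    Set<ExtractedEntity> getExtractedEntities();
}

/** View over the entities that overlap [position - radius, position + radius]. */
final class SlidingWindowContext implements EntitySource {
    private final EntitySource data;
    private final int position;
    private final int radius;

    SlidingWindowContext(EntitySource data, int position, int radius) {
        this.data = data;
        this.position = position;
        this.radius = radius;
    }

    @Override public Set<ExtractedEntity> getExtractedEntities() {
        Set<ExtractedEntity> result = new LinkedHashSet<>();
        for (ExtractedEntity e : data.getExtractedEntities()) {
            if (e.end >= position - radius && e.start <= position + radius) {
                result.add(e);
            }
        }
        return result;
    }
}

/** Union view over two other contexts. */
final class UnionContext implements EntitySource {
    private final EntitySource first;
    private final EntitySource second;

    UnionContext(EntitySource first, EntitySource second) {
        this.first = first;
        this.second = second;
    }

    @Override public Set<ExtractedEntity> getExtractedEntities() {
        Set<ExtractedEntity> result = new LinkedHashSet<>(first.getExtractedEntities());
        result.addAll(second.getExtractedEntities());
        return result;
    }
}
```

            Because both contexts only wrap another `EntitySource`, they are computed on demand and can be stacked, e.g. a union of a sliding window and a global concept context.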

            rwesten Rupert Westenthaler added a comment -
            1. (Suggested) Disambiguation workflow

            This describes how a Disambiguation Enhancement Engine can use this API:

            1. create a `DisambiguationData` instance for the ContentItem
            2. the engine needs to decide how to iterate over the ContentItem. Typically it will
               a. get the first `ExtractedEntity`,
               b. create a `DisambiguationContext` for it, and
               c. disambiguate it against all the other `ExtractedEntity` instances in the same context
            3. perform the disambiguation
               a. as required by the algorithm, the `EntityContext` for each `Suggestion` of an `ExtractedEntity` can be loaded via the configured `EntityContextProvider`(s).
               b. disambiguation results are represented as `Disambiguation` instances and added to the context. The engine might use its own `Disambiguation` subtype to store additional intermediate results.
               c. previous disambiguation results are also accessible via the `DisambiguationContext`.
            4. when the disambiguation process has finished, the `DisambiguationData` can be used to write the results back to the Enhancement Structure.
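
            The loop described by these steps might look roughly like the sketch below. It is a simplification under assumptions: the context is a fixed character window around each entity, the actual algorithm is hidden behind a hypothetical `Scorer` interface, loading `EntityContext` via a provider (step 3a) is not modelled, and writing results back to the Enhancement Structure (step 4) is omitted.

```java
import java.util.ArrayList;
import java.util.List;

final class Suggestion {
    final String uri;
    final double confidence;

    Suggestion(String uri, double confidence) {
        this.uri = uri;
        this.confidence = confidence;
    }
}

final class ExtractedEntity {
    final String mention;
    final int start;
    final List<Suggestion> suggestions = new ArrayList<>();

    ExtractedEntity(String mention, int start) {
        this.mention = mention;
        this.start = start;
    }
}

final class Disambiguation {
    final double confidence;
    final ExtractedEntity extractedEntity;
    final Suggestion suggestion;

    Disambiguation(double confidence, ExtractedEntity extractedEntity, Suggestion suggestion) {
        this.confidence = confidence;
        this.extractedEntity = extractedEntity;
        this.suggestion = suggestion;
    }
}

/** Placeholder for the actual algorithm: scores one suggestion given its context. */
interface Scorer {
    double score(Suggestion candidate, List<ExtractedEntity> context);
}

final class WorkflowSketch {
    /** Runs steps 2 and 3 over all entities of a content item. */
    static List<Disambiguation> run(List<ExtractedEntity> data, int window, Scorer scorer) {
        List<Disambiguation> results = new ArrayList<>();
        for (ExtractedEntity entity : data) {                   // step 2a
            List<ExtractedEntity> context = new ArrayList<>();  // step 2b
            for (ExtractedEntity other : data) {
                if (other != entity && Math.abs(other.start - entity.start) <= window) {
                    context.add(other);
                }
            }
            for (Suggestion s : entity.suggestions) {           // steps 2c and 3b
                results.add(new Disambiguation(scorer.score(s, context), entity, s));
            }
        }
        return results;
    }
}
```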


            People

              Unassigned Unassigned
              rwesten Rupert Westenthaler