STANBOL-336

Improve Fetching of Entity Information during the enhancement process

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.0.0, 0.12.1
    • Component/s: Enhancer
    • Labels:
      None

      Description

      Short overview of the current situation:

      Previously the CachingDereferncingEngine was used to dereference information about Entities referenced in enhancement results. This engine loaded Entity data directly by using their IDs and stored the data in a Clerezza Graph. It was introduced before the Entityhub was implemented and was never updated to use the Entityhub.

      STANBOL-333 removed the CachingDereferncingEngine and added support for dereferencing Entities via the Entityhub directly to the NamedEntityTaggingEngine.

      However, removing the CachingDereferncingEngine had the side effect that no other engine currently supports this feature. Each engine must now support fetching entity data itself; otherwise such data can no longer be provided.
      If multiple Engines support this feature, users would need to change the configuration of several engines in order to properly activate/deactivate it. Activation/deactivation for single requests would not be possible.

      Proposal for handling the fetching of entity information in the future:

      • Move the configuration of the default value that enables/disables dereferencing of Entity information to a central component (probably as a property of the EnhancerWebFragment).
      • Add a way for users to override the default value for specific requests. This could use an additional request parameter or an HTTP header.
      • Store the entity-dereference-state within the metadata of the Content-Item.
      • If the entity-dereference-state is set to true, EnhancementEngines MAY add entity information to the enhancement graph. This is especially useful if an engine already fetches the required entity information during the enhancement process.
      • To allow users to also fetch entity information that is not added during the enhancement process, we might want to reintroduce a DereferenceingEngine that can dereference missing Entities. This engine MUST be able to use the Entityhub, with the option to also dereference Entities via their URI if they are not available via the Entityhub.
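
      The first four points could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name, the metadata key, and the plain Map standing in for the Content-Item metadata are all hypothetical and not part of the Stanbol API. It assumes a request parameter takes precedence over a header, and a header over the configured default.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch; none of these names exist in the Stanbol API. */
final class DereferenceStateSketch {

    /** Hypothetical metadata key for the entity-dereference-state. */
    static final String STATE_KEY = "entity-dereference-state";

    /**
     * Resolves the effective state for one request.
     * Assumed precedence: request parameter, then HTTP header, then default.
     */
    static boolean resolve(boolean configuredDefault, String requestParam, String header) {
        Optional<Boolean> fromParam = parse(requestParam);
        if (fromParam.isPresent()) {
            return fromParam.get();
        }
        return parse(header).orElse(configuredDefault);
    }

    /** Missing or blank values mean "not set"; only "true" (any case) enables. */
    private static Optional<Boolean> parse(String value) {
        if (value == null || value.trim().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(Boolean.parseBoolean(value.trim()));
    }

    /** Stores the state in a map standing in for the Content-Item metadata. */
    static void store(Map<String, Object> contentItemMetadata, boolean state) {
        contentItemMetadata.put(STATE_KEY, Boolean.valueOf(state));
    }

    /** Check an engine could make before adding entity data to the graph. */
    static boolean mayAddEntityData(Map<String, Object> contentItemMetadata) {
        return Boolean.TRUE.equals(contentItemMetadata.get(STATE_KEY));
    }

    public static void main(String[] args) {
        Map<String, Object> metadata = new HashMap<>();
        // Default is "disabled", but this request asks for dereferencing.
        store(metadata, resolve(false, "true", null));
        System.out.println(mayAddEntityData(metadata)); // prints: true
    }
}
```

      Keeping the resolved state in the Content-Item metadata means engines only need to read a single flag and do not need their own enable/disable configuration.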

      WDYT
      Rupert

          Activity

          Rupert Westenthaler added a comment -

          0.12: http://svn.apache.org/r1594134
          trunk: http://svn.apache.org/r1594140
          Documentation: http://svn.apache.org/r1597033

          Rupert Westenthaler added a comment -

          sorry marked wrong issue as resolved ... reopening

          Rupert Westenthaler added a comment -

          Whether entities should be dereferenced or not could be passed as an Enhancement Property.

          Olivier Grisel added a comment -

          I think we could make an EntityHubDereferencingEngine or a ReferencedSiteDereferenceingEngine base class (not abstract, as it could be used in standalone mode) and refactor the NamedEntityTaggingEngine to derive from this base class.

          I would make the ReferencedSiteDereferenceingEngine hold the property that controls whether we want to enable automated dereferencing of entities, concepts and topics occurring in the enhancement graph, and maybe make it even more configurable by allowing a list of entity types to dereference.

          Also +1 for extending WebContentItem to add metadata about the query (HTTP headers and request parameters), with a dedicated parameter that would allow this configuration to be overridden on a per-request basis.
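
          A rough sketch of the class layout suggested in this comment, combined with the Entityhub-first, URI-second fallback from the issue description. All class names, method signatures and the lookup functions are hypothetical stand-ins, not Stanbol APIs; an empty type list is assumed to mean "dereference every type".

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical base class; concrete, so it can also run standalone. */
class SiteDereferencingSketch {
    private final boolean enabledByDefault;
    private final Set<String> allowedTypes; // assumption: empty = all types
    private final Function<String, Optional<Map<String, String>>> entityhubLookup;
    private final Function<String, Optional<Map<String, String>>> uriLookup;

    SiteDereferencingSketch(boolean enabledByDefault, Set<String> allowedTypes,
            Function<String, Optional<Map<String, String>>> entityhubLookup,
            Function<String, Optional<Map<String, String>>> uriLookup) {
        this.enabledByDefault = enabledByDefault;
        this.allowedTypes = Collections.unmodifiableSet(new HashSet<>(allowedTypes));
        this.entityhubLookup = entityhubLookup;
        this.uriLookup = uriLookup;
    }

    /**
     * Returns entity data if dereferencing is enabled for this request and
     * the entity type is allowed. A null requestOverride means "use default".
     */
    Optional<Map<String, String>> dereference(String entityUri, String entityType,
            Boolean requestOverride) {
        boolean enabled = requestOverride != null ? requestOverride : enabledByDefault;
        if (!enabled) {
            return Optional.empty();
        }
        if (!allowedTypes.isEmpty() && !allowedTypes.contains(entityType)) {
            return Optional.empty();
        }
        // Try the Entityhub stand-in first, then fall back to the URI lookup.
        Optional<Map<String, String>> local = entityhubLookup.apply(entityUri);
        return local.isPresent() ? local : uriLookup.apply(entityUri);
    }
}

/** Hypothetical tagging engine that inherits the dereferencing behaviour. */
class TaggingEngineSketch extends SiteDereferencingSketch {
    TaggingEngineSketch(boolean enabledByDefault, Set<String> allowedTypes,
            Function<String, Optional<Map<String, String>>> entityhubLookup,
            Function<String, Optional<Map<String, String>>> uriLookup) {
        super(enabledByDefault, allowedTypes, entityhubLookup, uriLookup);
    }
}
```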


            People

            • Assignee: Rupert Westenthaler
            • Reporter: Rupert Westenthaler
            • Votes: 1
            • Watchers: 2
