Details

    • Sub-task
    • Status: Resolved
    • Major
    • Resolution: Done
    • None
    • None
    • Contenthub
    • None

    Description

      Simple Storage interface for enhanced ContentItems.

      This Store is used to

      1. save the ContentItems after they are enhanced by the Enhancer
         • The Blobs (original content and transcoded versions)
         • The Metadata (Enhancement Results)
      2. retrieve ContentItems during semantic indexing
         • Iterator over the IDs
         • Get ContentItem by ID

      This store is NOT intended to be used for search! It is only used for ID based lookup.

      Implementations:
      -----------------------

      • CMS Adapter: An implementation based on the CMS Adapter makes it possible to store the Enhancement Results directly within the CMS. Typically this will be the same CMS that sends the request to the Contenthub, but this is not a requirement.
      • Clerezza based implementation: Clerezza, as an RDF based CMS, provides the required functionality to store both the content AND the metadata of the ContentItem.
      • File based: Simple file based storage without any external dependencies. This could be used as the default and for testing.

      Interface:
      -------------

      The interface will be based on, or replace, the [Store](http://svn.apache.org/repos/asf/incubator/stanbol/trunk/contenthub/servicesapi/src/main/java/org/apache/stanbol/contenthub/servicesapi/store/Store.java) interface already present in the Contenthub. However, the suggestion is to remove "getEnhancementGraph()", as it is not required by use cases (1) and (2) mentioned above. In addition, the Store interface should be extended with a remove method to allow manual deletion of ContentItems.

      /** stores the passed ContentItem */
      + put(ContentItem ci) : UriRef
      /** Getter for the ContentItem with the passed ID */
      + get(UriRef id) : ContentItem
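
      As a rough illustration of the interface described above, the following Java sketch pairs put and get with the suggested remove method. It is not the actual Contenthub API: `SketchStore`, `SketchContentItem` and `InMemorySketchStore` are hypothetical names, IDs are plain Strings standing in for UriRef, and the signature of `remove` is an assumption, since the proposal only states that such a method should exist.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for ContentItem; only the ID and raw content are modelled. */
final class SketchContentItem {
    final String id;       // the proposal uses UriRef; String keeps the sketch self-contained
    final byte[] content;

    SketchContentItem(String id, byte[] content) {
        this.id = id;
        this.content = content;
    }
}

/** put and get as listed in the proposal, plus the suggested remove method. */
interface SketchStore {
    /** Stores the passed ContentItem and returns its ID. */
    String put(SketchContentItem ci);

    /** Returns the ContentItem with the passed ID, or null if none is stored. */
    SketchContentItem get(String id);

    /** Assumed signature: the proposal only says a remove method should be added. */
    void remove(String id);
}

/** Trivial in-memory implementation, for illustration and tests only. */
final class InMemorySketchStore implements SketchStore {
    private final Map<String, SketchContentItem> items = new HashMap<>();

    @Override
    public String put(SketchContentItem ci) {
        items.put(ci.id, ci);
        return ci.id;
    }

    @Override
    public SketchContentItem get(String id) {
        return items.get(id);
    }

    @Override
    public void remove(String id) {
        items.remove(id);
    }
}
```

      The sketch offers ID based lookup only, in line with the statement that the Store is not intended for search.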

      Revisions:
      -------------

      Revisions are used to re-synchronize semantic indexes with the enhanced ContentItems managed by this store. Every time the ContentHub indexes enhanced ContentItems (as managed by this store) to a SemanticIndex, it provides the highest revision. SemanticIndexes MUST persist such revisions and MUST ensure they remain available after a restart, because the ContentHub will later use this number to apply changes to enhanced ContentItems.

      In detail, a revision is defined as a change (add, update, removal) to one or more ContentItems managed by the Store. Every such change MUST result in an increase of the revision. Revisions MUST only use positive numbers. Implementers might use <code>System.currentTimeMillis()</code> as the revision, but this is not a requirement.
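
      One way to satisfy "every change increases the revision" while still basing revisions on the wall clock is sketched below. `RevisionClock` is a hypothetical helper, not part of the proposal; it bumps the value by one whenever two changes fall into the same millisecond or the system clock moves backwards.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper that hands out positive, strictly increasing revisions. */
final class RevisionClock {
    // Last revision handed out; 0 means no revision has been issued yet.
    private final AtomicLong last = new AtomicLong(0);

    /** Returns the current time in milliseconds, or last + 1 if that would not be an increase. */
    long next() {
        return last.updateAndGet(prev -> Math.max(prev + 1, System.currentTimeMillis()));
    }
}
```

      A persistent Store would additionally need to restore the last issued revision after a restart; that part is left out here.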

      The store interface provides a method that returns an Iterator over all ContentItems that were changed (added, updated, removed) since a given revision.

      /** Iterator over all contentItems added/removed after revision */
      + changes(long revision, int offset, int batchSize) : ChangeSet

      class ChangeSet {
          /** the lowest included revision */
          + from() : long
          /** the IDs of changed ContentItems */
          + changed() : Set<UriRef>
          /** the highest included revision */
          + to() : long
      }

      Calls to changes(..) MUST return only changes with a higher revision than the provided number. Changes with the passed revision number MUST be excluded. Note that ChangeSet does not provide information about the type of the change. This will only be available after a call to Store#get(..).
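
      The selection rule for changes(..) can be sketched in Java as follows. This is only an illustration under several assumptions: `RevisionLog` and `SketchChangeSet` are hypothetical names, IDs are plain Strings instead of UriRef, `offset` and `batchSize` are interpreted as paging over the changed IDs ordered by revision (the proposal does not define them), `offset` is assumed to be non-negative, and an empty batch reports the requested revision as both `from` and `to`.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical stand-in for ChangeSet; IDs are plain Strings instead of UriRef. */
final class SketchChangeSet {
    final long from;
    final long to;
    final Set<String> changed;

    SketchChangeSet(long from, long to, Set<String> changed) {
        this.from = from;
        this.to = to;
        this.changed = changed;
    }
}

/** Keeps only the revision of the latest change per ContentItem ID. */
final class RevisionLog {
    private final Map<String, Long> latest = new TreeMap<>();
    private long revision = 0;

    /** Records one change (add, update or removal) affecting the given IDs. */
    long record(String... ids) {
        revision++;
        for (String id : ids) {
            latest.put(id, revision);
        }
        return revision;
    }

    /** Returns IDs whose latest revision is strictly higher than {@code since}. */
    SketchChangeSet changes(long since, int offset, int batchSize) {
        List<Map.Entry<String, Long>> hits = new ArrayList<>();
        for (Map.Entry<String, Long> e : latest.entrySet()) {
            if (e.getValue() > since) { // the passed revision itself is excluded
                hits.add(e);
            }
        }
        // Stable sort by revision; ties keep the ID order of the TreeMap.
        hits.sort((a, b) -> Long.compare(a.getValue(), b.getValue()));

        long end = Math.min((long) hits.size(), (long) offset + batchSize);
        Set<String> ids = new LinkedHashSet<>();
        long from = since; // assumption: an empty batch reports the requested revision
        long to = since;
        for (int i = offset; i < end; i++) {
            Map.Entry<String, Long> e = hits.get(i);
            if (ids.isEmpty()) {
                from = e.getValue();
            }
            to = e.getValue();
            ids.add(e.getKey());
        }
        return new SketchChangeSet(from, to, ids);
    }
}
```

      Because the returned IDs carry no change type, a caller still has to invoke get(..) for each ID to find out whether the ContentItem exists or was removed.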

      The Store MUST NOT keep a history of changes. Only the revision of the latest change to each ContentItem MUST be kept. This ensures that rebuilding a semantic index (from revision -1) only performs indexing steps for the latest change to each ContentItem rather than replaying every historical state of the Store. Note also that the revisions do not provide information about the type of the change. Whether a ContentItem is still present (added, updated) or was removed is indicated by the get(..) method of the store returning a ContentItem instance or <code>null</code>.
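
      The re-synchronization of a semantic index described above could look roughly like the following Java sketch. `TinyStore` and `TinyIndex` are hypothetical, in-memory stand-ins (Strings replace both UriRef and ContentItem, and every single put or remove counts as one revision); a real SemanticIndex would persist `lastRevision` so that it survives a restart.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical minimal store: content by ID plus the latest revision per ID. */
final class TinyStore {
    private final Map<String, String> content = new HashMap<>();      // absent = removed
    private final Map<String, Long> latestRevision = new HashMap<>(); // no history is kept
    private long revision = 0;

    void put(String id, String text) {
        content.put(id, text);
        latestRevision.put(id, ++revision);
    }

    void remove(String id) {
        content.remove(id);
        latestRevision.put(id, ++revision); // removals are changes too
    }

    /** Returns the stored content, or null if the item was removed or never existed. */
    String get(String id) {
        return content.get(id);
    }

    long currentRevision() {
        return revision;
    }

    /** IDs whose latest change has a revision strictly higher than {@code since}. */
    Set<String> changedSince(long since) {
        Set<String> ids = new HashSet<>();
        for (Map.Entry<String, Long> e : latestRevision.entrySet()) {
            if (e.getValue() > since) {
                ids.add(e.getKey());
            }
        }
        return ids;
    }
}

/** Hypothetical index that remembers the last revision it has applied. */
final class TinyIndex {
    final Map<String, String> indexed = new HashMap<>();
    long lastRevision = -1; // -1 means "rebuild from scratch"

    void sync(TinyStore store) {
        long target = store.currentRevision();
        for (String id : store.changedSince(lastRevision)) {
            String item = store.get(id);
            if (item == null) {
                indexed.remove(id);    // null from get(..) signals a removal
            } else {
                indexed.put(id, item); // added or updated
            }
        }
        lastRevision = target;
    }
}
```

      Since only the latest revision per ID is kept, a rebuild from revision -1 touches each ID once, regardless of how often the ContentItem changed in the past.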

      Example:
      -------------

      For example, if ContentItems 1, 2 and 3 are added first, then ContentItem 2 is updated and ContentItem 3 is deleted, and in a third step ContentItems 3 and 4 are added, this results in the following revision data:

      After step 1:

      :::text
      1 : urn:contentItem.1 //added
      1 : urn:contentItem.2 //added
      1 : urn:contentItem.3 //added

      After step 2:

      :::text
      1 : urn:contentItem.1 //added
      2 : urn:contentItem.2 //updated
      2 : urn:contentItem.3 //removed

      After step 3:

      :::text
      1 : urn:contentItem.1 //added
      2 : urn:contentItem.2 //updated
      3 : urn:contentItem.3 //added
      3 : urn:contentItem.4 //added

            People

              Unassigned Unassigned
              rwesten Rupert Westenthaler