Stanbol / STANBOL-426

Ability to manage identifiers of ontologies added to spaces

    Details

    • Type: Story
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Ontology Manager
    • Labels: None

      Description

      We have a use case which involves synchronising ontologies stored in a KReS custom space with content from an external system, which itself identifies units of content (i.e. the ontologies being added to KReS) using URIs.

      When adding an ontology using an OntologyInputSource there doesn't seem to be any way of reconciling the identity of the ontology in the custom space with the content identifier URI from the external system.

      This means that when content in the external system is modified or deleted, there is no easy way to determine which ontology in the custom space should be updated/deleted.
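      One way to picture the missing piece is a small lookup kept alongside the custom space. The following is a minimal sketch in plain Java, not Stanbol code: `ExternalIdRegistry` and its methods are hypothetical names, and the ontology key is assumed to be whatever string identifies the ontology inside the space.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of the Stanbol API: keeps a two-way mapping
// between external content URIs and the keys of ontologies held in a space.
class ExternalIdRegistry {
    private final Map<String, String> ontologyByExternal = new HashMap<>();
    private final Map<String, String> externalByOntology = new HashMap<>();

    // Record that the ontology stored under ontologyKey came from externalUri.
    void register(String externalUri, String ontologyKey) {
        // Drop stale pairings on either side so both maps stay consistent.
        unregister(externalUri);
        String oldExternal = externalByOntology.remove(ontologyKey);
        if (oldExternal != null) {
            ontologyByExternal.remove(oldExternal);
        }
        ontologyByExternal.put(externalUri, ontologyKey);
        externalByOntology.put(ontologyKey, externalUri);
    }

    // Which ontology should be updated or deleted when externalUri changes?
    Optional<String> ontologyFor(String externalUri) {
        return Optional.ofNullable(ontologyByExternal.get(externalUri));
    }

    // Forget the pairing; returns the ontology key that was mapped, if any.
    Optional<String> unregister(String externalUri) {
        String key = ontologyByExternal.remove(externalUri);
        if (key != null) {
            externalByOntology.remove(key);
        }
        return Optional.ofNullable(key);
    }
}
```

      When the external system reports a change or deletion for a URI, `ontologyFor` would give the key of the ontology to update or remove in the space.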

          Activity

          Alessandro Adamou added a comment (edited)

          Ok so I am setting the following conditions for this ticket to be resolved:

          1. Implement garbage collection for managed ontologies: Stanbol could be configured to remove a graph from the Clerezza backend when (a) there are no more "handles" to it (scopes, sessions etc.) and (b) the graph was created at the same time as its addition to a scope or session.
          2. Define a smart policy for handling updates.
          3. Provide full support for OWL2 version IRIs (this might actually solve 1 and 2 if version IRIs are constructed properly).
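          Condition 1 amounts to reference counting with an extra ownership flag. The sketch below is plain Java under stated assumptions: `GraphHandleTracker` is a hypothetical class, graphs and collectors (scopes, sessions) are identified by plain strings, and actually deleting the graph from the backend is left to the caller.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical bookkeeping class, not Stanbol code.
class GraphHandleTracker {
    // graph ID -> IDs of the scopes/sessions that currently hold a handle to it
    private final Map<String, Set<String>> handles = new HashMap<>();
    // graphs that were created at the moment they were added to a collector
    private final Set<String> createdOnAdd = new HashSet<>();

    void addHandle(String graphId, String collectorId, boolean graphCreatedNow) {
        handles.computeIfAbsent(graphId, k -> new HashSet<>()).add(collectorId);
        if (graphCreatedNow) {
            createdOnAdd.add(graphId);
        }
    }

    // Returns true only when both conditions hold: (a) no handles remain and
    // (b) the graph was created on addition. The caller may then delete it.
    boolean releaseHandle(String graphId, String collectorId) {
        Set<String> holders = handles.get(graphId);
        if (holders == null) {
            return false; // unknown graph: nothing to collect
        }
        holders.remove(collectorId);
        if (!holders.isEmpty()) {
            return false; // condition (a) not met yet
        }
        handles.remove(graphId);
        return createdOnAdd.remove(graphId); // condition (b)
    }
}
```

          A graph that already existed in the store before being added to a scope is never reported as collectable here, which matches condition (b).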

          Stephen Bayliss added a comment -

          I think it would be useful if the ontology management API could somehow manage this, with methods that can deal with both content managed within the store and external content (registered in scopes and spaces).

          I.e. the ability both to deregister an ontology from a space and to remove the underlying graph (when deregistering, or perhaps via a separate method?).

          And when adding, one could extend the methods to cover the use cases of replacing an existing graph, adding a new graph (e.g. a new version), merging with the existing graph, etc.
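          As a rough illustration of the shape such methods could take, here is a minimal in-memory sketch in plain Java. Everything in it is an assumption made for illustration: `AddMode`, `SketchSpace` and their methods are hypothetical names, triples are modelled as plain strings, and none of it reflects the actual ontology manager API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical names throughout; this is not the Stanbol ontology manager API.
enum AddMode { REPLACE, MERGE }

class SketchSpace {
    private final Map<String, Set<String>> store = new HashMap<>(); // graph ID -> triples
    private final Set<String> registered = new HashSet<>();          // graphs visible in the space

    // Add triples under graphId, either replacing or merging with existing content.
    void add(String graphId, AddMode mode, String... triples) {
        Set<String> target = store.computeIfAbsent(graphId, k -> new HashSet<>());
        if (mode == AddMode.REPLACE) {
            target.clear();
        }
        for (String t : triples) {
            target.add(t);
        }
        registered.add(graphId);
    }

    // Deregister the graph from the space; optionally delete the stored graph too.
    boolean deregister(String graphId, boolean removeGraph) {
        boolean wasRegistered = registered.remove(graphId);
        if (wasRegistered && removeGraph) {
            store.remove(graphId);
        }
        return wasRegistered;
    }

    boolean isRegistered(String graphId) { return registered.contains(graphId); }

    boolean hasGraph(String graphId) { return store.containsKey(graphId); }

    int tripleCount(String graphId) {
        Set<String> g = store.get(graphId);
        return g == null ? 0 : g.size();
    }
}
```

          In this sketch the "new version" case needs no separate mode: it is simply an `add` under a different graph ID.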

          Alessandro Adamou added a comment (edited)

          Actually it should be possible to have multiple identifiers for the same graphs, or better, equal graphs which share the same triples.

          I performed a code check: you get two TripleCollections because the moment you call addOntology() on ontology collectors (spaces or sessions), they always try to create an MGraph named after the ontology itself (if it has a name; otherwise it uses a timestamp. I hope your ontonet:: graph ID looks like that because your ontology did not have an OWL name, did it?).

          Then it performs addAll() from the originally stored graph.

          I would expect this to be equivalent to having a single graph with two names and no double memory usage, since they share the same TcProvider (the unique TcManager).

          You could get yourself a single identifier by using a new SimpleTcProvider() for the GraphContentInputSource. I guess that would cause twice the memory usage, at least until the input source is garbage-collected.

          If you wish, I can implement a check that "deletes" the triple collection org.apache.stanbol.ontologymanager.ontonet.api.io.GraphContentInputSource-1328882620149. According to the Clerezza API specification, the triples should survive the deletion, because they still belong to ontonet::http://stanbol.apache.org/1328882278969
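          The sequence described here (create the input-source collection, copy it into the named graph with addAll(), then delete the first name) can be modelled in a few lines of plain Java. This is only a toy model: `NamedGraphs` is a hypothetical class, triples are plain strings, and the sketch says nothing about whether the real Clerezza backend shares storage between the two names.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of named triple collections; not the Clerezza TcManager.
class NamedGraphs {
    private final Map<String, Set<String>> byName = new HashMap<>();

    // Create (or overwrite) a named collection holding the given triples.
    void create(String name, String... triples) {
        Set<String> g = new HashSet<>();
        for (String t : triples) {
            g.add(t);
        }
        byName.put(name, g);
    }

    // Mirrors the addAll() step: copies triples from one named collection into another.
    void copyInto(String fromName, String toName) {
        Set<String> from = byName.get(fromName);
        if (from == null) {
            throw new IllegalArgumentException("no such graph: " + fromName);
        }
        byName.computeIfAbsent(toName, k -> new HashSet<>()).addAll(from);
    }

    // Deleting one name leaves collections stored under other names untouched.
    void delete(String name) { byName.remove(name); }

    boolean exists(String name) { return byName.containsKey(name); }

    int size(String name) {
        Set<String> g = byName.get(name);
        return g == null ? 0 : g.size();
    }
}
```

          In this model, deleting the input-source name after the copy leaves the ontonet:: collection with all of its triples, which is the outcome expected above.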

          Stephen Bayliss added a comment

          Testing on svn rev 1241972:

          The new GraphContentInputSource, with tcManager passed in the constructor, is now able to handle our large graphs.

          Identifiers are now passed back by the methods adding these, which is great.

          However, there seems to be a secondary issue: more than one graph is created (or are there two graph identifiers for the same graph? Is that even possible?). We are seeing a graph with the identifier returned by the method, but also a graph with an identifier of the form:

          org.apache.stanbol.ontologymanager.ontonet.api.io.GraphContentInputSource-1328882620149

          (for reference, the graph ID returned by the method is ontonet::http://stanbol.apache.org/1328882278969)


            People

            • Assignee: Unassigned
            • Reporter: Stephen Bayliss
            • Votes: 0
            • Watchers: 1