Uploaded image for project: 'Apache Commons RDF'
  1. Apache Commons RDF
  2. COMMONSRDF-8

simple .GraphImpl.add() must clone BlankNode

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 0.1.0
    • None

    Description

      From https://github.com/commons-rdf/commons-rdf/issues/66

      stain:

      In simple GraphImpl add() it does not clone BlankNode.
      https://github.com/commons-rdf/commons-rdf/blob/master/simple/src/main/java/com/github/commonsrdf/simple/GraphImpl.java#L44

      Lacking #63 or #64, this means that this would wrongly merge two BlankNodes from different scopes - say you iterate over triples from graph 1 and add them to graph 2. This test-case should be added - and also to verify that after copying over, two statements about g1.b1 use the same BlankNode, and two about g2.b1 uses the same - but that those two blank nodes are themselves different - e.g. must now have different internal identifiers.

      Perhaps a hashing based on Objects.identityHashCode of the original localScope (if a BlankNodeImpl which exposes that) or the original BlankNode is sufficient - although it could lead to two 'equals()' BlankNodes from g1 that are not the same object to not be merged. Hence this is just an intermediary solution to #48 .

      https://github.com/commons-rdf/commons-rdf/pull/63
      https://github.com/commons-rdf/commons-rdf/pull/64
      https://github.com/commons-rdf/commons-rdf/pull/48

      afs:

      What is "merge"?

      "Merge" is logical(semantic) operation. The API works on abstract syntax. Higher level code needs to deal with semantics. Otherwise, "simple" is taking a stance of one usage over another.

      Blank nodes can be shared between graphs. Sharing means .equals. If it is "same implementation" then using

      If the system implemenation blank node identifier (Concepts, sec 3.4) is choosen to be a global unique id, no copy is needed across the same implementation. That makes the statement "must now have different internal identifiers" too strong.

      Because the API abstracts the key features of an RDFTerm, "simply" can work with java objects created in other implementations which would eb good as the de facto reference example.

      stain:

      Yes, agree that if you intend for them to be shared, that should be all good and fine - but then it should truly be merged as well - that is be both .equals(), have same .hashCode() and same (internal)Identifier and nTriplesString.

      Lacking globally unique identifiers on BlankNode - GraphImpl.add() will have to do something else.

      But now the Simple implementation will let you create two BlankNodes in two Graphs g1 and g2 with the same "internal identifier" , e.g. "b1" - and then you can add statements using them to a single Graph g3. (for instance because you are just adding all triples from g1 and g2 without any modifications).

      In that case you get a strange case where they have the same internalIdentifier "b1" and same ntriples string - but they internally keep two different "local scopes" back to g1 and g2, and thus their .equals() will claim they are not equal.

      This is in contradiction to the javadoc for BlankNode which says:
      https://github.com/commons-rdf/commons-rdf/blob/master/api/src/main/java/com/github/commonsrdf/api/BlankNode.java#L44

      > Implementations that handle blank node identifiers in concrete syntaxes need to be careful not to create the same blank node from multiple occurrences of the same blank node identifier except in situations where this i supported by the syntax.

      So it should be either or - and the test need to verify which one of them it is.

      That is what this issue will have to sort out for the upcoming release - assuming that something like #63 or #64 won't make it in, as they would avoid this issue to a large extent.

      afs:

      Cloning on GraphImpl.add will require long term internal state. What if the same blank node is added again later? Not even "merge" helps with this; a sequence of adds is all one merge persumably. RDF 1.1 MT

      There are standard, cheap mechanisms for globally unique identifiers. They provide the easiest and clearest solution for the generation of fresh blank nodes that do not clash and will not in the future by accident. If "simple" wishes to impose further contract restrictions, like no shared blank nodes, then it can deal with system identifers per graph but no shared blank nodes is a significant restriction.

      If it is the same "internal identifier", maybe that because they are supposed to be the same. That can happen for example, reconstructing from data on disk (not RDF systems, more database-like storage).

      The N-triples string, if the contract is that it can be used in output unchanged, must be some long, unique form. Even if it could, the signficance is lost on reading: when read in again, the short name can not be used on its own (consider reading the same file twice). An internal identifier which is a unique poer parser run together with the seen label can be used - the pair is unique.

      Attachments

        Issue Links

          Activity

            People

              stain Stian Soiland-Reyes
              soilandreyes Stian Soiland-Reyes (old) (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: