Jackrabbit Content Repository
  1. Jackrabbit Content Repository
  2. JCR-1552

Concurrent conflicting property creation sometimes doesn't fail

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Won't Fix
    • Affects Version/s: core 1.4.2
    • Fix Version/s: None
    • Component/s: jackrabbit-core
    • Labels:
      None

      Description

      The following test prints "Success":

      Session s1 = ...
      Session s2 = ...
      s1.getRootNode().setProperty("b", "0"); // init with zero
      s1.getRootNode().setProperty("b", (String) null); // delete
      s1.save();
      s1.getRootNode().setProperty("b", "1");
      s2.getRootNode().setProperty("b", "2");
      s1.save();
      s2.save();
      System.out.println("Success");

      However if the line marked "... // delete" is commented out,
      it fails with the following exception:

      javax.jcr.InvalidItemStateException:
      cafebabe-cafe-babe-cafe-babecafebabe/{}b: the item cannot be saved
      because it has been modified externally.
      at org.apache.jackrabbit.core.ItemImpl.getTransientStates(ItemImpl.java:246)
      at org.apache.jackrabbit.core.ItemImpl.save(ItemImpl.java:928)
      at org.apache.jackrabbit.core.SessionImpl.save(SessionImpl.java:849)

      It should fail in all cases. If we decide it shouldn't fail, it needs to be documented.

        Activity

        Hide
        Stefan Guggisberg added a comment -

        the following code (2 sessions creating new properties of same name, using interleaved save calls) should trigger an InvalidItemStateException:

        String newPropName = Long.toString(System.currentTimeMillis());
        s1.getRootNode().setProperty(newPropName, "1");
        s2.getRootNode().setProperty(newPropName, "2");
        s1.save();
        s2.save(); // => should throw InvalidItemStateException

        however, currently it succeeds which seems to be a regression.

        Show
        Stefan Guggisberg added a comment - the following code (2 sessions creating new properties of same name, using interleaved save calls) should trigger an InvalidItemStateException: String newPropName = Long.toString(System.currentTimeMillis()); s1.getRootNode().setProperty(newPropName, "1"); s2.getRootNode().setProperty(newPropName, "2"); s1.save(); s2.save(); // => should throw InvalidItemStateException however, currently it succeeds which seems to be a regression.
        Hide
        Jukka Zitting added a comment -

        I don't understand why any of these examples should fail with an InvalidItemStateException. It would make sense if we supported higher isolation levels where already getProperty() would "freeze" the property state, but since we don't do that (and I think it's good that we don't) I fail to see the benefit of this.

        Show
        Jukka Zitting added a comment - I don't understand why any of these examples should fail with an InvalidItemStateException. It would make sense if we supported higher isolation levels where already getProperty() would "freeze" the property state, but since we don't do that (and I think it's good that we don't) I fail to see the benefit of this.
        Hide
        Alexander Klimetschek added a comment -

        Well, the benefit would be that the client code knows its change is based on obsolete node/property data, because another session has overridden it in the meantime. But this benefit is debatable.

        IMHO it is a good idea to do so, but it needs to be documented. (Generally, there should be some documentation describing how Jackrabbit works where the spec is open or where Jackrabbit adds additional features).

        Just to add facts, here is the part from the spec about this (7.1.3.3):

        An InvalidItemStateException may be thrown on a write method of an Item
        if the change being made would (upon save) conflict with a change made,
        and persisted, through another Session. If detection of the conflict is only
        possible at save-time, then save will throw an InvalidItemStateException.

        This leaves it open for the implementation to throw an exception in this case.

        Show
        Alexander Klimetschek added a comment - Well, the benefit would be that the client code knows its change is based on obsolete node/property data, because another session has overridden it in the meantime. But this benefit is debatable. IMHO it is a good idea to do so, but it needs to be documented. (Generally, there should be some documentation describing how Jackrabbit works where the spec is open or where Jackrabbit adds additional features). Just to add facts, here is the part from the spec about this (7.1.3.3): An InvalidItemStateException may be thrown on a write method of an Item if the change being made would (upon save) conflict with a change made, and persisted, through another Session. If detection of the conflict is only possible at save-time, then save will throw an InvalidItemStateException. This leaves it open for the implementation to throw an exception in this case.
        Hide
        Jukka Zitting added a comment -

        > Well, the benefit would be that the client code knows its change is based on obsolete
        > node/property data, because another session has overridden it in the meantime.

        The "based on obsolete data" part would only apply if the client did a getProperty() before setProperty(), but none of the examples here do that.

        Show
        Jukka Zitting added a comment - > Well, the benefit would be that the client code knows its change is based on obsolete > node/property data, because another session has overridden it in the meantime. The "based on obsolete data" part would only apply if the client did a getProperty() before setProperty(), but none of the examples here do that.
        Hide
        Julian Reschke added a comment -

        The example that I started with did that.

        Show
        Julian Reschke added a comment - The example that I started with did that.
        Hide
        Jukka Zitting added a comment -

        > The example that I started with did that.

        AFAIK that's outside the scope of this issue. We've never supported such isolation levels and AFAIUI there has never been an intention to do that. InvalidItemStateException is meant for cases where an intervening save makes the underlying content incompatible with the changes that the current session is trying to save.

        Show
        Jukka Zitting added a comment - > The example that I started with did that. AFAIK that's outside the scope of this issue. We've never supported such isolation levels and AFAIUI there has never been an intention to do that. InvalidItemStateException is meant for cases where an intervening save makes the underlying content incompatible with the changes that the current session is trying to save.
        Hide
        Julian Reschke added a comment -

        > AFAIK that's outside the scope of this issue. We've never supported such isolation levels and AFAIUI there has never been an intention to do that. InvalidItemStateException is meant for cases where an intervening save makes the underlying content incompatible with the changes that the current session is trying to save.

        I do agree that if this isn't about a regression, then it probably should be considered a design discussion.

        That being said: your interpretation makes this feature less useful to clients (IMHO). From the client's point of view, it should be irrelevant how an overlapping update occurred. When the client gets a property value, modifies it, and can write it back although the underlying property has changed, then that is an overlapping update that wasn't catched.

        Show
        Julian Reschke added a comment - > AFAIK that's outside the scope of this issue. We've never supported such isolation levels and AFAIUI there has never been an intention to do that. InvalidItemStateException is meant for cases where an intervening save makes the underlying content incompatible with the changes that the current session is trying to save. I do agree that if this isn't about a regression, then it probably should be considered a design discussion. That being said: your interpretation makes this feature less useful to clients (IMHO). From the client's point of view, it should be irrelevant how an overlapping update occurred. When the client gets a property value, modifies it, and can write it back although the underlying property has changed, then that is an overlapping update that wasn't catched.
        Hide
        Jukka Zitting added a comment -

        > That being said: your interpretation makes this feature less useful to clients (IMHO).
        > From the client's point of view, it should be irrelevant how an overlapping update occurred.
        > When the client gets a property value, modifies it, and can write it back although the underlying
        > property has changed, then that is an overlapping update that wasn't catched.

        That's again assuming that the "get property" operation is included in the control flow. We basically have two separate issues here:

        1) The getProperty(), setProperty(), save(), case. This is equivalent to a database client doing a SELECT followed by an UPDATE on the same row. A database that supports isolation levels REPEATABLE READ or SERIALIZED will guarantee that if the transaction succeeds no other transaction can have updated the row between the SELECT and UPDATE statements. Jackrabbit has never supported such isolation levels and thus the lack of this isn't a regression. We can discuss implementing higher isolation levels as a new feature request, but note that the feature a) has a high design and runtime cost, b) is not needed by many (most?) clients, and c) there's already a standard solution (JCR locks) for clients that do need the functionality. In any case this is IMHO outside the scope of this issue.

        2) The setProperty(), save() case. This is equivalent to a database client doing a prepareStatement followed by executeUpdate on an UPDATE statement. I still don't see how or why such a client could care about concurrent updates (except if the parent node gets removed), and thus the fact that we no longer throw exceptions for some such cases is IMHO rather an improvement than a regression. Based on this reasoning I propose that we resolve this issue as Won't Fix and perhaps create a new improvement issue to get rid of the remaining InvalidItemStateExceptions from concurrent property updates.

        Show
        Jukka Zitting added a comment - > That being said: your interpretation makes this feature less useful to clients (IMHO). > From the client's point of view, it should be irrelevant how an overlapping update occurred. > When the client gets a property value, modifies it, and can write it back although the underlying > property has changed, then that is an overlapping update that wasn't catched. That's again assuming that the "get property" operation is included in the control flow. We basically have two separate issues here: 1) The getProperty(), setProperty(), save(), case. This is equivalent to a database client doing a SELECT followed by an UPDATE on the same row. A database that supports isolation levels REPEATABLE READ or SERIALIZED will guarantee that if the transaction succeeds no other transaction can have updated the row between the SELECT and UPDATE statements. Jackrabbit has never supported such isolation levels and thus the lack of this isn't a regression. We can discuss implementing higher isolation levels as a new feature request, but note that the feature a) has a high design and runtime cost, b) is not needed by many (most?) clients, and c) there's already a standard solution (JCR locks) for clients that do need the functionality. In any case this is IMHO outside the scope of this issue. 2) The setProperty(), save() case. This is equivalent to a database client doing a prepareStatement followed by executeUpdate on an UPDATE statement. I still don't see how or why such a client could care about concurrent updates (except if the parent node gets removed), and thus the fact that we no longer throw exceptions for some such cases is IMHO rather an improvement than a regression. Based on this reasoning I propose that we resolve this issue as Won't Fix and perhaps create a new improvement issue to get rid of the remaining InvalidItemStateExceptions from concurrent property updates.
        Hide
        Alexander Klimetschek added a comment -

        Ok, then I would favor to

        a) make it consistent (as Jukka said, remove the remaining ISEx thrown) and
        b) document this!

        The documentation is important, since many people intuitively expect a different behaviour. As I suggested, it would be cool to start a Wiki/Website page with a list of all Jackrabbit-specific JCR behaviours.

        Show
        Alexander Klimetschek added a comment - Ok, then I would favor to a) make it consistent (as Jukka said, remove the remaining ISEx thrown) and b) document this! The documentation is important, since many people intuitively expect a different behaviour. As I suggested, it would be cool to start a Wiki/Website page with a list of all Jackrabbit-specific JCR behaviours.
        Hide
        Stefan Guggisberg added a comment -

        > Jukka Zitting commented on JCR-1552:
        > ------------------------------------
        >
        > I don't understand why any of these examples should fail with an InvalidItemStateException. It would make sense if we supported higher isolation levels where already getProperty() would "freeze" the property state, but since we don't do that (and I think it's good that we don't) I fail to see the benefit of this.
        >

        this particular issue is about the special case where 2 sessions are both creating a new property with the same name, using interleaved save calls. session2 in my example is not aware that it actually overwrote an existing property (lost update problem). this is IMO a regression since i am pretty sure it used to throw an InvalidItemStateException in the past.

        this special case can be compared with the scenario where 2 sessions are creating conflicting child nodes (SNS not allowed). in this case the implementation does throw an exception (which is IMO not only correct but also mandated by the spec).

        Show
        Stefan Guggisberg added a comment - > Jukka Zitting commented on JCR-1552 : > ------------------------------------ > > I don't understand why any of these examples should fail with an InvalidItemStateException. It would make sense if we supported higher isolation levels where already getProperty() would "freeze" the property state, but since we don't do that (and I think it's good that we don't) I fail to see the benefit of this. > this particular issue is about the special case where 2 sessions are both creating a new property with the same name, using interleaved save calls. session2 in my example is not aware that it actually overwrote an existing property (lost update problem). this is IMO a regression since i am pretty sure it used to throw an InvalidItemStateException in the past. this special case can be compared with the scenario where 2 sessions are creating conflicting child nodes (SNS not allowed). in this case the implementation does throw an exception (which is IMO not only correct but also mandated by the spec).
        Hide
        Stefan Guggisberg added a comment - - edited

        > AFAIK that's outside the scope of this issue. We've never supported such isolation levels and AFAIUI there has never been an intention to do that. InvalidItemStateException is meant for cases where an intervening save makes the underlying content incompatible with the changes that the current session is trying to save.

        +1, that's my understanding as well

        Show
        Stefan Guggisberg added a comment - - edited > AFAIK that's outside the scope of this issue. We've never supported such isolation levels and AFAIUI there has never been an intention to do that. InvalidItemStateException is meant for cases where an intervening save makes the underlying content incompatible with the changes that the current session is trying to save. +1, that's my understanding as well
        Hide
        Jukka Zitting added a comment -

        Stefan:
        > this particular issue is about the special case where 2 sessions are both creating a new
        > property with the same name, using interleaved save calls. session2 in my example is not
        > aware that it actually overwrote an existing property (lost update problem).

        But there's no reliable way (short of explicitly locking the node) for session2 to know whether the property already existed. The InvalidItemStateException is typically not a reliable indicator of this case as a concurrent thread could just as well already have saved the changes before session2 called setProperty(), making the operation an update instead of a create with session2 being none the wiser.

        > this is IMO a regression since i am pretty sure it used to throw an InvalidItemStateException in the past.

        Again, IMHO it's rather an improvement.

        > this special case can be compared with the scenario where 2 sessions are creating
        > conflicting child nodes (SNS not allowed). in this case the implementation does throw
        > an exception (which is IMO not only correct but also mandated by the spec).

        These cases are IMHO not equivalent. Creating a child node is like a INSERT statement in a database, where as setting a property is more like an UPDATE statement on an existing row. Concurrently creating two child nodes with the same name is like two database clients trying to INSERT a row with the same primary key -> the other one should fail. But concurrently setting a property is more like two clients executing an UPDATE on the same row. Unless there's an isolation level violation there's no reason why both clients shouldn't succeed, even if a column was NULL or either one of the clients sets it to NULL.

        Show
        Jukka Zitting added a comment - Stefan: > this particular issue is about the special case where 2 sessions are both creating a new > property with the same name, using interleaved save calls. session2 in my example is not > aware that it actually overwrote an existing property (lost update problem). But there's no reliable way (short of explicitly locking the node) for session2 to know whether the property already existed. The InvalidItemStateException is typically not a reliable indicator of this case as a concurrent thread could just as well already have saved the changes before session2 called setProperty(), making the operation an update instead of a create with session2 being none the wiser. > this is IMO a regression since i am pretty sure it used to throw an InvalidItemStateException in the past. Again, IMHO it's rather an improvement. > this special case can be compared with the scenario where 2 sessions are creating > conflicting child nodes (SNS not allowed). in this case the implementation does throw > an exception (which is IMO not only correct but also mandated by the spec). These cases are IMHO not equivalent. Creating a child node is like a INSERT statement in a database, where as setting a property is more like an UPDATE statement on an existing row. Concurrently creating two child nodes with the same name is like two database clients trying to INSERT a row with the same primary key -> the other one should fail. But concurrently setting a property is more like two clients executing an UPDATE on the same row. Unless there's an isolation level violation there's no reason why both clients shouldn't succeed, even if a column was NULL or either one of the clients sets it to NULL.
        Hide
        Stefan Guggisberg added a comment -

        > > this special case can be compared with the scenario where 2 sessions are creating
        > > conflicting child nodes (SNS not allowed). in this case the implementation does throw
        > > an exception (which is IMO not only correct but also mandated by the spec).
        >
        > These cases are IMHO not equivalent. Creating a child node is like a INSERT statement in a database, where as setting a property is more like an UPDATE statement on an existing row. Concurrently creating two child nodes with the same name is like two database clients trying to INSERT a row with the same primary key -> the other one should fail. But concurrently setting a property is more like two clients executing an UPDATE on the same row. Unless there's an isolation level violation there's no reason why both clients shouldn't succeed, even if a column was NULL or either one of the clients sets it to NULL.

        hmmm, still not fully convinced but you've got a good point here

        Show
        Stefan Guggisberg added a comment - > > this special case can be compared with the scenario where 2 sessions are creating > > conflicting child nodes (SNS not allowed). in this case the implementation does throw > > an exception (which is IMO not only correct but also mandated by the spec). > > These cases are IMHO not equivalent. Creating a child node is like a INSERT statement in a database, where as setting a property is more like an UPDATE statement on an existing row. Concurrently creating two child nodes with the same name is like two database clients trying to INSERT a row with the same primary key -> the other one should fail. But concurrently setting a property is more like two clients executing an UPDATE on the same row. Unless there's an isolation level violation there's no reason why both clients shouldn't succeed, even if a column was NULL or either one of the clients sets it to NULL. hmmm, still not fully convinced but you've got a good point here
        Hide
        Thomas Mueller added a comment -

        Jackrabbit should either always or never throw an exception. When concurrently updating or adding properties, that is. 'Never throw an exception' seems easier. Also the implementation is faster, that's why I would prefer that.

        If we decide to 'always throw', then the algorithm could be as follows: whenever a session reads, re-reads, transiently modifies, or adds a property, remember the old version (timestamp or data). When calling save, check if the current persistent state matches the old timestamp or data.

        When concurrently updating or adding nodes (so far I was talking about properties only), the same algorithm can be used. One problem is the following use case:

        Node b = a.hasNode ? a.getNode : a.addNode;

        With multiple threads / sessions you end up with same name siblings sometimes. That's probably not what the user wants. This is not an issue for properties as there is no 'addProperty' method. I don't know if there is a solution for nodes without using locks. For SQL databases, one solution is using the MERGE statement (sometimes called UPSERT from INSERT or UPDATE).

        Show
        Thomas Mueller added a comment - Jackrabbit should either always or never throw an exception. When concurrently updating or adding properties, that is. 'Never throw an exception' seems easier. Also the implementation is faster, that's why I would prefer that. If we decide to 'always throw', then the algorithm could be as follows: whenever a session reads, re-reads, transiently modifies, or adds a property, remember the old version (timestamp or data). When calling save, check if the current persistent state matches the old timestamp or data. When concurrently updating or adding nodes (so far I was talking about properties only), the same algorithm can be used. One problem is the following use case: Node b = a.hasNode ? a.getNode : a.addNode ; With multiple threads / sessions you end up with same name siblings sometimes. That's probably not what the user wants. This is not an issue for properties as there is no 'addProperty' method. I don't know if there is a solution for nodes without using locks. For SQL databases, one solution is using the MERGE statement (sometimes called UPSERT from INSERT or UPDATE).
        Hide
        Jukka Zitting added a comment -

        Categorized as an improvement request as I don't think this is a bug (see above). Also, I'd propose to solve this as Won't Fix (see above), as I still don't see how a client that doesn't use getProperty (or hasProperty) would benefit from the exception.

        Show
        Jukka Zitting added a comment - Categorized as an improvement request as I don't think this is a bug (see above). Also, I'd propose to solve this as Won't Fix (see above), as I still don't see how a client that doesn't use getProperty (or hasProperty) would benefit from the exception.
        Hide
        Jukka Zitting added a comment -

        As proposed, resolving as Won't Fix.

        Show
        Jukka Zitting added a comment - As proposed, resolving as Won't Fix.

          People

          • Assignee:
            Stefan Guggisberg
            Reporter:
            Thomas Mueller
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development