Apache Jena
  1. Apache Jena
  2. JENA-100

QueryIteratorBase concurrency issues

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: ARQ
    • Labels:
      None

      Description

      QueryIteratorBase appears to have some concurrency bugs relating to cancelling a query:

      1) The cancel() and cancelAllowContinue() methods did not have large enough synchronized blocks (they also used "this" as the lock, which is not recommended)
      2) The order of setting the cancellation flags and the notifying subclasses via the requestCancel() method was incorrect
      3) The visibility and happens-before relationships were incorrect for the requestingCancel and abortIterator variables

      The cancelAllowContinue() feature adds a lot of complexity in terms of visibility and ordering. Unfortunately it is hard to write test cases for these types of concurrency issues, so the existing tests did not uncover the issues.

        Activity

        Stephen Allen created issue -
        Stephen Allen made changes -
        Field Original Value New Value
        Attachment JENA-100-r1157891.patch [ 12490474 ]
        Hide
        Stephen Allen added a comment -

        One of the potential issues caused by 3) is in the nextBinding() method on line 127:

        It could be that a regular cancel was requested as we were midway through the method (after checking the abortIterator, but before checking requestCancel). This would cause the iterator to close and appear to terminate normally (but without all of the results) instead of throwing a QueryCancelledException.

        Show
        Stephen Allen added a comment - One of the potential issues caused by 3) is in the nextBinding() method on line 127: It could be that a regular cancel was requested as we were midway through the method (after checking the abortIterator, but before checking requestCancel). This would cause the iterator to close and appear to terminate normally (but without all of the results) instead of throwing a QueryCancelledException.
        Hide
        Simon Helsen added a comment -

        thanks Stephen. In https://issues.apache.org/jira/browse/JENA-93 I noticed that cancelAllowContinue sometimes throws an exception which it shouldn't. There may be a relation to what you found here. Once your patch makes it into a build, I'll test if it resolves the problem I observed.

        re: complexity. Yes, I know because in the patch I original submitted for this feature, I had to introduce a multi-staged cancellation mechanism to gracefully allow the drain in a multi-threaded environment. It went something like this: when a cancel() is invoked, notify the iterator stack the cancel has been requested, but do not cancel just yet Instead, all iterators have to be able to continue until the first execution of next() finishes. It is not sufficient to have hasNext() return false immediately because in a multi-threaded world, you may end up in a situation where a call to hasNext succeeded, but next() was not yet executed (and it usually calls hasNext under the hood again). So, after the first next() was retrieved, the cancel becomes "active" (stage 2) and hasNext() will start returning false.

        This 2-staged cancellation was applied to ARQ in our released product and works very reliably. I hope cancelAllowContinue() can be implemented in a similar manner. It would hinder us from upgrading to a later ARQ/TDB

        Show
        Simon Helsen added a comment - thanks Stephen. In https://issues.apache.org/jira/browse/JENA-93 I noticed that cancelAllowContinue sometimes throws an exception which it shouldn't. There may be a relation to what you found here. Once your patch makes it into a build, I'll test if it resolves the problem I observed. re: complexity. Yes, I know because in the patch I original submitted for this feature, I had to introduce a multi-staged cancellation mechanism to gracefully allow the drain in a multi-threaded environment. It went something like this: when a cancel() is invoked, notify the iterator stack the cancel has been requested, but do not cancel just yet Instead, all iterators have to be able to continue until the first execution of next() finishes. It is not sufficient to have hasNext() return false immediately because in a multi-threaded world, you may end up in a situation where a call to hasNext succeeded, but next() was not yet executed (and it usually calls hasNext under the hood again). So, after the first next() was retrieved, the cancel becomes "active" (stage 2) and hasNext() will start returning false. This 2-staged cancellation was applied to ARQ in our released product and works very reliably. I hope cancelAllowContinue() can be implemented in a similar manner. It would hinder us from upgrading to a later ARQ/TDB
        Hide
        Andy Seaborne added a comment -

        Patch applied and a build done, thanks

        Simon - please provide test cases for your use case.

        Show
        Andy Seaborne added a comment - Patch applied and a build done, thanks Simon - please provide test cases for your use case.
        Andy Seaborne made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Assignee Andy Seaborne [ andy.seaborne ]
        Resolution Fixed [ 1 ]
        Andy Seaborne made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee:
            Andy Seaborne
            Reporter:
            Stephen Allen
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development