Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: Jena 2.11.0
    • Component/s: ARQ, TDB
    • Labels:
      None

      Description

       This improvement and the proposed patch were contributed by Simon Helsen on behalf of IBM.

       ARQ query execution currently has no satisfactory way to cancel a running query safely. Moreover, cancel (unlike a hard abort) is especially useful if it can provide partial result sets (i.e. all the results it managed to compute up to the point the cancellation was requested). Although the exact cancellation behavior depends on the capabilities of the underlying triple store, the proposed patch relies only on the iterators used by ARQ.

      Here is a more detailed explanation of the proposed changes:

       1) The cancel() method on QueryIterator initiates a cancellation request (the first boolean flag). By analogy with closeIterator(), it propagates through all chained iterators, so the entire calculation is aware that a cancellation has been requested.
       2) To ensure thread-safe semantics, the cancel request only becomes a real cancel once nextBinding() has been called. That call sets the second boolean flag, which is consulted by hasNext(). This two-phase approach is critical because cancel() can be called at any time during a query execution by an external thread. Since hasNext() has to return the same value until next() is called, this is the only way to guarantee semantic safety when cancel() is invoked (let me re-phrase: it is the only way I was able to make it actually work).
       3) cancel() does not close anything; it allows execution to finish normally, and the client remains responsible for calling close(), just as with a regular execution. Note that the client has to call cancel() explicitly (typically from another thread) and has to assume that the returned result set may be incomplete once this method has been called (it is undetermined whether the result actually is incomplete).
       4) To deal with ORDER BY and groups, I had to make two more changes. First, I made QueryIterSort and QueryIterGroup slightly lazier. Currently, the full result set is calculated during plan calculation; with my adjustments, it is computed on the first call to any of the Iterator methods (e.g. hasNext()). This change does not, AFAIK, affect the semantics. Second, because the desired behavior when cancelling a sort or group query is to have everything sorted/grouped even if the total result set is incomplete, I added an exception which reverses the cancellation request of the encompassing iterator (see, for example, cancel() in QueryIterSort). This makes sure that the entire subset of found and sorted elements is returned, not just the first element. It also implies, in the sort case, that a cancelled query will first sort the partially complete result set before returning to the client.
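The two-phase scheme in points 1) and 2) can be sketched in plain Java. This is an illustrative sketch only; the class and field names below are hypothetical stand-ins, not ARQ's actual QueryIterator hierarchy.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Illustrative sketch of the two-phase cancellation described above.
// Names are hypothetical; in ARQ this logic would live in the
// QueryIterator implementations.
class CancellableIterator<T> implements Iterator<T> {
    private final Iterator<T> underlying;
    private volatile boolean requestingCancel = false; // phase 1: set by any thread
    private boolean cancelled = false;                 // phase 2: set on the query thread
    private T slot;                                    // element promised by hasNext()

    CancellableIterator(Iterator<T> underlying) { this.underlying = underlying; }

    // May be called at any time, from any thread.
    public void cancel() { requestingCancel = true; }

    @Override public boolean hasNext() {
        if (cancelled) return false;
        if (slot == null && underlying.hasNext())
            slot = underlying.next();
        return slot != null; // stable until next() is called
    }

    @Override public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        T r = slot;
        slot = null;
        // The cancel request only takes effect here, after the element
        // promised by hasNext() has been delivered -- preserving the
        // hasNext()/next() contract.
        if (requestingCancel) cancelled = true;
        return r;
    }
}
```

The point of the sketch: even if cancel() arrives between hasNext() and next(), the element already promised by hasNext() is still delivered, and the iterator only reports exhaustion afterwards.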

       The attached patch is based on ARQ 2.8.5 (and a few classes in TDB 0.8.7 -> possibly the other triple store implementations need adjustment as well).

      1. jena.patch
        53 kB
        Simon Helsen
      2. jenaAddition.patch
        2 kB
        Simon Helsen
      3. JENA-29_ARQ_r8489.patch
        36 kB
        Paolo Castagna
      4. JENA-29_TDB_r8489.patch
        1 kB
        Paolo Castagna
      5. JENA-29_tests_ARQ_r8489.patch
        6 kB
        Paolo Castagna
      6. queryIterRepeatApply.patch
        2 kB
        Simon Helsen
      7. cancelFix3.patch
        2 kB
        Simon Helsen

        Issue Links

        There are no Sub-Tasks for this issue.

          Activity

          castagna Paolo Castagna added a comment -

          > The requested improvement and proposed patch is made by Simon Helsen on behalf of IBM

          First of all, thanks for opening this issue and submitting a patch.

           We (@Talis) also have the need to cancel a query, and we were doing it in a slightly different (and probably not ideal) way. Your approach seems better to me.

          I'd love to see this (or a similar functionality) added to ARQ & TDB and I suspect everybody in the world with a public SPARQL endpoint is eager to use it.

          > the attached patch is based on ARQ 2.8.5 (and a few classes in TDB 0.8.7 -> possibly the other triple store implementations need adjustement as well)

           I had problems applying your patch cleanly to ARQ trunk.

          I needed to manually edit these three files:

          • QueryIterBlockTriplesQH.java
          • QueryIterGroup.java
          • QueryExecutionBase.java

          Not a big deal.

          I cannot possibly comment on all the code changes since I've just looked at it and I have not done any serious testing yet.

          However, ARQ and TDB tests all pass, which is good news. Ideally, we would need to add some unit test for the new functionality.

          I attach patch files which should apply cleanly to ARQ trunk (r8431) and TDB trunk (r8431).

          shelsen Simon Helsen added a comment -

          Hi Paolo,

          thanks for picking this up. Yes, the patch was made for an earlier version, so I am not surprised you had to do a few manual things. I agree there need to be specific cancellation tests. I have a few in our own internal framework, but they are not transportable.

          The tests could take the following form:

           1) First create a setup such that a given query takes at least a certain number of seconds (say X). This can be done in a loop, i.e. by creating more resources until the threshold is reached.
           2) Then create a thread which runs the query again and is now expected to last about X seconds. The main thread simply waits Y seconds, where Y is significantly less than X (possibly even 0), and then invokes cancel().
           3) The query thread joins with the test thread, which then inspects the result to see if it produced a partial result. The main thread should not have to wait very long once cancel() is invoked (this, however, cannot be asserted reliably in an automated way).
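In outline, the three steps might look like this in plain Java. The loop below is just a stand-in for a slow query; a real test would drive ARQ's QueryExecution instead, and the counts and sleep durations are arbitrary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Plain-Java outline of the three test steps above. The loop stands in
// for a query that takes roughly X seconds to complete.
class CancelTestSketch {
    static List<Integer> run(long waitMillis) throws InterruptedException {
        AtomicBoolean cancelRequested = new AtomicBoolean(false);
        List<Integer> results = new ArrayList<>();

        // Step 2: the "query" thread, expected to run for a while.
        Thread query = new Thread(() -> {
            for (int i = 0; i < 1000; i++) {
                if (cancelRequested.get()) break;   // honour cancel()
                results.add(i);
                try { Thread.sleep(10); } catch (InterruptedException e) { break; }
            }
        });
        query.start();

        Thread.sleep(waitMillis);     // the main thread waits Y << X seconds...
        cancelRequested.set(true);    // ...then invokes cancel()

        query.join();                 // Step 3: join, then inspect the result
        return results;               // partial: some results, but not all 1000
    }
}
```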

           The above scenario does not work equally well for all queries, of course. Ideally you have a query which produces enough results that you can observe a partial result (cancel() does not imply a partial result, just a cancel()) and which takes a relatively long time on your typical test machine (close to X after the first data creation).

          Useful things to test specifically:
          1) different kind of queries like select, describe, etc.
          2) sort (this behaves rather differently)
          3) distinct

          Simon

          castagna Paolo Castagna added a comment -

           I wonder if we can do something simpler: a fast and deterministic approach. I have had bad experiences with timeouts in unit tests.

          Something along these lines:

              @Test public void test_API2_Cancel()
              {
                  QueryExecution qExec = makeQExec("SELECT * {?s ?p ?o}") ;
                  try {
                      ResultSet rs = qExec.execSelect() ;
                      assertTrue(rs.hasNext()) ;
                      qExec.cancel();
                      assertTrue(rs.hasNext()) ;
                      rs.nextSolution();
                      assertFalse("Results not expected after cancel.", rs.hasNext()) ;
                  } finally { qExec.close() ; }
              }
          

           But, let's see what others think.

           Of course, this approach is too simplistic and I don't see how it can be done for DESCRIBE, CONSTRUCT, GROUP BY, ORDER BY, DISTINCT, ...

          shelsen Simon Helsen added a comment -

           Right, my tests were designed to work remotely over HTTP, which is why I didn't come up with your suggestion above. I agree the timeout tests are tricky, but we have them working very reliably in our test suites.

          castagna Paolo Castagna added a comment -

           Ack. So, I'll see if we can have a minimal set of deterministic unit tests alongside the approach shown above and add those to the patch.

           CONSTRUCT and DESCRIBE queries seem impossible to test this way, though. If you have ideas/suggestions, please let me know.

          shellac Damian Steer added a comment -

          'Deterministic' is tough. For non-deterministic how about:

          ... { ?s ?p ?o . FILTER ex:slow(?s, ?p, ?o) }
          

          Where 'ex:slow' is a function which sleeps for a small time. Run on a thread and cancel?

          Better: mock a graph. Let graph#find drip through, halt, issue cancel. You'll still need threads, but they won't have to run concurrently.
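The mock-graph idea can be sketched without any timing dependence in plain Java: the producer blocks on a latch, so the test controls exactly when cancellation happens. This is an illustrative stand-in; a real test would mock Graph#find rather than use a plain loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of "let graph#find drip through, halt, issue cancel":
// the producer pauses on a latch, the test cancels at that exact point.
// No sleeps, so the result is deterministic.
class DripFeedSketch {
    static List<Integer> run() throws InterruptedException {
        CountDownLatch halted = new CountDownLatch(1);  // producer has paused
        CountDownLatch resume = new CountDownLatch(1);  // test says: continue
        AtomicBoolean cancelRequested = new AtomicBoolean(false);
        List<Integer> results = new ArrayList<>();

        Thread query = new Thread(() -> {
            for (int i = 0; i < 10; i++) {
                if (cancelRequested.get()) return;      // honour the cancel
                results.add(i);
                if (i == 2) {                           // drip through 3 results, then halt
                    halted.countDown();
                    try { resume.await(); } catch (InterruptedException e) { return; }
                }
            }
        });
        query.start();
        halted.await();               // deterministic: producer is paused here
        cancelRequested.set(true);    // issue cancel while it is paused
        resume.countDown();
        query.join();
        return results;               // exactly the results produced before cancel
    }
}
```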

          shelsen Simon Helsen added a comment -

           There is a small loophole in both the abort() and cancel() methods on QueryExecutionBase. If, for some reason, abort/cancel is invoked during the construction of a plan, the query is never aborted or cancelled. The addition here is a simple fix for this problem, which I applied onto my original patch. Paolo, it should be straightforward to patch it onto HEAD.

          castagna Paolo Castagna added a comment -

          Hi Simon, thanks again.

           I added the additional patch to JENA-29_ARQ_r8489.patch; JENA-29_TDB_r8489.patch is the same.

          I am now going to add more tests.

          castagna Paolo Castagna added a comment -

          Damian, Andy, I hope this first attempt at adding better tests is going in the right direction, as you suggested.

          I am not so sure I understand the role of ex:slow() when used with ORDER BY ex:slow().

          castagna Paolo Castagna added a comment -

          Snapshots for ARQ and TDB patched with JENA-29 are available here:

          http://oss.talisplatform.com/content/repositories/talis-snapshots/com/hp/hpl/jena/arq/2.8.8-JENA_29-SNAPSHOT/
          http://oss.talisplatform.com/content/repositories/talis-snapshots/com/hp/hpl/jena/tdb/0.8.10-JENA_29-SNAPSHOT/
          andy.seaborne Andy Seaborne added a comment -

          Excellent patch. Looking through it, I have some questions and comments. This is an important feature that touches several areas so I want to make sure I understand the implications across the system.

          Timeouts:

           We need to add a timeout mechanism on top of this patch. Timing out a query is (I think) going to be the #1 use case. One way to do it is to set off a thread when the query starts, which calls cancel() when the timer goes off. But most queries won't time out (!!!), so this seems a little heavy. Instead, I suggest that QueryIteratorBase periodically check the clock if a timeout is set.

          QueryExecution.setTimeout(millis)
          QueryIterator.setTimeout(millis)

           It should also be possible to set/reset the timeout during query execution. Timeouts are "best effort" anyway; a good contract is that an execution will not be timed out before the timeout setting.

          It's going to be hard to implement timeouts everywhere (e.g. SDB will be influenced by the DB capabilities).

          QueryIterator is an interface and is implemented by QueryIteratorBase. This has the machinery for hasNextBinding, moveToNextBinding. Even though there are non-execution query iterators, this seems to me to be one of the places to add the mechanism.

          The alternative is that QueryIter does the work - it is the root of iterators associated with an execution, with registration and tracking

           We'll need to time out sorting and grouping as well, and maybe any materializing iterators.

          What to do on timeout?

          For API use, that's relatively easy. An exception can be thrown on .hasNext() or .next().

           The effect of this on the results output (XML, JSON) needs to be considered, though. The catch is that HTTP sends the status code as the first line of the response, and if it's 2xx, it means the operation succeeded.

           "206 Partial Content" is for partial GETs, so it is not applicable.

           Taking the XML form, I see the following choices:

          1/ Silent truncation - just return less (and wrong) results.
          2/ Return illegal XML - truncate when the exception occurs.
          3/ Buffer - get all the results before sending any, then the HTTP status code can be set.

           (3) is ideal functionally but loses streaming.

          Some questions, discussion points:

           1/ Do we need a separate "cancel" from "abort", given that abort currently does a close and has to be called on the execution thread? We could extend the contract of abort to cover non-execution threads, making it cancel. The cancel mechanism of the patch would provide the machinery.

           2/ I didn't understand the difference between cancel, requestCancel and requestSubCancel. As far as I can see, they are just a way to pass cancel down, with the contract that it's potentially asynchronous. This is what the patch for QueryIter2 does; QueryIter1 has requestCancel/requestSubCancel and QueryIterBase has requestCancel. Can these all be "cancel"? The QueryIterator interface already has abort.

          3/ QueryIterSort , QueryIterGroup

           In trunk, these do all the work in the constructor, then return an iterator over the results. In the patch they return an iterator that delays the work until hasNext() is called. But hasNext() is going to be called immediately anyway, because the results are pulled by the application. I didn't understand the need for the extra iterator; what am I missing?

           For sort, the sorter is pulling on an underlying query iterator, so the cancel mechanism will be there. Maybe we need the comparator to also test the cancel flags; the javadoc for Arrays.sort does not say much here.

           Note the external sort patch in JENA-44.

          For group, the grouping is pulling on an underlying query iterator and the group operation is not a significant cost.

          andy.seaborne Andy Seaborne added a comment -

          Any experiences of running the ARQ-2.8.8-JENA_29-SNAPSHOT build to report yet?

          Show
          andy.seaborne Andy Seaborne added a comment - Any experiences of running the ARQ-2.8.8-JENA_29-SNAPSHOT build to report yet?
          Hide
          shelsen Simon Helsen added a comment -

           Andy, I think I already answered some of your questions in the description. Let me address them here again:

           1) Timeouts: this is the number-one use case, and we (in our own framework) achieve it by having one thread monitor all incoming query executions. We use it for other things too, like advanced scheduling. Personally, I think it is better to leave this to clients instead of trying to jam it into QueryIteratorBase. The key is that a client is able to invoke cancel() from a secondary thread and get partial results without exceptions.

           2) As for the API, we abuse 206 because we feel 200 is wrong: it deceives the client into believing the results may be complete when they are not. Or rather, once cancel() has been invoked, it cannot be known whether the results will be complete. Note that my patch is specifically designed so that the query completes normally, except that it may show only partial results.

           3) This leads me to the mechanism of requestCancel() versus cancel(). This is critical! requestCancel() can be invoked at any moment by a parallel thread. This is not the case for cancel() or abort(). If you randomly invoke cancel or abort, you will get exceptions of all sorts and kinds: NPEs, concurrent modification exceptions, things like that. The reason is simple: if you cancel immediately (and thus make hasNext() return false), you break the invariant that hasNext()==true always allows next() to work fine:

           t1                   t2
           hasNext() == true    ...
           ...                  cancel()
           next() => breaks     ...

           With an immediate cancel, t1's next() would no longer succeed once cancel() has been invoked. So, to alleviate this, cancel() really only requests a cancellation instead of enforcing it, so that all iterators can align before cancelling the execution. I have tests which show that this breaks without requestCancel().

           4) QueryIterSort and QueryIterGroup: again, quite critical to make partial results work. The way the patch is constructed, it is possible for a sorting query to be cancelled somewhere during execution and still return the partial results sorted. I had to refactor the execution to make sure this keeps working; I have tests which prove it. Before my changes, a sorting query would only ever return one result when cancelled; now it returns as many results as the iterator was able to collect, but sorts them before returning. Similarly for groups.

           5) abort(): I would request that we keep abort. Abort, unlike cancel(), is not thread-safe, i.e. it will cause exceptions. However, it is more aggressive and therefore the right thing when you just want to quickly abort an execution without caring about partial results. We have important use cases for that. In those scenarios, we catch and swallow any exceptions around the abort() call as well as around the execute call (in the query thread) itself.

           6) Buffering on a web server: it is correct that we had to buffer results in order to be able to change the status code, but in practice this is behaving well for us.
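The sort behaviour described in point 4 can be sketched in plain Java: stop consuming input when cancellation is requested, but still sort whatever was collected before handing it to the client. This is an illustrative stand-in, not the actual QueryIterSort code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of point 4: on cancel, stop filling the buffer, but still sort
// the partial result set before returning it. Illustrative only.
class PartialSorter {
    static List<Integer> sortUntilCancelled(Iterator<Integer> input,
                                            AtomicBoolean cancelRequested) {
        List<Integer> buffer = new ArrayList<>();
        while (input.hasNext()) {
            if (cancelRequested.get()) break;   // stop consuming input...
            buffer.add(input.next());
        }
        Collections.sort(buffer);               // ...but sort what we have
        return buffer;
    }
}
```

This is why a cancelled sorting query pays the cost of sorting the partial set before returning: the client always sees sorted (if possibly incomplete) results.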

           Andy, the patch as I provided it makes sure the iterator semantics stay intact and truly partial (but correct) results can be computed. As you wrote above, it is best effort, meaning that cancel() is not immediate but tries to bring the computation to a graceful end so that a client can still process the results found so far. That is why cancel(), as provided in the patch, does not throw any exception. It gracefully finishes the computation, and this is critical for us!

           I hope you understand what I tried to do and why the patch does exactly what we need. If you decide to deviate, please explain carefully, as we may be forced to patch any adaptations in those cases.

          thanks,

          Simon

          shelsen Simon Helsen added a comment -

           Not sure what you mean about ARQ-2.8.8-JENA_29-SNAPSHOT; we are currently on 2.8.7 and are not planning to upgrade. What exactly is in it, and what should I test? Is this about the direct-mode corruption?

          castagna Paolo Castagna added a comment - - edited

          > not sure what you mean about ARQ-2.8.8-JENA_29-SNAPSHOT

          It's simply a SNAPSHOT with JENA-29 patch applied, to make it easier for people to try and test it:
          https://issues.apache.org/jira/browse/JENA-29?focusedCommentId=12993115&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12993115
          Having more people looking at a patch and/or using or trying it and reporting problems, even before we commit it into trunk, is very helpful.

          shelsen Simon Helsen added a comment -

          Thanks for the clarification Paolo. I now see that the build has jena_29 in it, i.e. the number of this defect

          andy.seaborne Andy Seaborne added a comment -

          Simon - I still don't understand why the patch to QueryIter1 is different in style to QueryIter2.

          QueryIter1 propagates requestCancel() down the chain.
          QueryIter2 turns requestCancel() into calls of cancel() down the tree. Why not requestCancel?

Currently, in implementation terms, the only difference between close() and abort() is that abort suppresses exceptions.
The implementation of abort() currently calls closeIterator(), so we have the opportunity to extend its contract and make it thread-safe. There is overlap between abort(), cancel() and close(). While we're in the area, it seems better to make a consistent set of operations.

> I have tests which prove this.
Can you share and upload these tests?

          Paolo has added tests but more is good.

          andy.seaborne Andy Seaborne added a comment -

          Checked in to SF.
          New JIRAs for next stage.

          Question about QueryIterAbortCancellationRequestException:

          Why is this thrown? It cuts through the stack bypassing QueryIteratorBase and goes straight back to QueryExecutionBase, where it's caught and treated like .cancel has returned.

Can't QueryIterSort/Group call super.cancel(), or implement requestCancel()? Even being careful of the initialization, can't requestCancel() be used? Bypassing might mean cancel requests on intermediate iterators (subqueries) are missed. That is unlikely to be a problem now, but it could be if something has to free external resources (scalable sort / JENA-44, for example) which might be allocated at constructor time.

          shelsen Simon Helsen added a comment -

          Andy, a few things

          1) the difference between QueryIter1 and QueryIter2 is just following the pattern for closing.
          2) ok, if abort is identical to close except for trying to suppress exceptions, it can probably be removed since we have to catch exceptions surrounding the abort as well as the actual execution
          3) I cannot share my tests since they are tied to our own framework which wraps around jena. They rely on our own monitoring thread and I don't know how I could move that without essentially writing new tests. The key is that in these tests, I can manually verify that partial results come back and that the timeouts are more or less observed. So, I have a query which runs, say 10s, but times out after 1500ms and produces, say 5, instead of 100 results.
          4) QueryIterAbortCancellationRequestException: this exception is thrown whenever there is an embedded iterator which was cancelled (notably the sort). If I do not "abort" the "cancel", I would still only see at most 1 result instead of all results which the embedded iterator found. If you have a better idea on how to handle this, be my guest, but I was not able to get more than one result when a sorting query was cancelled in the middle
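The monitoring-thread setup described in (3) can be sketched independently of any particular framework. Everything below (CancellingHarness, SlowIterator, the 20ms delay) is a hypothetical stand-in for a long-running query, not Jena API:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical harness: the "query" runs on the calling thread while a
// monitoring thread requests cancellation after a timeout. The consumer
// keeps whatever partial results were produced before the cancel.
class CancellingHarness {
    // A deliberately slow iterator standing in for query execution.
    static class SlowIterator implements Iterator<Integer> {
        private volatile boolean cancelled = false;
        private final int total;
        private int i = 0;
        SlowIterator(int total) { this.total = total; }
        void cancel() { cancelled = true; }
        public boolean hasNext() { return !cancelled && i < total; }
        public Integer next() {
            try { Thread.sleep(20); } catch (InterruptedException e) { }
            return i++;
        }
    }

    static List<Integer> run(SlowIterator it, long timeoutMs) {
        ScheduledExecutorService monitor =
            Executors.newSingleThreadScheduledExecutor();
        monitor.schedule(it::cancel, timeoutMs, TimeUnit.MILLISECONDS);
        List<Integer> results = new ArrayList<>();
        while (it.hasNext()) results.add(it.next()); // stops early on cancel
        monitor.shutdownNow();
        return results; // possibly a partial result set
    }
}
```

With 100 items at roughly 20ms each and a 200ms timeout, the harness returns on the order of 10 results instead of 100, which is the "timeout observed, partial results kept" behaviour the tests above verify manually.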

          beobal Sam Tunnicliffe added a comment -

          Andy,

          Am I understanding your last comment correctly? ("Checked in to SF"). I just checked out a fresh copy of trunk, to make an updated patch for JENA-44, but I'm not seeing these modifications there.

          andy.seaborne Andy Seaborne added a comment -

          Have just committed the final changes for this stage after some sorting out and testing.

          rbattle Robert Battle added a comment -

          We (@BBN) also have a need for query cancellation. We have previously implemented it in a non-ideal way (we needed to ensure that all of our external resources were closed) so we are very interested in this patch.

          I'm a little unclear on a few points though:
          1) Is it necessary to supply partially sorted/grouped results for a cancelled query? If the query is cancelled, why should the client care what the partial results were?
2) Is it the case that this patch implies that abort should still be used if you don't care about the results of a query but want to stop its execution?

          andy.seaborne Andy Seaborne added a comment -

          Robert,

Q1: No, not really. One thing to be careful of, though: if the sort is partial, the first element might not be the largest/smallest, i.e. it is a non-answer, not a truncated answer. If a query is aborted by an outside thread, then we need to define the contract on any answers remaining.

          An alternative is if cancelled, then there is an exception, not silent early truncation with undefined answers. Either approach has application implications.

Q2: Calling close() while consuming query results is preferred. Currently, .abort must be called on the same thread as the execution. We are considering extending abort to be callable from another thread, and having abort/close rather than abort/cancel/close.

          shelsen Simon Helsen added a comment -

          Q1: We have concrete use-cases to get partial results, so I really have to stress we need that support very strongly. As for what exactly a truncated answer is, of course, you need to define what it means. But in the case of sorting, it is accepted by our clients that the returned result set S is a subset of TS (where TS is the complete result set if it was allowed to finish) and for all s1,s2 in either S or TS, it holds that s1<=s2 according to the given ordering criteria. My patch does exactly that and does a good job at making S as large as possible until the cancel() is invoked. Without the changes in the sorting iterator, I would only ever see 1 result, which is useless for us

          Q2: yes and no. I need to keep repeating that the current version of abort() is not thread-safe. If you call it from a secondary thread, it will cause exceptions in the main execution thread. If you can live with that, then, yes, you could use abort or even close since either way, you need to catch all exceptions in both the thread which performs the cancel as well as the thread which executes. The way the cancel() patch has been implemented, this is not needed. It is safe being called from an outside thread to then safely finish the cancelled query and return as many results as was possible up to that point

          While I cannot speak for the requirements of other stakeholders, we need the partial results, which is why I made all the changes you see in the patches. Moreover, if you want partial results, you want a graceful completion of the execution, making either abort or close unusable

          shelsen Simon Helsen added a comment -

I just looked at our code: in the scenarios where we do not need partial results (we have those use-cases as well), we use close and just catch anything that happens. One thing we needed was for a given query execution to know what happened, so we have a wrapper which keeps track of whether cancel() or close() was invoked. Ideally, this could move into QueryExecution, because if a secondary thread calls either cancel() or close(), you want to be able to investigate what happened to the query when it returns.

          shelsen Simon Helsen added a comment -

I am looking again at the code. Andy, to clarify: the need for requesting versus cancelling is primarily that when I request a cancel, the iterator only moves to the cancelled state once nextBinding() has executed, because at that point hasNext() is allowed to answer true or false, i.e. when cancelled it will return false. However, if you immediately turn a cancel request into a cancel, you may get a situation where hasNext() would return false but nextBinding() is still going to be called, because the async thread requested the cancel after the executing thread had already passed hasNext(). I hope this is clear now.

          Now, I agree that the pattern between cancel() and requestCancel() is a bit confusing. This has to do with the fact that cancel() actually sets the "requestCancel" flag, whereas requestCancel() just propagates the cancellationRequest over the iterator network. That is also why the pattern between QueryIter1 and QueryIter2 looks different. Perhaps this could all be made more readable by having each iterator just implement requestCancel() and in that method also set the requestCancellation flag. In that case, there is no need to call cancel() again and it is clear how the cancellation requests are propagated.

Finally, and this is a different topic: we were noticing weak cancellation behaviour whenever there are optional clauses, and I now notice that there may be a glitch for optional clauses and, in fact, any QueryIterRepeatApply. The problem is that each time one moves to the next stage, currentStage becomes null for a short time, so neither close() nor cancel() will take effect in that window. Afterwards, the close or cancel will be forgotten. I'll provide a patch for this.
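The stage-gap problem can be illustrated with a small sketch. All names here (RepeatApplySketch, Stage, makeNextStage) are invented for illustration and are not the actual ARQ classes; the point is only that the cancel request must be remembered and re-checked when a new stage starts:

```java
// Hypothetical sketch of the QueryIterRepeatApply gap: cancel() may
// arrive while currentStage is null (between stages), so the request
// must be remembered, not just forwarded to the current stage.
class RepeatApplySketch {
    private volatile boolean requestingCancel = false;
    private Stage currentStage = null; // null for a moment between stages

    void cancel() {
        requestingCancel = true;
        Stage s = currentStage;
        if (s != null) s.cancel(); // alone, this misses the null window
    }

    Stage makeNextStage() {
        Stage s = new Stage();
        // Fix: re-check the remembered flag when a new stage starts,
        // so a cancel that arrived in the gap is not forgotten.
        if (requestingCancel) s.cancel();
        currentStage = s;
        return s;
    }

    static class Stage {
        private volatile boolean cancelled = false;
        void cancel() { cancelled = true; }
        boolean isCancelled() { return cancelled; }
    }
}
```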

          sallen Stephen Allen added a comment -

          I argue that there should be no expectation of returning any results that might be "queued up" in an iterator after cancellation has been requested. It seems arbitrary to decide that sorting/grouping operations should continue after a cancel is requested. These operations may be expensive (especially if JENA-44 is implemented). Instead I think no further results should be returned after cancellation. Also queries with sort/group subqueries may yet have a long way to go after the sort/group finishes (I believe this is what Andy was getting at in JENA-48).

In my mind, if there is a requirement to have some sorted partial results after a cancellation is requested, then the sorting/grouping should be performed on the client side, with streaming results from the server/ARQ.

          shelsen Simon Helsen added a comment -

Stephen, your comments are troublesome because obviously a client cannot do this unless we abandon the use of ORDER BY entirely (which we won't). I think it is reasonable to sort whatever is queued up, even if this is "somewhat" expensive, i.e. as described in JENA-44. That is the whole idea behind cancel(). If you keep insisting that cancel() may NOT return anything useful, that is as much as saying you don't want cancel() support at all; in that case, simply stick to close(). The whole point of my cancel() addition is to still get useful results even if a query execution had to stop prematurely.

Also, sorting should be a fraction of the time compared to the real query execution (currently that is certainly the case, and I don't see why JENA-44 changes that).

Finally, as for JENA-48, I presume this is only an issue for subqueries (is that standard SPARQL?), in which case cancellation could be more conservative, i.e. the cancel of a subquery should probably be ignored. But that is beyond any of the use-cases I know and can probably be resolved without abandoning the expectation I described above.

          shelsen Simon Helsen added a comment -

This should avoid cancellation (and close) accidentally "missing" the next stage.

          sallen Stephen Allen added a comment -

          Why is it not reasonable to abandon ORDER BY in the query and do it client side? This sounds like a specific use case requirement rather than an ARQ requirement.

The semantics of SPARQL specify that a solution to a query is valid if it meets all the conditions specified in the BGPs, filters, and solution sequence modifiers (Order, Project, Distinct, Reduced, Offset and Limit). This means that only full-and-final results should be returned, once those are fully-and-finally computed. In practice, ARQ will build up solutions incrementally in order to stream results back to a client if it is able to. The contract of ORDER BY is that the first item to be returned actually is the first item in the solution set/bag. I would say that to have it return anything else is an incorrect result, not a partial result. If that is the case, then it makes sense to me to simply abandon any work being done and try to cancel the query as soon as possible. If a client wants to rely on any results that have already been returned to it, that is its prerogative.

          Having said all of that, I'm not saying cancel() support is not important, I think just the opposite. I see your patch as valuable because it adds the ability to cancel a query from a different thread by chaining the cancel request through the iterators.

          P.S. Slight bug fix: in QueryIteratorBase and QueryIterRepeatApply, you need to make the "requestingCancel" field volatile to ensure visibility (see [1] for more info).

          [1] http://www.ibm.com/developerworks/java/library/j-jtp06197.html
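The visibility point in the P.S. can be shown with a minimal sketch. The names (CancelFlag, requestCancel, isCancelRequested) are illustrative, not the actual ARQ identifiers:

```java
// Minimal illustration of why the flag must be volatile: it is written
// by the cancelling thread and polled by the query-execution thread.
// Without volatile, the JIT/JVM may legally cache the field value in
// the polling thread and never observe the update.
class CancelFlag {
    private volatile boolean requestingCancel = false;

    // Called from the external (cancelling) thread.
    void requestCancel() { requestingCancel = true; }

    // Polled from the query-execution thread.
    boolean isCancelRequested() { return requestingCancel; }
}
```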

          shelsen Simon Helsen added a comment -

Stephen, this is an academic argument and I find it troublesome that you dismiss our use-case. And that is what you do, because if you follow your argument through, you cannot support any notion of partial result sets at all. Surely you agree that we cannot add support for partial result sets after cancellation without getting into ARQ. That sorting might incidentally be handled on the outside is one case, and I question it, because if cancellation only returned 1 result, sorting after the fact would be useless anyhow. Also, if you insist that we sort ourselves on the outside, you are saying that we should abandon the use of ORDER BY in the query syntax, which is unacceptable given that our use of Jena is motivated by the fact that it implements SPARQL.

          And it is not because SPARQL does not define what happens when a query is cancelled that an implementation cannot be pragmatic and do something useful. Please, tell me, what is an ARQ requirement if not a requirement of its clients?

          andy.seaborne Andy Seaborne added a comment -

          queryIterRepeatApply.patch does not apply. The file name com.hp.hpl.jena.rdf/temp/com/hp/hpl/jena/sparql/engine/iterator/QueryIterRepeatApply.java is confusing Eclipse. Is there a version of the patch against development trunk from SF?

          andy.seaborne Andy Seaborne added a comment - - edited

          Just noticed the "volatile" PS'ed on a comment. Added, mainly for safety.

I don't think this makes a significant difference, because the code isn't spinning on the flag purely within the class (the while(!flag) pattern, where the value can be copied into a thread-local temporary by the JIT/JVM), but it is safer to add it.

          andy.seaborne Andy Seaborne added a comment -

We have two use cases for what happens when a query is cancelled during a long sort operation. I'd characterise them as "do your best" and "truncate results". In "do your best", the results are based on how far the query engine got before the cancel was acted upon, so any work done until then is exposed. In "truncate results", the cancellation is an asynchronous stopping of output at that point.

Example:

SELECT ?x { ... } ORDER BY DESC(?x)

          Suppose the values for ?x are 2,3,1 from the graph pattern and a cancellation happens after 2 has been received by the sort code, but before 3 and 1. "do your best" returns 2 because that is in the sort engine; "truncate results" returns nothing because it does not know 2 comes first.

The same can happen in grouping, because the group may be the whole query results (aggregate used, no GROUP BY in the query) and so be a long operation.

Example:

SELECT ?x { ... } ORDER BY DESC(?x) LIMIT 1

and in SPARQL 1.1:

SELECT (MAX(?x1) AS ?x) { ... }

          The un-cancelled results are one row with ?x=3. The "do your best" gives 2 and "truncate results" gives empty.

          Both seem reasonable application experiences if communicated clearly.
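The two policies on the ORDER BY DESC(?x) example can be made concrete with a small sketch. This is purely illustrative (CancelPolicies and its methods are invented names, not ARQ code); the buffer holds only the value 2, which arrived at the sort before the cancel, while 3 and 1 never did:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Illustration of the two cancellation policies: the graph pattern
// would yield ?x = 2, 3, 1, but cancellation arrives after only 2 has
// reached the sort buffer.
class CancelPolicies {
    static List<Integer> doYourBest(List<Integer> buffered) {
        // Sort whatever arrived before the cancel and return it.
        List<Integer> out = new ArrayList<>(buffered);
        out.sort(Comparator.reverseOrder()); // ORDER BY DESC(?x)
        return out;
    }

    static List<Integer> truncateResults(List<Integer> buffered) {
        // The buffer may not contain the true first element (3 never
        // arrived), so return nothing rather than a wrong ordering.
        return Collections.emptyList();
    }
}
```

With a buffer of [2], doYourBest returns [2] while truncateResults returns []; the un-cancelled query would have returned [3, 2, 1].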

          shelsen Simon Helsen added a comment -

          Yes, I never thought of cancellation as providing truncated answers, because under that understanding the answers would be wrong. But I don't think a client ever expects this. If you "cancel" an execution, you expect the best outcome up to that point (what else would be reasonable?).

          Now, I think in the MAX example above, you don't really care about the answer unless it is final. But that is NOT true for sorted results: a client will certainly be interested in the partially sorted results, knowing they are incomplete. I really do not understand why the only correct results for a sorting query are no results.

          As for the patch, Andy, can you just manually apply it? It is very short...

          sallen Stephen Allen added a comment -

          Here are the concerns I have with the "do your best" approach:

          1) Subqueries: If cancellation lets sorts run to completion, then in some cases you can't effectively cancel queries with subqueries (see [1] for a pathological case where a query cancelled towards the end of the subquery execution would still take a long time to complete)

          2) If query cancellation is performed outside the client program (say by a server-side timer, or through the Joseki web interface), then it would be impossible to know if any particular query returned the correct and full results without performing some (non-SPARQL) call back to the server to check. If, instead, the query results XML were truncated, then you could handle it the same as if the connection were broken, because at least the closing </sparql> tag would be missing.

          3) I feel like the predominant use case is to terminate run-away queries with the expectation that you don't need partial results. That is to say, the regular expectation of your system is to retrieve full and final results. I assert this because if you wanted to build a system that, in the course of its regular execution, used results that were based on a subset of the population, then you probably would want to do proper sampling/extrapolation to get a statistically reliable approximation of the final result.

          Additionally, the "do your best" approach can be implemented on top of the "truncated" approach by removing the ORDER BY from the query and doing the sorting client side. If a server-side solution were desired, one could write a servlet filter that intercepted queries with a top-level ORDER BY, modified the algebra to remove it, performed the query, and then sorted the results in the servlet filter.

          [1]
          select * where
          {
            {
              select ?s ?p ?o where
              { ?s ?p ?o . }
              order by ?s
            }
            ?x ?y ?z .
          }
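Stephen's client-side fallback (run the query without its top-level ORDER BY, stream rows until a cancel arrives, sort afterwards) might look like this sketch, with plain strings standing in for query solutions and all names invented for the illustration:

```java
import java.util.*;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of "do your best" built on top of "truncate results": stream the
// unsorted rows, stop when cancelled, then apply the ordering client-side.
class ClientSideSort {
    static List<String> collectThenSort(Iterator<String> unsorted, AtomicBoolean cancelled) {
        List<String> rows = new ArrayList<>();
        while (unsorted.hasNext() && !cancelled.get())
            rows.add(unsorted.next());          // streaming, truncated on cancel
        Collections.sort(rows);                 // ORDER BY applied after the fact
        return rows;
    }

    public static void main(String[] args) {
        AtomicBoolean cancelled = new AtomicBoolean(false);
        Iterator<String> source = List.of("carol", "alice", "bob").iterator();
        // no cancel: full, sorted result
        System.out.println(collectThenSort(source, cancelled));  // [alice, bob, carol]
    }
}
```

If the cancel flag flips mid-stream, the caller gets a sorted prefix of whatever was collected, which is exactly the "do your best" behaviour, while the server-side machinery only ever needs "truncate results".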

          shelsen Simon Helsen added a comment -

          First, let me say in general that the fact that the "do your best" approach does not work well in some cases is not a reason to discard the entire approach. In some cases it does work well, and those cases are based on real use cases, and that is what matters to us. So I would really appreciate it if you took the academic approach out of this, since it won't help and is irrelevant.

          Concrete:

          1) subqueries: I think, as discussed in JENA-48, if a subquery is sorting, it should give up right away. It should be simple to figure out if a query is a subquery and enrich the logic to behave differently there.
          2) This I don't get at all. If the QueryExecutionBase keeps track of whether a query was requested to cancel, any outside client program can request that information, that is, when partial results are desired. If NO partial results are desired, don't use the cancel() the way I provided it but use abort() or close(). What is problematic here?
          3) again, same as above. If you don't care about partial results, then don't use cancel() ! Instead use abort() or close(). And if you want to enrich abort() in a thread-safe manner, be my guest

          Stephen, again, you keep questioning our use cases because they do not fit a generic theoretical expectation that the approach work well in all cases. If it works well in the relevant cases, I don't see why it cannot be there. And I agree that it is useful to have a version which just aborts, perhaps throws an exception or whatever, but when I provided the cancel() patch, it was not designed to do this.

          Perhaps we could agree to keep cancel() the way I suggested, with small improvements in the style of JENA-48 to take care of special subcases, and at the same time provide a thread-safe way to stop query execution (perhaps using the same 2-stage technique I introduced with cancel()) to abort a query. I think abort() could easily be adapted to that. As long as the javadoc is clear about the difference between cancel() /** I try to give you partial results as good as I can and stop the query as soon as I can */ and abort() /** I will stop asap and throw an exception indicating I aborted */

          But I have to insist that we need and use the "do your best" approach the way I introduced it, and that it works well in the cases our clients are using. If someone expects unreasonable outcomes, it can be pointed out to them that the contract does not guarantee this unreasonable expectation. And a top-level sort is not unreasonable IMO.

          Could this be a way to resolve the difference of opinion? I.e. provide 2 ways to stop a query with different behavior?

          castagna Paolo Castagna added a comment - edited

          We (@Talis) run a few public SPARQL end-points and we want to protect our machines from people running very expensive queries. Cancelling or timing out a long running query is therefore very useful to us, and we currently do it using a separate thread via a Callable and an ExecutorService, which gives us a Future object which we use to set a timeout by calling get(long timeout, TimeUnit unit).

          If one of the QueryExecution.execX() methods fails to return within the timeout, we send back an HTTP 500 error code and call QueryExecution.abort(). Once we start streaming back results (i.e. an HTTP 200 status code has been sent to the client) we never timeout or interrupt.
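The pattern described above can be reduced to a minimal, self-contained form. Here a sleep stands in for QueryExecution.execSelect(), and the returned strings (including the "HTTP 500" message) are illustrative only:

```java
import java.util.concurrent.*;

// Minimal version of the Callable/ExecutorService/Future timeout pattern:
// run the "query" in a worker thread, bound it with Future.get(timeout),
// and abort on timeout. A sleep simulates the long-running query.
class TimeoutPattern {
    static String runWithTimeout(long queryMillis, long timeoutMillis) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<String> result = pool.submit(() -> {
            Thread.sleep(queryMillis);           // stands in for QueryExecution.execSelect()
            return "results";
        });
        try {
            return result.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            result.cancel(true);                 // here one would also call qexec.abort()
            return "HTTP 500: query timed out";
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runWithTimeout(5_000, 100)); // slow query hits the timeout
        System.out.println(runWithTimeout(10, 1_000));  // fast query completes
    }
}
```

As the thread notes, Future.get(timeout, unit) only bounds the caller's wait: the worker thread keeps running (and consuming RAM) unless the underlying execution cooperates with the interrupt, which is exactly the gap JENA-29/JENA-44 aim to close.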

          Typically, we see timeouts for large CONSTRUCT or DESCRIBE queries. It's rare we timeout a SELECT query, unless it has a large sort (i.e. ORDER BY). In this case, we timeout. However, even if we send the HTTP 500 error code back to the client, the thread running the sort will continue until completion (wasting RAM and resources). This is where we would greatly benefit from JENA-29 or JENA-29+JENA-44.

          We do not want or need to send partial results to people. We expect this to be very confusing and even difficult to explain to users. They might fall into the trap that if they are searching the 20 largest cities in Europe and we send them only 10, those 10 are the first 10 largest cities in Europe while in reality they might not even be in the first 20.

          Being able to set a timeout directly in ARQ seems useful and it potentially removes the need for a separate thread. However, what does the timeout exactly represent? Is it the time to get to the first result? Or what? If it's not the time to get to the first result, will it be possible to cancel/reset a timeout once it has been set and the query execution has started?

          Finally, a few (personal) comments:

          @Andy: "Timing out a query is (I think) going to be the #1 use case."

          It is for us at Talis.

          @Andy: "It might as well be possible to set/reset during query execution."

          Indeed. We would like to be able to forget about the timeout once we start streaming back a large result set.

          @Andy: "We'll need to timeout on sorting and grouping as well, and maybe any materializing iterators."

          Yes.

          @Andy: 3/ Buffer - get all the results before sending any, then the HTTP status code can be set. (3) is ideal functionally but loses streaming.

          Yep. Ideal functionally, but not in practice with limited resources and multiple concurrent SPARQL queries.

          @Stephen "I argue that there should be no expectation of returning any results that might be "queued up" in an iterator after cancellation has been requested."

          I tend to agree, since it is unclear to me what exactly are the use-cases Simon referred to.

          Perhaps, the way to move forward on this is to split this issue into two: one is about timing out queries and the other one is about delivering partial results when a query times out.

          shelsen Simon Helsen added a comment -

          I already mentioned before that we do not need the support for timeout inside ARQ because we have a sophisticated monitoring thread which manages timeouts alongside additional sophisticated query scheduling logic. If you decide to add timeout support in ARQ itself, that is fine with us, as long as we can make sure it can be deactivated and hopefully does not take up more resources (including threads) when not desired.

          As for use-cases: of course, in your example you don't want partial results because you are really looking for the "largest", but there are many sorting queries where that is not the case. E.g. if you want to query for all users in a project alphabetically, then a partial result set is still better than no results if the query for whatever reason takes too long. The result on the screen would have to indicate that the results are incomplete but possibly the user already has sufficient information to go ahead. There are endless examples like this.

          Again, I do not understand why there is so much resistance.

          If all of the people commenting here feel that an abort on timeout is all they need, then that is fine with me, but we (IBM) need more, and so I would like to keep my version of cancel() with best-effort partial results.

          andy.seaborne Andy Seaborne added a comment -

          Re: 2) The use case that several groups have is to set an upper cost bound on a query issued by SPARQL protocol through Fuseki or Joseki. No server-side application-specific behaviour. In fact, if the application can be involved, the scope for control is greater. In the case inside Fuseki, there is no application-specific code, and the default behaviour should be most appropriate for multiple use cases.

          Suppose there are 100 countries in a database with their populations. The query is to find the top 10 most populous countries, that is ORDER BY + LIMIT 10. If the sort engine has seen 20 countries and is stopped, 10 results appear in the "do your best" case. But if China isn't in the result, then it is going to be a surprise.

          But note also the application may be doing "top 10", "next 10" in pages in the application logic - ORDER BY and no LIMIT. Simon - you described the contract for sort as "in either S or TS, it holds that s1<=s2 according to the given ordering criteria", which is true. But the contract goes further: item one is the max or min; the first ten are the top 10.

          Choices I see are:
          1/ The cancel call can take an optional argument giving the style.
          2/ The hasNext/next indicates "end" but can be continued to get remaining stuff.
          3/ It's a configuration setting of the execution.

          Of these, 1 or 3 tells the cancellation mechanism what is required of it in doing the cancellation, not after the cancellation as happens in 2. I prefer 3 as it's the earliest point the execution plan can be informed and there's a mechanism that's already in place (e.g. union query).

          Simon - your clients can get the behaviour you currently provide them by setting the context for the execution.

          shelsen Simon Helsen added a comment -

          Thanks Andy. Yes, I am fine with either 1/ or 3/ as well.

          shelsen Simon Helsen added a comment -

          Let me add something here: our "clients" are application programs which communicate with us via REST apis. These client programs know the nature of their queries, i.e. they are never written by end-users. Some of these application programs also permit end-users to write queries (not in SPARQL though) and those are then translated into SPARQL. But either way, these client programs know whether partial results make sense or not. I find it interesting that both Andy and Paolo come up with examples where obviously partial results don't make sense even though there are endless examples where it does make sense.

          castagna Paolo Castagna added a comment -

          I am still a bit confused about the notion of "timeout". Is it about the time to get to the first solution, or about the time to iterate through all the solutions? These are two different timeouts; are we going to support them both or just one? Which one? I think this difference makes sense only for SELECT queries, not with CONSTRUCT or DESCRIBE, since both CONSTRUCT and DESCRIBE buffer all results in memory before returning from the .execX() method. Correct?

          Will it be possible to change a timeout once it has been set?

          Use case: you set a 10 second timeout; you get the first result for a SELECT query before the timeout kicks in, so you send an HTTP 200 status code to your client and start streaming results. However, you need more than 10 seconds to stream all the results back to the client, and you do not want the transfer interrupted after 10 seconds.
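One way to model the two distinct timeouts raised here (time to the first solution vs. total streaming time) is to drop the deadline once the first solution arrives. The sketch below is a plain-Java stand-in with made-up names (TwoPhaseTimeout, serve), not a proposed ARQ API:

```java
import java.util.concurrent.*;

// Sketch of a two-phase timeout: a bound on the time to the FIRST solution,
// after which the deadline is dropped so a long stream of results is not
// cut off mid-transfer. Plain Java stand-in, not an ARQ API.
class TwoPhaseTimeout {
    static String serve(BlockingQueue<String> solutions, long firstResultMillis)
            throws InterruptedException {
        // Phase 1: bounded wait for the first solution.
        String first = solutions.poll(firstResultMillis, TimeUnit.MILLISECONDS);
        if (first == null) return "HTTP 500: no first result in time";
        // Phase 2: first result arrived; stream the rest without a deadline.
        StringBuilder body = new StringBuilder(first);
        String next;
        while ((next = solutions.poll()) != null) body.append(',').append(next);
        return "HTTP 200: " + body;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> q = new LinkedBlockingQueue<>(java.util.List.of("a", "b", "c"));
        System.out.println(serve(q, 100));                          // streams everything
        System.out.println(serve(new LinkedBlockingQueue<>(), 50)); // times out
    }
}
```

This matches the use case above: the HTTP status code is decided by the phase-1 deadline, and once a 200 has been sent, the transfer is never interrupted by the timeout.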

          andy.seaborne Andy Seaborne added a comment -

          Paolo - what do you propose?

          andy.seaborne Andy Seaborne added a comment -

          This is an outline of the contract of "cancel". It is not a description of implementation.

          Currently, QueryExecutionBase.cancel() exists (and is deprecated with the comment "do not use") for testing all this. It will be moved to .abort() when we're ready. A phase of renaming internal methods may happen when the details of implementation make the exact nature of methods and fields quite clear.

          1/ To terminate an execution, something calls .cancel, on the QueryExecution which in turn calls ".cancelRequest()".

          Multiple calls of .cancel result in one call of cancelRequest().

          cancelRequest is async to iterator execution.

          2/ Internal "cancellation" is also possible, i.e. the system chooses to call .cancel itself (e.g. timeout, if done that way, or a limit on the total number of results [not planned], other mysteries).

          3/ Cancellation is not required to happen immediately, or indeed to happen at all, but it would probably be considered a bad implementation not to do something. That is, we don't enforce by contract that cancel has a specific effect at any specific point.

          4/ When cancelled, .hasNext/.next on the results iterator are undefined.

          CONSTRUCT and DESCRIBE will return null.

          5/ There are some internal flags to control the behaviour after cancellation is active.

          Default behaviour:

          A/ Any calls to .hasNext()/.next() throw QueryTerminatedException. They start doing so at some point (cancellation is async to execution) but the intention is as soon as reasonably possible.

          I read the javadoc for the .hasNext contract as meaning that if hasNext() is true, then NoSuchElementException will not be thrown. You can get ConcurrentModificationException from Java collections from .next() anyway, regardless of .hasNext().

          Continuation behaviour:

          B/ The QueryIterator is closed during the next call to .next(). An element is returned. The iterator is not explicitly closed by QueryIteratorBase if NoSuchElementException is thrown. Further calls to .hasNext/.next may return results.

          (a little tighter would be to stop if .hasNext has not been called yet for the next solution - needs another flag for "is hasNext() already decided").
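The default behaviour (A) can be illustrated with a self-contained iterator. QueryCancelledException here is a stand-in declared locally for the exception named in the contract (QueryTerminatedException); everything else is invented for the sketch:

```java
import java.util.*;

// Sketch of the default post-cancellation behaviour: once a cancel is
// observed, both hasNext() and next() throw. Not ARQ's real iterator.
class AbortingIterator implements Iterator<Integer> {
    static class QueryCancelledException extends RuntimeException {}

    private final Iterator<Integer> underlying;
    private volatile boolean cancelled = false;   // set asynchronously

    AbortingIterator(Iterator<Integer> underlying) { this.underlying = underlying; }

    public void cancel() { cancelled = true; }

    @Override public boolean hasNext() {
        if (cancelled) throw new QueryCancelledException();
        return underlying.hasNext();
    }

    @Override public Integer next() {
        if (cancelled) throw new QueryCancelledException();
        return underlying.next();
    }

    public static void main(String[] args) {
        AbortingIterator it = new AbortingIterator(List.of(1, 2).iterator());
        System.out.println(it.next());       // 1
        it.cancel();
        try { it.next(); }
        catch (QueryCancelledException e) { System.out.println("cancelled"); }
    }
}
```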

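The two-phase contract sketched above — an asynchronous cancel request that only becomes effective at an iterator boundary, while still honouring an already-decided hasNext() — can be illustrated with a small self-contained iterator. This is a hypothetical sketch, not the actual ARQ QueryIteratorBase; the class name and the exception used are invented for illustration.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical two-phase cancellable iterator (illustrative only).
// requestCancel() may be called from any thread; it only sets a flag.
// Cancellation takes effect synchronously with the consuming thread at the
// next hasNext()/next() boundary, so a hasNext() that already answered true
// is still honoured by the following next().
class CancellableIterator<T> implements Iterator<T> {
    private final Iterator<T> underlying;
    private volatile boolean requestingCancel = false; // phase 1: async request
    private boolean cancelled = false;                 // phase 2: effective
    private boolean hasNextDecided = false;            // "is hasNext() already decided"
    private T slot;

    CancellableIterator(Iterator<T> underlying) { this.underlying = underlying; }

    public void requestCancel() { requestingCancel = true; }

    @Override
    public boolean hasNext() {
        if (cancelled)
            throw new IllegalStateException("query cancelled");
        if (hasNextDecided)
            return true;               // answer must stay stable until next()
        if (requestingCancel) {        // request becomes a real cancel here
            cancelled = true;
            throw new IllegalStateException("query cancelled");
        }
        if (!underlying.hasNext())
            return false;
        slot = underlying.next();
        hasNextDecided = true;
        return true;
    }

    @Override
    public T next() {
        if (!hasNextDecided && !hasNext())
            throw new NoSuchElementException();
        hasNextDecided = false;
        return slot;
    }
}
```

A cancel request arriving between hasNext() and next() is deferred, so the consumer never sees hasNext() promise an element that next() then refuses to deliver.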
          shelsen Simon Helsen added a comment -

          I don't understand the "Further calls to .hasNext/.next may return results" in the continuation behavior. Also, why do you call it "continuation" behavior? What is continued? In the continuation behavior, are construct and describe also returning null?
          andy.seaborne Andy Seaborne added a comment -

          The yielding of results via hasNext/next is being continued after cancel.

          You want sort to return one or more results even after the cancel. This happens because QueryIterPlainWrapper does nothing on cancel, so all that happens is that QueryIterSort propagates the cancel, gets no more results from the underlying iterator, sorts what it has, and continues to respond to hasNext/next with these partial results.

          Your patch has (in QueryIteratorBase):

          if (cancelled) {
              close();
          }

          which would limit the results to one, except that all this is bypassed by QueryIterAbortCancellationRequestException and requestingCancel is not set.
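The continuation behaviour described above — a sort operator that, on cancel, stops pulling from its input and then serves the partial buffer, sorted — can be sketched as follows. This is an illustrative stand-in for QueryIterSort, not the real class; all names are invented.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical sorting iterator with "drain" semantics (illustrative only).
// When a cancel is requested, it stops consuming its input and answers
// hasNext()/next() over the rows already buffered, sorted. Consumers see a
// partial but well-ordered result instead of an exception.
class PartialSortIterator implements Iterator<Integer> {
    private final Iterator<Integer> input;
    private final List<Integer> buffer = new ArrayList<>();
    private volatile boolean cancelRequested = false;
    private Iterator<Integer> sorted = null;

    PartialSortIterator(Iterator<Integer> input) { this.input = input; }

    public void requestCancel() { cancelRequested = true; }

    private void materialise() {
        if (sorted != null)
            return;
        while (!cancelRequested && input.hasNext())
            buffer.add(input.next());   // stop draining once a cancel arrives
        Collections.sort(buffer);       // sort whatever was gathered so far
        sorted = buffer.iterator();
    }

    @Override public boolean hasNext() { materialise(); return sorted.hasNext(); }
    @Override public Integer next()    { materialise(); return sorted.next(); }
}
```

If the cancel arrives before any input is consumed, the partial result is simply empty; if it arrives mid-drain (from another thread), the rows gathered up to that point are sorted and yielded.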
          shelsen Simon Helsen added a comment -

          thanks for the clarification
          shelsen Simon Helsen added a comment -

          Paolo's comment illustrates that once you permit timeouts, you need to buffer query results before streaming them to clients. In many web containers, you cannot change the status code once the first byte is written beyond the internal HttpResponse buffer. In our environment, buffering query results is not a problem, but perhaps that is not the case for Paolo.
          castagna Paolo Castagna added a comment -

          > Paolo - what do you propose?

          A timeout to get to the first result would satisfy our use cases: if it fires, no results are returned. This also allows sending correct HTTP response codes (i.e. 200 and deliver all results, or 500 and no results).
          An (optional, for us) timeout to iterate through all the solutions would support Simon's use case for partial results.
          So, unless we can reset/cancel a timeout, two timeouts are necessary if we want to support all the use cases described in previous comments.
          Describing timeouts in terms of the time to get to the first result and the time to get to the last result (i.e. to iterate through all the results) is, in my opinion, important to avoid confusion.
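The two timeouts Paolo describes — one budget to reach the first result, a second budget to iterate the remainder — could be wired up as an iterator wrapper roughly like this. A hypothetical sketch only: the class and its behaviour are invented for illustration, and real ARQ timeout support need not look like this.

```java
import java.util.Iterator;

// Hypothetical wrapper enforcing two timeouts (illustrative only):
//  - a deadline for producing the first result, and
//  - a separate budget, armed when the first result arrives, for
//    iterating through the rest.
class TwoPhaseTimeoutIterator<T> implements Iterator<T> {
    private final Iterator<T> underlying;
    private final long firstResultDeadline;   // absolute, in nanos
    private final long overallBudgetNanos;
    private long overallDeadline = -1;        // armed on first result
    private boolean first = true;

    TwoPhaseTimeoutIterator(Iterator<T> underlying, long firstMillis, long overallMillis) {
        this.underlying = underlying;
        this.firstResultDeadline = System.nanoTime() + firstMillis * 1_000_000L;
        this.overallBudgetNanos = overallMillis * 1_000_000L;
    }

    private void check() {
        long now = System.nanoTime();
        if (first) {
            if (now > firstResultDeadline)
                throw new RuntimeException("timeout before first result");
        } else if (now > overallDeadline) {
            throw new RuntimeException("timeout while iterating results");
        }
    }

    @Override public boolean hasNext() { check(); return underlying.hasNext(); }

    @Override
    public T next() {
        check();
        T item = underlying.next();
        if (first) {                 // first result seen: switch budgets
            first = false;
            overallDeadline = System.nanoTime() + overallBudgetNanos;
        }
        return item;
    }
}
```

A server that only sets the first timeout can stream freely after the first row (and commit to a 200 status), while setting both lets it bound total execution at the cost of buffering, as discussed above.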
          castagna Paolo Castagna added a comment -

          @Simon More precisely, once you permit a timeout to iterate through all the results to be set, then if you want to send correct HTTP status codes/errors (i.e. 200, 206 or 500) you need to buffer query results, for exactly the reason you said. If the timeout is the time to get to the first result, buffering query results is not necessary, and indeed we try to do streaming as much as we can. In this case, we never send a 206.
          beobal Sam Tunnicliffe added a comment -

          Just to add a little to Paolo's comments: for our (Talis) use case the query form is also relevant to the timeout behaviour. For SELECT queries, we want to be able to time out up to the point at which the first result is returned. If we hit a timeout before this point, we return a 500 status and no results with our HTTP response. After the first result, the rest of the query execution is (relatively) cheap, so we return a 200 response and stream the complete result set out. There is a key difference in the implementation of CONSTRUCT & DESCRIBE queries though. In these, a ResultSet is used internally; it is iterated over completely and its bindings are used to populate an in-memory model representing the ultimate answer. Andy will correct me if I'm wrong here, but this iterator is completely exhausted, and the result model fully constructed, before execXXX() returns. In this case, we do care about timing out the (inner) ResultSet's iteration even after it has reached its first result, as it can still be a considerable expense to build that model. Basically,

          SELECT * WHERE {?S ?P ?O}

          is a lot cheaper than

          CONSTRUCT {?S ?P ?O} WHERE {?S ?P ?O}
          andy.seaborne Andy Seaborne added a comment -

          Simon - exactly - that's a feature of the result delivery and not the ARQ engine itself. If you return 206, you must be buffering somewhere to know whether to return 206 or 200. Joseki buffers exactly for this reason, to get the HTTP status code right. Fuseki does not yet; it will.

          I was expecting resettable timeouts if possible, ideally both sync and async. Sync only makes sense for SELECT.

          My thinking was that if the query times out, then no result from CONSTRUCT or DESCRIBE would be returned. A partial model is possible using the "drain intermediate results" mechanism for Simon, with the same considerations for different answers to a completing query.
          sallen Stephen Allen added a comment -

          Andy - I think your design makes sense.

          The two use cases I would be interested in are manual cancellation, and a timeout if a query is still running after a certain period of time (regardless of whether it has started returning results yet) - what Paolo termed "the time to iterate through all the results".

          You mention that CONSTRUCT and DESCRIBE will return null when cancelled because they currently put all triples in an in-memory model and then return those results. However, this may not always be the case, as the operations could theoretically be implemented in a streaming manner with a distinct iterator. Do we need to plan for this?

          And finally, I can see wanting to cancel update queries (e.g. the system is about to shut down, so it is better to terminate cleanly rather than leave the data files in a corrupt state). Since we don't have transaction support, we should probably make sure not to associate timeouts with update queries though.
          shelsen Simon Helsen added a comment -

          tiny (but relevant) bug fix
          andy.seaborne Andy Seaborne added a comment -

          Simon - in order to incorporate both styles of cancellation behaviour, and because of the nested group/sort issue, using an exception isn't workable. Instead, group/sort need to implement requestCancel(), not override cancel(). group/sort can close(), not cancel(), their underlying iterator, so the sort is only over the results seen so far. They continue to answer hasNext()/next() as usual.

          QueryIteratorBase has a flag as to whether to throw an exception in hasNext()/next(). When the policy is "drain", not "stop immediately", the flag is not set. Default behaviour is to throw an exception - "drain" requires some special setting.

          Next step - prototype all this.
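The flag Andy describes — throw from hasNext()/next() by default after a cancel, or keep yielding under a "drain" policy — might look roughly like this base class. Purely illustrative: the names and structure do not match the real QueryIteratorBase.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical base iterator with a cancellation-policy flag (illustrative
// only). In "stop" mode, any hasNext()/next() after a cancel request throws;
// in "drain" mode, the iterator keeps yielding whatever it can still produce.
abstract class CancelPolicyIterator<T> implements Iterator<T> {
    protected volatile boolean cancelRequested = false;
    private final boolean drainOnCancel;

    CancelPolicyIterator(boolean drainOnCancel) { this.drainOnCancel = drainOnCancel; }

    public final void requestCancel() {
        cancelRequested = true;
        onCancelRequest();            // subclasses stop consuming input here
    }

    // Hook for propagating the request down a chain of iterators.
    protected void onCancelRequest() {}

    private void checkPolicy() {
        if (cancelRequested && !drainOnCancel)
            throw new IllegalStateException("query cancelled");
    }

    @Override public final boolean hasNext() { checkPolicy(); return moreResults(); }

    @Override
    public final T next() {
        checkPolicy();
        if (!moreResults())
            throw new NoSuchElementException();
        return nextResult();
    }

    protected abstract boolean moreResults();
    protected abstract T nextResult();
}
```

Whether the flag lives on the iterator or is looked up in an execution context (as Andy suggests below for group/sort) is a separate design choice; this sketch only shows the two behaviours.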
          andy.seaborne Andy Seaborne added a comment -

          Stephen - a reasonable behaviour possibility for CONSTRUCT and DESCRIBE. I don't see any difficulties. At the moment, it's all in QueryExecution and it specifies a Model, but, especially for the forms that pass the model in, some partial results make sense.

          Updates will need very careful handling. The safest is to materialize the results, then do the changes, but that only protects one operation, and a request can be multiple operations.
          andy.seaborne Andy Seaborne added a comment -

          Something is now checked in at SF that allows .cancel to be called on a query execution.

          Simon:
          I changed QueryIterSort/Group to do a requestCancel, not use an exception (see discussion about nested queries).

          QueryIteratorBase (and only QueryIteratorBase) has cancelAllowContinue().

          Normal cancel will set the flags to cause hasNext()/next() to throw an exception after cancel is called (.close() is called sync to the execution thread).

          cancelAllowContinue does not set the flags, leaving the iterator to continue working.

          We just have to work out how a group or sort knows which cancel-handling mode is in effect. Looking in the ExecutionContext seems best to me, rather than an argument to requestCancel chosen by cancelAllowContinue. By using the execution context, it can be chosen more dynamically.
          shelsen Simon Helsen added a comment -

          Andy, sounds ok to me
          andy.seaborne Andy Seaborne added a comment -

          Leaving sub-task for QueryEngineHTTP open
          andy.seaborne Andy Seaborne added a comment -

          See release ARQ 2.8.8
          andy.seaborne Andy Seaborne added a comment -

          Reopen to set fix version.
          andy.seaborne Andy Seaborne added a comment -

          Set fix version to clear up unmarked ones.

            People

            • Assignee:
              andy.seaborne Andy Seaborne
              Reporter:
              shelsen Simon Helsen
            • Votes: 5
              Watchers: 5