Uploaded image for project: 'TinkerPop'
  1. TinkerPop
  2. TINKERPOP-932

Add ability to cancel script execution associated with a Gremlin Server Session

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: 3.0.2-incubating
    • Fix Version/s: 3.2.4
    • Component/s: server
    • Labels:
      None

      Description

      Currently with a SessionedClient there is no way to cancel a long running script and the client has to depend on Gremlin Server side configured timeouts before they can execute another script associated with the same session id.

      There is a way we can forcefully close a session from the client side, or just close the entire Gremlin client. But it would be useful for client side applications to be able to cancel script execution, have its intermediate effects rolled back, and be able to continue interacting with the session without losing session variable state maintained on the Gremlin server side.

      Unsure where this should live at an API level, since canceling by session id isn't relevant for all Client implementations. If somehow when the CompletableFuture<ResultSet> returned by Client.submitAsync could do this when the Future is canceled, that would be a nice way to bridge implementations.

        Issue Links

          Activity

          Hide
          spmallette stephen mallette added a comment -

          Dylan Millikin what do you think of this one?

          Show
          spmallette stephen mallette added a comment - Dylan Millikin what do you think of this one?
          Hide
          dmill Dylan Millikin added a comment -

          Just as extra info for Zachary Kurey. There is also a way to close a remote session (not just on the client end). This is recent so I thought I would point it out. (http://tinkerpop.incubator.apache.org/docs/3.0.2-SNAPSHOT/#_opprocessors_arguments under SessionOpProcessor) I don't know if gremlin-driver already uses this, I'd have to check.
          Of course this doesn't help much in this particular case.

          In principle it could be a nice feature to have. From an API point of view it would make sense to have this be another key? (like say "cancel" or "terminate" or something). People can use it if they want and others are free to completely ignore it.

          As for some of the tricky parts which kind of turn me off the idea. There's a limit regarding what can be rolled back. Some edge cases I can think of:

          • The goal is to keep session variables and such but what happens to everything set during the "cancelled" script? Like bindings, variables, transaction consumers, etc... and anything global really. I don't think there's a way to reliably bring the state back to what it was before the script was executed?
          • If multiple clients connect to a same session. If by any means one client terminates what it considers to be a long running script it could be difficult to debug (from the pov of other clients). Even though this is more of a user responsibility it's still something to keep in mind.

          I somehow think that terminating the session and reopening one client side is the way to go. It implies that the burden of recreating the same session is put on the client but I don't find that so shocking. In that aspect it's kind of like how TP provides some fail-retry strategies but clients have to implement their own anyways.

          Show
          dmill Dylan Millikin added a comment - Just as extra info for Zachary Kurey . There is also a way to close a remote session (not just on the client end). This is recent so I thought I would point it out. ( http://tinkerpop.incubator.apache.org/docs/3.0.2-SNAPSHOT/#_opprocessors_arguments under SessionOpProcessor) I don't know if gremlin-driver already uses this, I'd have to check. Of course this doesn't help much in this particular case. In principle it could be a nice feature to have. From an API point of view it would make sense to have this be another key ? (like say "cancel" or "terminate" or something). People can use it if they want and others are free to completely ignore it. As for some of the tricky parts which kind of turn me off the idea. There's a limit regarding what can be rolled back. Some edge cases I can think of: The goal is to keep session variables and such but what happens to everything set during the "cancelled" script? Like bindings, variables, transaction consumers, etc... and anything global really. I don't think there's a way to reliably bring the state back to what it was before the script was executed? If multiple clients connect to a same session. If by any means one client terminates what it considers to be a long running script it could be difficult to debug (from the pov of other clients). Even though this is more of a user responsibility it's still something to keep in mind. I somehow think that terminating the session and reopening one client side is the way to go. It implies that the burden of recreating the same session is put on the client but I don't find that so shocking. In that aspect it's kind of like how TP provides some fail-retry strategies but clients have to implement their own anyways.
          Hide
          dmill Dylan Millikin added a comment -

          stephen mallette off the top of your head, does close terminate the script execution?

          Show
          dmill Dylan Millikin added a comment - stephen mallette off the top of your head, does close terminate the script execution?
          Hide
          spmallette stephen mallette added a comment -

          > I don't think there's a way to reliably bring the state back to what it was before the script was executed?

          I guess you could store bindings before the script execution outside of the scriptengine??? On cancel you could blow out whatever was in the scriptengine and replace it with the state from the stored bindings...maybe that would work..............the bigger problem is intermediate commits. Someone decides to bust this:

          g.addV()
          g.tx().commit()
          g.V()
          

          If the script cancels after evaluation of the commit() we can't revert that state.....

          > off the top of your head, does close terminate the script execution?

          it looks like it does an orderly shutdown. it would queue the auto-rollbacks behind any remaining scripts. The scripts would either timeout or complete and then auto-rollback and then dead session.

          Show
          spmallette stephen mallette added a comment - > I don't think there's a way to reliably bring the state back to what it was before the script was executed? I guess you could store bindings before the script execution outside of the scriptengine??? On cancel you could blow out whatever was in the scriptengine and replace it with the state from the stored bindings...maybe that would work..............the bigger problem is intermediate commits. Someone decides to bust this: g.addV() g.tx().commit() g.V() If the script cancels after evaluation of the commit() we can't revert that state..... > off the top of your head, does close terminate the script execution? it looks like it does an orderly shutdown. it would queue the auto-rollbacks behind any remaining scripts. The scripts would either timeout or complete and then auto-rollback and then dead session.
          Hide
          dmill Dylan Millikin added a comment - - edited

          Ah you're right I didn't think about that case. I was also wondering if what I had brought up wasn't already the case when an exception is thrown? For instance :

          g.addV()
          g.tx().commit()
          g.V().stepDoesNotExist()
          

          Would result in the same behavior right? This in theory could alleviate the shortcomings of a cancel feature so long as the cancel is treated like an exception? Responsibility falls back on the user.

          Show
          dmill Dylan Millikin added a comment - - edited Ah you're right I didn't think about that case. I was also wondering if what I had brought up wasn't already the case when an exception is thrown? For instance : g.addV() g.tx().commit() g.V().stepDoesNotExist() Would result in the same behavior right? This in theory could alleviate the shortcomings of a cancel feature so long as the cancel is treated like an exception? Responsibility falls back on the user.
          Hide
          spmallette stephen mallette added a comment -

          i believe that's the case. If the exception throws after the commit there's not much that can be done. The best practice is likely to no do mid script commits or to at least do them nicely and expect failure (i.e. wrap up in retry logic or at least try-catch).

          Show
          spmallette stephen mallette added a comment - i believe that's the case. If the exception throws after the commit there's not much that can be done. The best practice is likely to no do mid script commits or to at least do them nicely and expect failure (i.e. wrap up in retry logic or at least try-catch).
          Hide
          dmill Dylan Millikin added a comment - - edited

          Ok, in that case, cancel would be analogous to having a script timeout. We're just forcing the timeout manually.

          If your script times out you know to check the commits that may or may not have been done. Same thing with cancel? In that case how about a forceTimeout feature instead?

          Show
          dmill Dylan Millikin added a comment - - edited Ok, in that case, cancel would be analogous to having a script timeout. We're just forcing the timeout manually. If your script times out you know to check the commits that may or may not have been done. Same thing with cancel? In that case how about a forceTimeout feature instead?
          Hide
          rjbriody Bob Briody added a comment - - edited

          If a client app submits scripts with embedded commits then it seams reasonable to say that they are responsible for determining and handling the state after issuing a cancellation. In most cases the standard behavior should be fine.

          Show
          rjbriody Bob Briody added a comment - - edited If a client app submits scripts with embedded commits then it seams reasonable to say that they are responsible for determining and handling the state after issuing a cancellation. In most cases the standard behavior should be fine.
          Hide
          zackurey Zachary Kurey added a comment -

          +1 to Bob Briody's comment. Seems reasonable they need to figure out the state, but stopping the current executing script is definitely less burdensome then having to potentially redo a lot of work associated with previous script submissions for the same session id.

          As far as the scope of what the user or a client application needs to determine what the current state is. I'm wondering if there is some kind of feedback mechanism that could be added here as well(if it doesn’t already exist), where the client could retrieve the last line of the latest script executed associated with a particular session id? Then at least there is a clear way to know how far the script made it.

          Show
          zackurey Zachary Kurey added a comment - +1 to Bob Briody 's comment. Seems reasonable they need to figure out the state, but stopping the current executing script is definitely less burdensome then having to potentially redo a lot of work associated with previous script submissions for the same session id. As far as the scope of what the user or a client application needs to determine what the current state is. I'm wondering if there is some kind of feedback mechanism that could be added here as well(if it doesn’t already exist), where the client could retrieve the last line of the latest script executed associated with a particular session id? Then at least there is a clear way to know how far the script made it.
          Hide
          spmallette stephen mallette added a comment -

          > where the client could retrieve the last line of the latest script executed associated with a particular session id?

          I don't think such a thing is possible as the script is not executed line-by-line at the level of abstraction Gremlin Server deals with it. uhhhhhhhhh - maybe possible way way way lower in the stack which would constitute an even bigger change.

          Show
          spmallette stephen mallette added a comment - > where the client could retrieve the last line of the latest script executed associated with a particular session id? I don't think such a thing is possible as the script is not executed line-by-line at the level of abstraction Gremlin Server deals with it. uhhhhhhhhh - maybe possible way way way lower in the stack which would constitute an even bigger change.
          Hide
          rjbriody Bob Briody added a comment -

          My $.02, I think that having the ability to cancel at all would be a great first step.

          Show
          rjbriody Bob Briody added a comment - My $.02, I think that having the ability to cancel at all would be a great first step.
          Hide
          zackurey Zachary Kurey added a comment -

          If its non-trivial then I won't push for that additional request right now. Agreed cancel is a good start and this ticket's scope should be limited to that.

          Show
          zackurey Zachary Kurey added a comment - If its non-trivial then I won't push for that additional request right now. Agreed cancel is a good start and this ticket's scope should be limited to that.
          Hide
          dmill Dylan Millikin added a comment -

          Wanted to get the ball rolling on this one again.

          It seems to be the general consensus that the responsibility for the state falls onto the user cencelling. So maybe we can orient the discussion towards what should be covered by the feature:

          Should cancel revert the bindings back to what they were? This option is probably the most practical but it adds another behavior the users will need to be knowledgeable about. In essence it reverts some stuff back but not everything, this could be counter-intuitive.

          Or should it simply force a "timeout" like behavior on the script? In this case the user is responsible for the entire state when they cancel. This has the advantage of being easy to understand since it works like a timeout. On the other hand the bindings could/will be a mess.

          Show
          dmill Dylan Millikin added a comment - Wanted to get the ball rolling on this one again. It seems to be the general consensus that the responsibility for the state falls onto the user cencelling. So maybe we can orient the discussion towards what should be covered by the feature: Should cancel revert the bindings back to what they were? This option is probably the most practical but it adds another behavior the users will need to be knowledgeable about. In essence it reverts some stuff back but not everything, this could be counter-intuitive. Or should it simply force a "timeout" like behavior on the script? In this case the user is responsible for the entire state when they cancel. This has the advantage of being easy to understand since it works like a timeout. On the other hand the bindings could/will be a mess.
          Hide
          rjbriody Bob Briody added a comment -

          I don't think it is very common to have intermediate commit()'s, and if you do, you should understand that. So to optimize for the common use case, I think that rolling bindings back by default is the most intuitive behavior.

          Just to add another option for completeness (which honestly I'm not even a fan of)... It might be possible to specify how bindings are handled when the cancellation is issued. Something like rollbackBindings: true/false.

          Show
          rjbriody Bob Briody added a comment - I don't think it is very common to have intermediate commit()'s, and if you do, you should understand that. So to optimize for the common use case, I think that rolling bindings back by default is the most intuitive behavior. Just to add another option for completeness (which honestly I'm not even a fan of)... It might be possible to specify how bindings are handled when the cancellation is issued. Something like rollbackBindings: true/false .
          Hide
          zackurey Zachary Kurey added a comment -

          I'll just register that I don't have a strong opinion here. Either approach(or both) seems reasonable. I’d expect most users after canceling are going to make some changes to their script and re-run it from the beginning, so the state of the session vars may not matter. I'd hope the script is either idempotent if re-run, or that the user takes steps to re-establish known good state prior to re-executing the script.

          Whats most intuitive to me personally is that it would be treated the same as if I had been manually executing the statements, and just canceled the latest one. Which would mean that I should assume the vars have been affected and that I should ensure everything is in an expected state before re-running.

          Show
          zackurey Zachary Kurey added a comment - I'll just register that I don't have a strong opinion here. Either approach(or both) seems reasonable. I’d expect most users after canceling are going to make some changes to their script and re-run it from the beginning, so the state of the session vars may not matter. I'd hope the script is either idempotent if re-run, or that the user takes steps to re-establish known good state prior to re-executing the script. Whats most intuitive to me personally is that it would be treated the same as if I had been manually executing the statements, and just canceled the latest one. Which would mean that I should assume the vars have been affected and that I should ensure everything is in an expected state before re-running.
          Hide
          spmallette stephen mallette added a comment -

          I was thinking about this one a bit this morning. To summarize all the discussion to this point, we seemed to converge on:

          1. Don't worry about intermediate commits - if users do that, they are on their own.
          2. Not rolling back bindings on a cancel (the more i thought about this today, the less technically feasible it seemed)

          Both of these take a fair bit of complexity off the table. Now we're just talking about stopping a long run script from executing.

          Show
          spmallette stephen mallette added a comment - I was thinking about this one a bit this morning. To summarize all the discussion to this point, we seemed to converge on: 1. Don't worry about intermediate commits - if users do that, they are on their own. 2. Not rolling back bindings on a cancel (the more i thought about this today, the less technically feasible it seemed) Both of these take a fair bit of complexity off the table. Now we're just talking about stopping a long run script from executing.
          Hide
          spmallette stephen mallette added a comment -

          Very doubtful that we'll see this for 3.1.1-incubating. Just not enough time and no clear path for how this will work. Bumping to 3.2.0-incubating.

          Show
          spmallette stephen mallette added a comment - Very doubtful that we'll see this for 3.1.1-incubating. Just not enough time and no clear path for how this will work. Bumping to 3.2.0-incubating.
          Hide
          rjbriody Bob Briody added a comment -

          3.2 isn't that far out now either. Any idea if this ticket will make it?

          Show
          rjbriody Bob Briody added a comment - 3.2 isn't that far out now either. Any idea if this ticket will make it?
          Hide
          spmallette stephen mallette added a comment -

          i doubt it - i still don't have a clue how i'll do it.

          Show
          spmallette stephen mallette added a comment - i doubt it - i still don't have a clue how i'll do it.
          Hide
          spmallette stephen mallette added a comment -

          I sense that barring some obscene change to TinkerPop's transactional model, we won't get much closer to a solution for this than what was done on TINKERPOP-1442. The only additional change left that I think I could do is to implement a "force" option on a close request that would just interrupt the thread running the session and orphan the transactions leaving it up to the graph implementation to sort out what to do with them. In a sense, a "force" might be better than awaiting a timeout (e.g. you have a long/infinite timeout for an OLAP job). Of course, an "interrupt" is merely a suggestion to stop the thread in the session so, that might not even doing anything at the end of the day depending on the script.

          If there are no other ideas on this, I will try to implement a "force" option in the manner I just described. If for some reason that isn't possible then I will just close this barring no new ideas.

          Show
          spmallette stephen mallette added a comment - I sense that barring some obscene change to TinkerPop's transactional model, we won't get much closer to a solution for this than what was done on TINKERPOP-1442 . The only additional change left that I think I could do is to implement a "force" option on a close request that would just interrupt the thread running the session and orphan the transactions leaving it up to the graph implementation to sort out what to do with them. In a sense, a "force" might be better than awaiting a timeout (e.g. you have a long/infinite timeout for an OLAP job). Of course, an "interrupt" is merely a suggestion to stop the thread in the session so, that might not even doing anything at the end of the day depending on the script. If there are no other ideas on this, I will try to implement a "force" option in the manner I just described. If for some reason that isn't possible then I will just close this barring no new ideas.
          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user spmallette opened a pull request:

          https://github.com/apache/tinkerpop/pull/461

          TINKERPOP-932 Added "force" option on session close.

          https://issues.apache.org/jira/browse/TINKERPOP-932

          Allows session close requests to interrupt a long run job at the cost of not closing transactions.

          Tested with: `mvn clean install && mvn verify -pl gremlin-server -DskipIntegrationTests=false -DincludeNeo4j`

          VOTE +1

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/apache/tinkerpop TINKERPOP-932

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/tinkerpop/pull/461.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #461



          Show
          githubbot ASF GitHub Bot added a comment - GitHub user spmallette opened a pull request: https://github.com/apache/tinkerpop/pull/461 TINKERPOP-932 Added "force" option on session close. https://issues.apache.org/jira/browse/TINKERPOP-932 Allows session close requests to interrupt a long run job at the cost of not closing transactions. Tested with: `mvn clean install && mvn verify -pl gremlin-server -DskipIntegrationTests=false -DincludeNeo4j` VOTE +1 You can merge this pull request into a Git repository by running: $ git pull https://github.com/apache/tinkerpop TINKERPOP-932 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/tinkerpop/pull/461.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #461
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user dkuppitz commented on the issue:

          https://github.com/apache/tinkerpop/pull/461

          VOTE: +1

          Show
          githubbot ASF GitHub Bot added a comment - Github user dkuppitz commented on the issue: https://github.com/apache/tinkerpop/pull/461 VOTE: +1
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user pluradj commented on the issue:

          https://github.com/apache/tinkerpop/pull/461

          Tested with: `mvn clean install && mvn verify -pl gremlin-server -DskipIntegrationTests=false -DincludeNeo4j`

          Code review LGTM.

          VOTE: +1

          Show
          githubbot ASF GitHub Bot added a comment - Github user pluradj commented on the issue: https://github.com/apache/tinkerpop/pull/461 Tested with: `mvn clean install && mvn verify -pl gremlin-server -DskipIntegrationTests=false -DincludeNeo4j` Code review LGTM. VOTE: +1
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/tinkerpop/pull/461

          Show
          githubbot ASF GitHub Bot added a comment - Github user asfgit closed the pull request at: https://github.com/apache/tinkerpop/pull/461

            People

            • Assignee:
              spmallette stephen mallette
              Reporter:
              zackurey Zachary Kurey
            • Votes:
              1 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development