Uploaded image for project: 'TinkerPop'
  1. TinkerPop
  2. TINKERPOP-1490

Provider a Future based Traversal.async(Function<Traversal,V>) terminal step

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: 3.2.2
    • Fix Version/s: 3.2.4
    • Component/s: process
    • Labels:
      None

      Description

      Matthias Broecheler had the idea of adding a Traversal.async() method. This is important for not only avoiding thread locking on a query in Gremlin, but also, it will allow single threaded language variants like Gremlin-JavaScript to use callbacks for processing query results.

      Future<List<String>> result = g.V().out().values("name").async(Traversal::toList)
      
      Future<List<String>> result = g.V().out().name.async{it.toList()}
      
      g.V().out().values('name').async((err,names) => {
        // I don't know JavaScript, but ...
        return list(names);
      }) 
      

      ...

        Issue Links

          Activity

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/tinkerpop/pull/478

          Show
          githubbot ASF GitHub Bot added a comment - Github user asfgit closed the pull request at: https://github.com/apache/tinkerpop/pull/478
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user okram commented on the issue:

          https://github.com/apache/tinkerpop/pull/478

          VOTE +1.

          Show
          githubbot ASF GitHub Bot added a comment - Github user okram commented on the issue: https://github.com/apache/tinkerpop/pull/478 VOTE +1.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user davebshow commented on the issue:

          https://github.com/apache/tinkerpop/pull/478

          Tests pass. I think this is a good base to move forward to a fully functional async API for Gremlin.

          VOTE +1

          Show
          githubbot ASF GitHub Bot added a comment - Github user davebshow commented on the issue: https://github.com/apache/tinkerpop/pull/478 Tests pass. I think this is a good base to move forward to a fully functional async API for Gremlin. VOTE +1
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user spmallette commented on the issue:

          https://github.com/apache/tinkerpop/pull/478

          Rebase is complete now - tests are all good with:

          ```text
          $ mvn clean install && mvn verify -pl gremlin-server -DskipIntegrationTests=false -DincludeNeo4j
          ```

          Show
          githubbot ASF GitHub Bot added a comment - Github user spmallette commented on the issue: https://github.com/apache/tinkerpop/pull/478 Rebase is complete now - tests are all good with: ```text $ mvn clean install && mvn verify -pl gremlin-server -DskipIntegrationTests=false -DincludeNeo4j ```
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user spmallette commented on the issue:

          https://github.com/apache/tinkerpop/pull/478

          I'm rebasing on `tp32` to clean up that conflict. this branch was pretty far behind.

          Show
          githubbot ASF GitHub Bot added a comment - Github user spmallette commented on the issue: https://github.com/apache/tinkerpop/pull/478 I'm rebasing on `tp32` to clean up that conflict. this branch was pretty far behind.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user davebshow commented on the issue:

          https://github.com/apache/tinkerpop/pull/478

          Well I implemented this API for gremlin-python. There is one major problem: side effects. Since the current side effect API is designed to be synchronous (calling `run_sync`), it cannot be used inside a coroutine, as demonstrated by this [test](https://github.com/apache/tinkerpop/blob/54fac5146a5ac067e75f2fbfe135fae2da641c35/gremlin-python/src/main/jython/tests/driver/test_driver_remote_connection.py#L210). This is problematic for a Tornado or Asyncio based driver. I think in the future we should consider the possibility that the `promise()` method return a `AsyncSideEffect` object so the end user can leverage the async techniques with side effects.

          Show
          githubbot ASF GitHub Bot added a comment - Github user davebshow commented on the issue: https://github.com/apache/tinkerpop/pull/478 Well I implemented this API for gremlin-python. There is one major problem: side effects. Since the current side effect API is designed to be synchronous (calling `run_sync`), it cannot be used inside a coroutine, as demonstrated by this [test] ( https://github.com/apache/tinkerpop/blob/54fac5146a5ac067e75f2fbfe135fae2da641c35/gremlin-python/src/main/jython/tests/driver/test_driver_remote_connection.py#L210 ). This is problematic for a Tornado or Asyncio based driver. I think in the future we should consider the possibility that the `promise()` method return a `AsyncSideEffect` object so the end user can leverage the async techniques with side effects.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user davebshow commented on the issue:

          https://github.com/apache/tinkerpop/pull/478

          As long as the driver returns a future with a compatible API, yes, it can be used with all Python versions. The snippet you provided will throw an error with an Asyncio or Tornado client, as you can only call [`result()`](https://docs.python.org/3/library/asyncio-task.html#asyncio.Future.result) on a completed future. You will typically need to `yield/yield from/await` it first depending on the Python version and driver (like in the examples provided above). However, the underlying coroutine will spawn independently, which means that even if you don't `yield/yield from/await`, but you wait long enough before calling `result()` it will complete. For example:

          ```python
          future = g.V().promise(lambda t: t.toList())
          assert not future.done() # Future has not completed
          await asyncio.sleep(5) # Simulate doing a ton of other blocking stuff
          assert future.done() # Future has completed (assuming 5 seconds was enough time to complete)
          result = future.result() # Doesn't throw an error
          ```

          Show
          githubbot ASF GitHub Bot added a comment - Github user davebshow commented on the issue: https://github.com/apache/tinkerpop/pull/478 As long as the driver returns a future with a compatible API, yes, it can be used with all Python versions. The snippet you provided will throw an error with an Asyncio or Tornado client, as you can only call [`result()`] ( https://docs.python.org/3/library/asyncio-task.html#asyncio.Future.result ) on a completed future. You will typically need to `yield/yield from/await` it first depending on the Python version and driver (like in the examples provided above). However, the underlying coroutine will spawn independently, which means that even if you don't `yield/yield from/await`, but you wait long enough before calling `result()` it will complete. For example: ```python future = g.V().promise(lambda t: t.toList()) assert not future.done() # Future has not completed await asyncio.sleep(5) # Simulate doing a ton of other blocking stuff assert future.done() # Future has completed (assuming 5 seconds was enough time to complete) result = future.result() # Doesn't throw an error ```
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user aboudreault commented on the issue:

          https://github.com/apache/tinkerpop/pull/478

          Except my initial concern, can you confirm we can use the latest way with all Python versions and with/without a custom driver? It would be nice to have an unified way that just works everywhere. The asyncio + tornado support are good additions.

          ```python
          g.V().promise(lambda x: x.toList()).result()
          ```

          Show
          githubbot ASF GitHub Bot added a comment - Github user aboudreault commented on the issue: https://github.com/apache/tinkerpop/pull/478 Except my initial concern, can you confirm we can use the latest way with all Python versions and with/without a custom driver? It would be nice to have an unified way that just works everywhere. The asyncio + tornado support are good additions. ```python g.V().promise(lambda x: x.toList()).result() ```
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user spmallette commented on the issue:

          https://github.com/apache/tinkerpop/pull/478

          It's been about a week since @davebshow laid out the approach he's recommending here. I assume the silence means that there are no objections to that approach and we have consensus on what to do?

          Show
          githubbot ASF GitHub Bot added a comment - Github user spmallette commented on the issue: https://github.com/apache/tinkerpop/pull/478 It's been about a week since @davebshow laid out the approach he's recommending here. I assume the silence means that there are no objections to that approach and we have consensus on what to do?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user davebshow commented on the issue:

          https://github.com/apache/tinkerpop/pull/478

          I put together a quick example of how this can be implemented in gremlin-python. Obviously the example is incomplete, but hopefully it can help move the discussion forward: https://github.com/apache/tinkerpop/commit/aa85a9b5278c55aa28014aa135c8498428295071

          These changes result in the following APIs depending on which Python future is returned by the driver, but the GLV doesn't care as long as it supports Python's common future API:

          • Tornado w/Python 2.7+ returning a `tornado.concurrent.Future`
            ```python
            @gen.coroutine
            def go():
            vertices = yield g.V().promise(lambda x: x.toList())
            assert len(vertices) == 6
            count = yield g.V().count().promise(lambda x: x.next())
            assert count == 6

          loop.run_sync(go)
          ```

          • Asyncio/Tornado with Python 3.5 async/await syntax returning `asyncio.Future` or `tornado.concurrent.Future`
            ```python
            async def go():
            vertices = await g.V().promise(lambda x: x.toList())
            assert len(vertices) == 6
            count = await g.V().count().promise(lambda x: x.next())
            assert count == 6

          loop.run_until_complete(go())
          ```

          • Driver with Python 2.7+ that returns `concurrent.futures.Futures` (or backport)
            ```python
            def go():
            vertices = g.V().promise(lambda x: x.toList()).result()
            assert len(vertices) == 6
            count = g.V().count().promise(lambda x: x.next()).result()
            assert count == 6
            ```
          Show
          githubbot ASF GitHub Bot added a comment - Github user davebshow commented on the issue: https://github.com/apache/tinkerpop/pull/478 I put together a quick example of how this can be implemented in gremlin-python. Obviously the example is incomplete, but hopefully it can help move the discussion forward: https://github.com/apache/tinkerpop/commit/aa85a9b5278c55aa28014aa135c8498428295071 These changes result in the following APIs depending on which Python future is returned by the driver, but the GLV doesn't care as long as it supports Python's common future API: Tornado w/Python 2.7+ returning a `tornado.concurrent.Future` ```python @gen.coroutine def go(): vertices = yield g.V().promise(lambda x: x.toList()) assert len(vertices) == 6 count = yield g.V().count().promise(lambda x: x.next()) assert count == 6 loop.run_sync(go) ``` Asyncio/Tornado with Python 3.5 async/await syntax returning `asyncio.Future` or `tornado.concurrent.Future` ```python async def go(): vertices = await g.V().promise(lambda x: x.toList()) assert len(vertices) == 6 count = await g.V().count().promise(lambda x: x.next()) assert count == 6 loop.run_until_complete(go()) ``` Driver with Python 2.7+ that returns `concurrent.futures.Futures` (or backport) ```python def go(): vertices = g.V().promise(lambda x: x.toList()).result() assert len(vertices) == 6 count = g.V().count().promise(lambda x: x.next()).result() assert count == 6 ```
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user davebshow commented on the issue:

          https://github.com/apache/tinkerpop/pull/478

          I agree with the `RemoteConnection` implementation being pluggable. Therefore, the GLV should be able to use any remote connection as long as it complies with the API. But, since pre-Python3 doesn't have any standard implementation of a `Future`, in order to maintain this "plugablity", IMO the GLV should be able to handle whatever type of `Future` the `RemoteConnection` returns. This places the onus on the end user to understand what `RemoteConnection` implementation they are using and the type of Future returned in order to comply with the syntax required by underlying framework implementation. This does not mean that responses need have a different API, only that developers know how to either `yield` or `yield from` them, or use the `concurrent.futures.as_completed` method.

          The good thing is that the three possible future types (asyncio, tornado, and concurrent) have compatible APIs, and should be able to all be handled by the GLV in the same fashion. However, if I understand correctly (not sure that I do) it seems to me that doing specialized handling, would require us to commit to a certain syntax. Unfortunately, all of these problems really stem from the Python community's unwillingness to drop the legacy Python implementation, which the Python Software Foundation has been urging us to do for years. If I am missing something, or there is a solution that I can't see for some reason, please provide an example of a way that allows any pluggable `RemoteConnection` that can be based on any of the possible Python frameworks that maintain the syntax used for handling the resulting future by the end user.

          Show
          githubbot ASF GitHub Bot added a comment - Github user davebshow commented on the issue: https://github.com/apache/tinkerpop/pull/478 I agree with the `RemoteConnection` implementation being pluggable. Therefore, the GLV should be able to use any remote connection as long as it complies with the API. But, since pre-Python3 doesn't have any standard implementation of a `Future`, in order to maintain this "plugablity", IMO the GLV should be able to handle whatever type of `Future` the `RemoteConnection` returns. This places the onus on the end user to understand what `RemoteConnection` implementation they are using and the type of Future returned in order to comply with the syntax required by underlying framework implementation. This does not mean that responses need have a different API, only that developers know how to either `yield` or `yield from` them, or use the `concurrent.futures.as_completed` method. The good thing is that the three possible future types (asyncio, tornado, and concurrent) have compatible APIs, and should be able to all be handled by the GLV in the same fashion. However, if I understand correctly (not sure that I do) it seems to me that doing specialized handling, would require us to commit to a certain syntax. Unfortunately, all of these problems really stem from the Python community's unwillingness to drop the legacy Python implementation, which the Python Software Foundation has been urging us to do for years. If I am missing something, or there is a solution that I can't see for some reason, please provide an example of a way that allows any pluggable `RemoteConnection` that can be based on any of the possible Python frameworks that maintain the syntax used for handling the resulting future by the end user.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user aboudreault commented on the issue:

          https://github.com/apache/tinkerpop/pull/478

          @davebshow @aholmberg (PYTHON) I must agree that my initial thought is that we should try to avoid the coupling with tornado. However, since tornado is already used internally, it might get better performance since it is optimzed and non thread-safe for the loop.

          If I understand correctly, a custom RemoteConnection will need to return its own future type. Isn't a bit confusing for the user that he will get different future types? I was seeing the RemoteConnection as the transport layer and that it was transparently pluggable. With this change, we see that it has an impact on the API of the query response. IMO, for that reason, we might want to consider a specialized handling of the future returned by the traversal query.

          Show
          githubbot ASF GitHub Bot added a comment - Github user aboudreault commented on the issue: https://github.com/apache/tinkerpop/pull/478 @davebshow @aholmberg (PYTHON) I must agree that my initial thought is that we should try to avoid the coupling with tornado. However, since tornado is already used internally, it might get better performance since it is optimzed and non thread-safe for the loop. If I understand correctly, a custom RemoteConnection will need to return its own future type. Isn't a bit confusing for the user that he will get different future types? I was seeing the RemoteConnection as the transport layer and that it was transparently pluggable. With this change, we see that it has an impact on the API of the query response. IMO, for that reason, we might want to consider a specialized handling of the future returned by the traversal query.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user newkek commented on the issue:

          https://github.com/apache/tinkerpop/pull/478

          I think the `promise()` method is more elegant as well, as it avoids adding many new methods in the Traversal API

          Show
          githubbot ASF GitHub Bot added a comment - Github user newkek commented on the issue: https://github.com/apache/tinkerpop/pull/478 I think the `promise()` method is more elegant as well, as it avoids adding many new methods in the Traversal API
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user spmallette commented on the issue:

          https://github.com/apache/tinkerpop/pull/478

          I think I agree with @okram for now that we not add a lot of methods just yet. It's easy to add methods, but harder to take them away later. Let's be sure this gets used in the fashion we expect it to before we add helper methods that we will be stuck with.

          Show
          githubbot ASF GitHub Bot added a comment - Github user spmallette commented on the issue: https://github.com/apache/tinkerpop/pull/478 I think I agree with @okram for now that we not add a lot of methods just yet. It's easy to add methods, but harder to take them away later. Let's be sure this gets used in the fashion we expect it to before we add helper methods that we will be stuck with.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user jorgebay commented on the issue:

          https://github.com/apache/tinkerpop/pull/478

          I like the bottom-up approach and a really quick turnaround!

          Regarding the final API, given that it's an async flow designed for remote operations (request/response), it would be nice to define clear boundaries, making it easier to understand for the user what causes a remote execution. Instead an all-purpose `promise()` API, it would be nice to have a `toListAsync()` that returns a collection of `Vertex`/`Edge`/`Path`/...disconnected elements instead of a `RemoteTraversal`.
          Similar to `toListAsync()` we could expose an async method to get a single result, something like `firstAsync()` or `oneAsync()`, that returns the Vertices.

          Show
          githubbot ASF GitHub Bot added a comment - Github user jorgebay commented on the issue: https://github.com/apache/tinkerpop/pull/478 I like the bottom-up approach and a really quick turnaround! Regarding the final API, given that it's an async flow designed for remote operations (request/response), it would be nice to define clear boundaries, making it easier to understand for the user what causes a remote execution. Instead an all-purpose `promise()` API, it would be nice to have a `toListAsync()` that returns a collection of `Vertex`/`Edge`/`Path`/...disconnected elements instead of a `RemoteTraversal`. Similar to `toListAsync()` we could expose an async method to get a single result, something like `firstAsync()` or `oneAsync()`, that returns the Vertices.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user spmallette commented on the issue:

          https://github.com/apache/tinkerpop/pull/478

          I ran about 50 million messages through Gremlin Server over the weekend and didn't see any problems as a result of my changes here. I think this seems pretty solid now. I'd quietly asked both @newkek and @jorgebay to review last week my intermediate work and both seemed satisfied.

          I think we can focus this PR on how to implement this for gremin-python now. Seems like discussion on that has started, but the consensus on how to implement isn't quite clear.

          Show
          githubbot ASF GitHub Bot added a comment - Github user spmallette commented on the issue: https://github.com/apache/tinkerpop/pull/478 I ran about 50 million messages through Gremlin Server over the weekend and didn't see any problems as a result of my changes here. I think this seems pretty solid now. I'd quietly asked both @newkek and @jorgebay to review last week my intermediate work and both seemed satisfied. I think we can focus this PR on how to implement this for gremin-python now. Seems like discussion on that has started, but the consensus on how to implement isn't quite clear.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user davebshow commented on the issue:

          https://github.com/apache/tinkerpop/pull/478

          @aholmberg I'm not sure if I follow...it seems to me that if a `RemoteConnection` implementation returns a list (or future list ) of `FutureTraversers`, the traversal API can still remain independent from the transport. Even if the GLV needs to operate on the Traversers, `asyncio.Future`, `concurrent.Future`, and `tornado.concurrent.Future` have compatible APIs (`done`, `cancel`, `add_done_callback`, `result`, `set_result`, `etc.`). The "specialized" handling of the future, whether it be a coroutine that requires a `yield`, `yield from`, or `await` expression, or a `concurrent.Future`, will in the end be handled by application code written by the end user.

          Show
          githubbot ASF GitHub Bot added a comment - Github user davebshow commented on the issue: https://github.com/apache/tinkerpop/pull/478 @aholmberg I'm not sure if I follow...it seems to me that if a `RemoteConnection` implementation returns a list (or future list ) of `FutureTraversers`, the traversal API can still remain independent from the transport. Even if the GLV needs to operate on the Traversers, `asyncio.Future`, `concurrent.Future`, and `tornado.concurrent.Future` have compatible APIs (`done`, `cancel`, `add_done_callback`, `result`, `set_result`, `etc.`). The "specialized" handling of the future, whether it be a coroutine that requires a `yield`, `yield from`, or `await` expression, or a `concurrent.Future`, will in the end be handled by application code written by the end user.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user al3xandru commented on a diff in the pull request:

          https://github.com/apache/tinkerpop/pull/478#discussion_r87521066

          — Diff: gremlin-core/pom.xml —
          @@ -61,6 +61,11 @@ limitations under the License.
          </exclusion>
          </exclusions>
          </dependency>
          + <dependency>
          + <groupId>org.apache.commons</groupId>
          — End diff –

          I cannot find any references to this new lib besides thread naming. is that its single purpose?

          Show
          githubbot ASF GitHub Bot added a comment - Github user al3xandru commented on a diff in the pull request: https://github.com/apache/tinkerpop/pull/478#discussion_r87521066 — Diff: gremlin-core/pom.xml — @@ -61,6 +61,11 @@ limitations under the License. </exclusion> </exclusions> </dependency> + <dependency> + <groupId>org.apache.commons</groupId> — End diff – I cannot find any references to this new lib besides thread naming. is that its single purpose?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user aholmberg commented on the issue:

          https://github.com/apache/tinkerpop/pull/478

          > the type of future returned will depend on the underlying RemoteConnection implementation.

          That depends on whether you are okay leaking remote connection implementation through this API. I have been trying to think about this traversal API as something distinct from transport (in which case you would probably want an abstraction).

          Show
          githubbot ASF GitHub Bot added a comment - Github user aholmberg commented on the issue: https://github.com/apache/tinkerpop/pull/478 > the type of future returned will depend on the underlying RemoteConnection implementation. That depends on whether you are okay leaking remote connection implementation through this API. I have been trying to think about this traversal API as something distinct from transport (in which case you would probably want an abstraction).
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user davebshow commented on the issue:

          https://github.com/apache/tinkerpop/pull/478

          After actually thinking about this, the type of future returned will depend on the underlying `RemoteConnection` implementation. A call to the method`promise` will cause a `(async)traversal_strategy` to call `apply`, the result of this depends on the underlying remote connection, and all the GLV has to do is pass the returned futures along to the end user. Therefore a Tornado based driver would return a `tornado.concurrent.Future` etc. etc. Make sense? Or am I missing something...?

          Show
          githubbot ASF GitHub Bot added a comment - Github user davebshow commented on the issue: https://github.com/apache/tinkerpop/pull/478 After actually thinking about this, the type of future returned will depend on the underlying `RemoteConnection` implementation. A call to the method`promise` will cause a `(async)traversal_strategy` to call `apply`, the result of this depends on the underlying remote connection, and all the GLV has to do is pass the returned futures along to the end user. Therefore a Tornado based driver would return a `tornado.concurrent.Future` etc. etc. Make sense? Or am I missing something...?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user aholmberg commented on the issue:

          https://github.com/apache/tinkerpop/pull/478

          Coupling the traversal API to a web framework does not seem appealing to me. Could we consider using the [futures backport](https://pypi.python.org/pypi/futures), which backports the Python 3 standard library features to Python 2?

          Show
          githubbot ASF GitHub Bot added a comment - Github user aholmberg commented on the issue: https://github.com/apache/tinkerpop/pull/478 Coupling the traversal API to a web framework does not seem appealing to me. Could we consider using the [futures backport] ( https://pypi.python.org/pypi/futures ), which backports the Python 3 standard library features to Python 2?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user spmallette commented on the issue:

          https://github.com/apache/tinkerpop/pull/478

          thanks @davebshow - i might ask for your assistance on implementing that.

          still trying to get it right for java. that blocking call in `RemoteStep` stinks. having a bit of a hard time trying to get the future delegated all the way to the driver without tearing stuff up.

          Show
          githubbot ASF GitHub Bot added a comment - Github user spmallette commented on the issue: https://github.com/apache/tinkerpop/pull/478 thanks @davebshow - i might ask for your assistance on implementing that. still trying to get it right for java. that blocking call in `RemoteStep` stinks. having a bit of a hard time trying to get the future delegated all the way to the driver without tearing stuff up.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user davebshow commented on the issue:

          https://github.com/apache/tinkerpop/pull/478

          @spmallette you can easily implement something similar in Python using Futures. Using `tornado.concurrent.Future` would probably make the most sense, as is is already 2/3 compatible and should work with any Python async syntax (`yield`/`yield from`/`await`). The only problem with is that this ties the GLV code more tightly to Tornado, which was previously only used for IO; however, a similar implementation of Future isn't included with the Python 2 standard library, so Tornado is probably the best option. I actually did something pretty similar to this in the `gremlinclient` library.

          Show
          githubbot ASF GitHub Bot added a comment - Github user davebshow commented on the issue: https://github.com/apache/tinkerpop/pull/478 @spmallette you can easily implement something similar in Python using Futures. Using `tornado.concurrent.Future` would probably make the most sense, as is is already 2/3 compatible and should work with any Python async syntax (`yield`/`yield from`/`await`). The only problem with is that this ties the GLV code more tightly to Tornado, which was previously only used for IO; however, a similar implementation of Future isn't included with the Python 2 standard library, so Tornado is probably the best option. I actually did something pretty similar to this in the `gremlinclient` library.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user spmallette commented on the issue:

          https://github.com/apache/tinkerpop/pull/478

          > The thread pool is still needed at some point though, since executing against a local TinkerGraph is currently done in the same thread

          agreed - and that's where i was getting confused. I didn't have a quick IM chat with @jorgebay so i think we're on the same page now.

          > just thinking out loud and I probably miss a lot of details There may be a way to have a method submitAsync() on the RemoteConnection

          yeah - i was thinking something along the lines of what you suggested there. i'm stuck in a different problem with the general build of TinkerPop right now, but as soon as i fix that, i can return my attention to this. thanks for the input @newkek

          Show
          githubbot ASF GitHub Bot added a comment - Github user spmallette commented on the issue: https://github.com/apache/tinkerpop/pull/478 > The thread pool is still needed at some point though, since executing against a local TinkerGraph is currently done in the same thread agreed - and that's where i was getting confused. I didn't have a quick IM chat with @jorgebay so i think we're on the same page now. > just thinking out loud and I probably miss a lot of details There may be a way to have a method submitAsync() on the RemoteConnection yeah - i was thinking something along the lines of what you suggested there. i'm stuck in a different problem with the general build of TinkerPop right now, but as soon as i fix that, i can return my attention to this. thanks for the input @newkek
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user newkek commented on the issue:

          https://github.com/apache/tinkerpop/pull/478

          The issue with using a secondary thread pool that will start a new thread for each Traversal execution would be more important when the Traversal is backed by a RemoteConnection. Where each Traversal represents a "query" that is sent to a server, if the driver behind the RemoteConnection is able to handle tens of thousands of requests, the performance will still be limited to the number of threads the async thread pool can handle simultaneously, and there would be a big waste of CPU/Threads.

          Ideally as @jorgebay suggests with the strategies more of the TinkerPop lib would be changed to become fully async, where maybe synchronous methods would only be blocking calls to the async execution method(s) (`next()` calls `promise().get()`).
          The thread pool is still needed at some point though, since executing against a local TinkerGraph is currently done in the same thread, as a sequential operation so Returning with a Future, and Continuing to process the operation in the background has to be done in 2 separate threads simultaneously.

          just thinking out loud and I probably miss a lot of details There may be a way to have a method `submitAsync()` on the `RemoteConnection` that returns a Future of Traverser (instead of Traverser) that, associated with what @jorgebay suggests for the Strategies and some other changes would allow the `promise()` call, when associated with a `RemoteConnection` not to create a new thread each time by default and simply return the Future returned with `RemoteConnection#submitAsync()`, and thus delegate the asynchronous execution to the RemoteConnection

          Show
          githubbot ASF GitHub Bot added a comment - Github user newkek commented on the issue: https://github.com/apache/tinkerpop/pull/478 The issue with using a secondary thread pool that will start a new thread for each Traversal execution would be more important when the Traversal is backed by a RemoteConnection. Where each Traversal represents a "query" that is sent to a server, if the driver behind the RemoteConnection is able to handle tens of thousands of requests, the performance will still be limited to the number of threads the async thread pool can handle simultaneously, and there would be a big waste of CPU/Threads. Ideally as @jorgebay suggests with the strategies more of the TinkerPop lib would be changed to become fully async, where maybe synchronous methods would only be blocking calls to the async execution method(s) (`next()` calls `promise().get()`). The thread pool is still needed at some point though, since executing against a local TinkerGraph is currently done in the same thread, as a sequential operation so Returning with a Future, and Continuing to process the operation in the background has to be done in 2 separate threads simultaneously. just thinking out loud and I probably miss a lot of details There may be a way to have a method `submitAsync()` on the `RemoteConnection` that returns a Future of Traverser (instead of Traverser) that, associated with what @jorgebay suggests for the Strategies and some other changes would allow the `promise()` call, when associated with a `RemoteConnection` not to create a new thread each time by default and simply return the Future returned with `RemoteConnection#submitAsync()`, and thus delegate the asynchronous execution to the RemoteConnection
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user jorgebay commented on the issue:

          https://github.com/apache/tinkerpop/pull/478

          > What makes `toListAsync()` more "fully async" compared to `promise(traversal::toList)`? Internally, from a Java perspective anyway, `toListAsync()` does the same thing, doesn't it?

          By fully async I meant asynchronous execution that doesn't block in any call and doesn't require a threadpool. All the way down, it would be based on futures / async operations. In java, it would mean no `CompletableFuture::get()` calls.
          That translates into supporting higher levels of concurrency as there is no threadpool limiting the amount of calls in parallel.

          For TinkerPop, it would require asynchronous strategies as currently the blocking calls are made in the `apply()` method. I've suggested an `applyAsync()` method for an async strategy interface that returns a `CompletableFuture`.

          For other technologies, the same logic applies as we shouldn't care about the underlying framework / network library (Python tornado / libuv / ...) or thread pools.
          For C#, it would be `Task` all the way down; for Python it could be async generators or futures all the way down; for Javascript, Promise or async callbacks; ...

          Show
          githubbot ASF GitHub Bot added a comment - Github user jorgebay commented on the issue: https://github.com/apache/tinkerpop/pull/478 > What makes `toListAsync()` more "fully async" compared to `promise(traversal::toList)`? Internally, from a Java perspective anyway, `toListAsync()` does the same thing, doesn't it? By fully async I meant asynchronous execution that doesn't block in any call and doesn't require a threadpool. All the way down, it would be based on futures / async operations. In java, it would mean no `CompletableFuture::get()` calls. That translates into supporting higher levels of concurrency as there is no threadpool limiting the amount of calls in parallel. For TinkerPop, it would require asynchronous strategies as currently the blocking calls are made in the `apply()` method. I've suggested an `applyAsync()` method for an async strategy interface that returns a `CompletableFuture`. For other technologies, the same logic applies as we shouldn't care about the underlying framework / network library (Python tornado / libuv / ...) or thread pools. For C#, it would be `Task` all the way down; for Python it could be async generators or futures all the way down; for Javascript, Promise or async callbacks; ...
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user spmallette commented on the issue:

          https://github.com/apache/tinkerpop/pull/478

          > CoreTraversalTest is that it uses mutating steps and thus, not subject to OLAP testing

          Ah - didn't think to check if that was fully ignored. I'll fix that up.

          Show
          githubbot ASF GitHub Bot added a comment - Github user spmallette commented on the issue: https://github.com/apache/tinkerpop/pull/478 > CoreTraversalTest is that it uses mutating steps and thus, not subject to OLAP testing Ah - didn't think to check if that was fully ignored. I'll fix that up.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user okram commented on the issue:

          https://github.com/apache/tinkerpop/pull/478

          The problem with the `CoreTraversalTest` is that it uses mutating steps and thus, not subject to OLAP testing. I would simply create a `PromiseTest` just like `ExplainTest` and have a few traversals in there so we know both OLTP and OLAP return correctly.

          Show
          githubbot ASF GitHub Bot added a comment - Github user okram commented on the issue: https://github.com/apache/tinkerpop/pull/478 The problem with the `CoreTraversalTest` is that it uses mutating steps and thus, not subject to OLAP testing. I would simply create a `PromiseTest` just like `ExplainTest` and have a few traversals in there so we know both OLTP and OLAP return correctly.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user spmallette commented on the issue:

          https://github.com/apache/tinkerpop/pull/478

          > I think that, instead of doing blocking calls like fn.apply() on a thread pool, we could expose a fully async API. Something like toListAsync() that returns a CompletableFuture<List<T>>.

          Could you please clarify a bit - What makes `toListAsync()` more "fully async" compared to `promise(traversal::toList)`? Internally, from a Java perspective anyway, `toListAsync()` does the same thing, doesn't it?

          Show
          githubbot ASF GitHub Bot added a comment - Github user spmallette commented on the issue: https://github.com/apache/tinkerpop/pull/478 > I think that, instead of doing blocking calls like fn.apply() on a thread pool, we could expose a fully async API. Something like toListAsync() that returns a CompletableFuture<List<T>>. Could you please clarify a bit - What makes `toListAsync()` more "fully async" compared to `promise(traversal::toList)`? Internally, from a Java perspective anyway, `toListAsync()` does the same thing, doesn't it?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user spmallette commented on the issue:

          https://github.com/apache/tinkerpop/pull/478

          > Why do you ignore promise tests for GroovyTranslator and JavaTranslator. Seems we would want that to work as well.

          I thought I was getting errors because of the lambda i use to simulate a slow script. Should that work?

          > Why not have a PromiseTest in the process computer and process standard suites? It would be good to know that computer tests (which wrap a CompletableFuture too!) work as expected.

          There is a test in the process standard suite:

          https://github.com/apache/tinkerpop/pull/478/files#diff-37e31635f13d54b33745544db66cc590R282

          `CoreTraversalTest` executes as part of the `ProcessStandardSuite`. Is that sufficient? Should I add something extra for computer tests too?

          > We should probably get Gremlin-Python set up on this branch before merging into tp32/. I can write the code, I just don't know much about promises nor how to do them in Python.

          I wouldn't mind writing the code. As I mentioned in the comment, I just knew there would be discussion required to get it done. My limited reading on the subject made it seem like there are difference between how it would be implemented in python 3.x and 2.x as well. note sure if @davebshow has a minute to shed any light on this.

          Show
          githubbot ASF GitHub Bot added a comment - Github user spmallette commented on the issue: https://github.com/apache/tinkerpop/pull/478 > Why do you ignore promise tests for GroovyTranslator and JavaTranslator. Seems we would want that to work as well. I thought I was getting errors because of the lambda i use to simulate a slow script. Should that work? > Why not have a PromiseTest in the process computer and process standard suites? It would be good to know that computer tests (which wrap a CompletableFuture too!) work as expected. There is a test in the process standard suite: https://github.com/apache/tinkerpop/pull/478/files#diff-37e31635f13d54b33745544db66cc590R282 `CoreTraversalTest` executes as part of the `ProcessStandardSuite`. Is that sufficient? Should I add something extra for computer tests too? > We should probably get Gremlin-Python set up on this branch before merging into tp32/. I can write the code, I just don't know much about promises nor how to do them in Python. I wouldn't mind writing the code. As I mentioned in the comment, I just knew there would be discussion required to get it done. My limited reading on the subject made it seem like there are difference between how it would be implemented in python 3.x and 2.x as well. note sure if @davebshow has a minute to shed any light on this.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user jorgebay commented on the issue:

          https://github.com/apache/tinkerpop/pull/478

          I think that, instead of doing blocking calls like `fn.apply()` on a thread pool, we could expose a fully async API. Something like `toListAsync()` that returns a `CompletableFuture<List<T>>`.

          For that, we should expose an async `TraversalStrategy` interface containing a `applyAsync()` method that returns a completable future.

          A fully async API could be translate to any language, even for the ones that don't feature thread pools (like JavaScript).

          Show
          githubbot ASF GitHub Bot added a comment - Github user jorgebay commented on the issue: https://github.com/apache/tinkerpop/pull/478 I think that, instead of doing blocking calls like `fn.apply()` on a thread pool, we could expose a fully async API. Something like `toListAsync()` that returns a `CompletableFuture<List<T>>`. For that, we should expose an async `TraversalStrategy` interface containing a `applyAsync()` method that returns a completable future. A fully async API could be translate to any language, even for the ones that don't feature thread pools (like JavaScript).
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user okram commented on the issue:

          https://github.com/apache/tinkerpop/pull/478

          Wow. This is cool. Questions/comments:

          1. Why do you ignore promise tests for `GroovyTranslator` and `JavaTranslator`. Seems we would want that to work as well.
          2. Why not have a `PromiseTest` in the process computer and process standard suites? It would be good to know that computer tests (which wrap a `CompletableFuture` too!) work as expected.
          3. We should probably get Gremlin-Python set up on this branch before merging into `tp32/`. I can write the code, I just don't know much about promises nor how to do them in Python.
          4. We should get @jorgebay to review to make sure Gremlin-JavaScript will be able to use this model/naming convention so we don't diverge Gremlin-JS too much from Gremlin-Java.

          Show
          githubbot ASF GitHub Bot added a comment - Github user okram commented on the issue: https://github.com/apache/tinkerpop/pull/478 Wow. This is cool. Questions/comments: 1. Why do you ignore promise tests for `GroovyTranslator` and `JavaTranslator`. Seems we would want that to work as well. 2. Why not have a `PromiseTest` in the process computer and process standard suites? It would be good to know that computer tests (which wrap a `CompletableFuture` too!) work as expected. 3. We should probably get Gremlin-Python set up on this branch before merging into `tp32/`. I can write the code, I just don't know much about promises nor how to do them in Python. 4. We should get @jorgebay to review to make sure Gremlin-JavaScript will be able to use this model/naming convention so we don't diverge Gremlin-JS too much from Gremlin-Java.
          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user spmallette opened a pull request:

          https://github.com/apache/tinkerpop/pull/478

          TINKERPOP-1490 Implemented promise API for Traversal

          https://issues.apache.org/jira/browse/TINKERPOP-1490

          Added two promise() methods that return `CompletableFuture` on `Traversal`. Provided an override on `DefaultTraversal` for those methods because the function that transforms the Traversal is executed in a different thread and therefore requires Graph transaction management (or else we would orphan transactions). Did not update gremlin-python with the promise API because it seemed to beg discussion on the "right" way to do that (i.e. what library to use to support promises?, just use futures from the core lib?, etc).

          Had to move `commons-lang` to `gremlin-core` as a dependency (it was in `gremlin-groovy`) so I that I could make use of the thread naming factory. Doesn't really change anything as `gremlin-groovy` packaged with the console and server anyway. LICENSE/NOTICE can remain unchanged as a result.

          Works with `mvn clean install -DincludeNeo4j`.

          VOTE +1

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/apache/tinkerpop TINKERPOP-1490

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/tinkerpop/pull/478.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #478


          commit 9ad3042ff876a88af719a95b8e144f56b10c81c4
          Author: Stephen Mallette <spmva@genoprime.com>
          Date: 2016-11-01T13:30:28Z

          TINKERPOP-1490 Implemented promise API for Traversal

          Added two promise() methods that return CompletableFuture on Traversal. Provided an override on DefaultTraversal for those methods because the function that transforms the Traversal is executed in a different thread and therefore requires Graph transaction management (or else we would orphan transactions). Did not update gremlin-python with the promise API because it seemed to beg discussion on the "right" way to do that (i.e. what library to use to support promises?, just use futures from the core lib?, etc).


          Show
          githubbot ASF GitHub Bot added a comment - GitHub user spmallette opened a pull request: https://github.com/apache/tinkerpop/pull/478 TINKERPOP-1490 Implemented promise API for Traversal https://issues.apache.org/jira/browse/TINKERPOP-1490 Added two promise() methods that return `CompletableFuture` on `Traversal`. Provided an override on `DefaultTraversal` for those methods because the function that transforms the Traversal is executed in a different thread and therefore requires Graph transaction management (or else we would orphan transactions). Did not update gremlin-python with the promise API because it seemed to beg discussion on the "right" way to do that (i.e. what library to use to support promises?, just use futures from the core lib?, etc). Had to move `commons-lang` to `gremlin-core` as a dependency (it was in `gremlin-groovy`) so I that I could make use of the thread naming factory. Doesn't really change anything as `gremlin-groovy` packaged with the console and server anyway. LICENSE/NOTICE can remain unchanged as a result. Works with `mvn clean install -DincludeNeo4j`. VOTE +1 You can merge this pull request into a Git repository by running: $ git pull https://github.com/apache/tinkerpop TINKERPOP-1490 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/tinkerpop/pull/478.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #478 commit 9ad3042ff876a88af719a95b8e144f56b10c81c4 Author: Stephen Mallette <spmva@genoprime.com> Date: 2016-11-01T13:30:28Z TINKERPOP-1490 Implemented promise API for Traversal Added two promise() methods that return CompletableFuture on Traversal. Provided an override on DefaultTraversal for those methods because the function that transforms the Traversal is executed in a different thread and therefore requires Graph transaction management (or else we would orphan transactions). Did not update gremlin-python with the promise API because it seemed to beg discussion on the "right" way to do that (i.e. what library to use to support promises?, just use futures from the core lib?, etc).
          Hide
          spmallette stephen mallette added a comment - - edited

          future with an extension of FutureTask seems good to me, though i don't think we should ignore CompletableFuture. Traversal should have:

          void promise(CompletableFuture future)  // perhaps we just start a thread to iterate the traversal (i.e no thread pool)
          void promise(CompletableFuture future, ExecutorService executor)  // in this case we provide a thread pool to execute in
          

          CompletableFuture provides a lot more flexibility for handling exceptions, starting async tasks in different executors, and clarifies situations with lots of callback chaining. It makes it trivial to do stuff like:

          promiseOutNames = new CompletableFuture()
          promiseInNames = new CompletableFuture().thenAccept(f -> transformResult(f))  
          promiseBothNames = promiseInNames.thenCombine(promiseOutNames, (in,out) -> combineInOut(in, out)).
                                                                         thenRunAsync(f -> notifySomethingOfBoth(f), gremlinServerExecutorService)
          g.V(1).out().values("name").promise(promise1)
          g.V(1).in().values("name").promise(promise2)
          

          So the above would traverse in/out vertices in parallel across two separate threads, combine the results and then make an async call to notify some service about the combined results.

          Show
          spmallette stephen mallette added a comment - - edited future with an extension of FutureTask seems good to me, though i don't think we should ignore CompletableFuture . Traversal should have: void promise(CompletableFuture future ) // perhaps we just start a thread to iterate the traversal (i.e no thread pool) void promise(CompletableFuture future , ExecutorService executor) // in this case we provide a thread pool to execute in CompletableFuture provides a lot more flexibility for handling exceptions, starting async tasks in different executors, and clarifies situations with lots of callback chaining. It makes it trivial to do stuff like: promiseOutNames = new CompletableFuture() promiseInNames = new CompletableFuture().thenAccept(f -> transformResult(f)) promiseBothNames = promiseInNames.thenCombine(promiseOutNames, (in,out) -> combineInOut(in, out)). thenRunAsync(f -> notifySomethingOfBoth(f), gremlinServerExecutorService) g.V(1).out().values( "name" ).promise(promise1) g.V(1).in().values( "name" ).promise(promise2) So the above would traverse in/out vertices in parallel across two separate threads, combine the results and then make an async call to notify some service about the combined results.
          Hide
          okram Marko A. Rodriguez added a comment -

          There are numerous "action/terminal"-methods off of Traversal.

          • next()
          • hasNext()
          • nextTraverser()
          • toList()
          • toSet()
          • iterate()
          • explain()
          • ...

          If we want asyncrhonous support for all terminal methods, then I think we shouldn't go down the route of nextAsync(), toListAsync(), etc. That is a bit dirty. Instead, I think we should have one method — call it what we want — that is basically a "future" that takes a Function<Traversal,V>. Thus, we can do:

          g.V().out().future(Traversal::toList)
          g.V().out().future(Traversal::next)
          ...
          

          In Java, its called a Future. Moreover, you can do CompletableFutures (https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/CompletableFuture.html). Thus, I think we kill many birds with one stone here. We get "async" and "callback" support using one (and only one) method. Lets call it Traversal.future() given the status of the term "async." Thus, I propose that we add:

          Future<T> Traversal.future(Function<Traversal,T>)
          

          Lets get more specific and allow both "async" and "callback" via https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/FutureTask.html.

          OurFuture<T> Traversal.future(Function<Traversal,T>)
          void OurFuture.then(Consume<T>)
          

          ...where OurFuture will be a custom class based off of FutureTask. In sum total, you would then be able to do two things:

          OurFuture<List<String>> result = g.V().out().values("name").future(Traversal::List)
          // some other stuff in parallel
          List<String> list = result.get()
          

          ...and

          g.V().out().values("name").future(Traversal::List).then(names -> System.out.println("Here are the names of everyone: " + names))
          // some other stuff in parallel
          

          Both async and callback support across all terminal Traversal methods.

              • NOTE: I'm not married to future and then method names...so whateves.
          Show
          okram Marko A. Rodriguez added a comment - There are numerous "action/terminal"-methods off of Traversal . next() hasNext() nextTraverser() toList() toSet() iterate() explain() ... If we want asyncrhonous support for all terminal methods, then I think we shouldn't go down the route of nextAsync() , toListAsync() , etc. That is a bit dirty. Instead, I think we should have one method — call it what we want — that is basically a "future" that takes a Function<Traversal,V> . Thus, we can do: g.V().out(). future (Traversal::toList) g.V().out(). future (Traversal::next) ... In Java, its called a Future . Moreover, you can do CompletableFutures ( https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/CompletableFuture.html ). Thus, I think we kill many birds with one stone here. We get "async" and "callback" support using one (and only one) method. Lets call it Traversal.future() given the status of the term "async." Thus, I propose that we add: Future<T> Traversal. future (Function<Traversal,T>) Lets get more specific and allow both "async" and "callback" via https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/FutureTask.html . OurFuture<T> Traversal. future (Function<Traversal,T>) void OurFuture.then(Consume<T>) ...where OurFuture will be a custom class based off of FutureTask . In sum total, you would then be able to do two things: OurFuture<List< String >> result = g.V().out().values( "name" ). future (Traversal::List) // some other stuff in parallel List< String > list = result.get() ...and g.V().out().values( "name" ). future (Traversal::List).then(names -> System .out.println( "Here are the names of everyone: " + names)) // some other stuff in parallel Both async and callback support across all terminal Traversal methods. NOTE: I'm not married to future and then method names...so whateves.
          Hide
          jorgebg Jorge Bay added a comment -

          async is a reserved keyword in many languages:

          Maybe something like toListAsync(), as async counterpart of toList(), and getAsync(), as a counterpart of next() (instead of nextAsync() to avoid to give the impression that is an "async iterator") ?

          Show
          jorgebg Jorge Bay added a comment - async is a reserved keyword in many languages: In C# 5+ and above. In Ecmascript , already implemented in some of the Javascript runtimes like Microsoft Chakra . It looks like it's also going to be a reserved keyword in Python 3.7 . Maybe something like toListAsync() , as async counterpart of toList() , and getAsync() , as a counterpart of next() (instead of nextAsync() to avoid to give the impression that is an "async iterator") ?

            People

            • Assignee:
              spmallette stephen mallette
              Reporter:
              okram Marko A. Rodriguez
            • Votes:
              1 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development