Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.2.3
    • Fix Version/s: 3.2.5
    • Component/s: python
    • Labels:
      None

      Description

      It is high time that gremlin-python comes packaged with a real driver. After watching the community discussion, it seems that the way to go will be to use the concurrent.futures module with multithreading to provide asynchronous I/O. While the default underlying websocket client library will remain Tornado due to Python 2/3 compatibility issues, this should be decoupled from the rest of the client and easy to replace.

      With this is mind, I created a baseline client implementation with this commit in a topic branch python_driver. Some things to note:

      • All I/O is performed using the concurrent.futures module, which provides a standard 2/3 compatible future interface.
      • The implementation currently does not include the concept of a cluster, instead it assumes a single host.
      • The transport interface makes it easy to plug in client libraries by defining a simple wrapper.
      • Because this is an example, I didn't fix all the tests to work with the new client implementation. Instead I just added a few demo tests. If we decide to move forward with this I will update the original tests.

      The resulting API looks like this for a simple client:

      client = Client('ws://localhost:8182/gremlin', 'g')
      g = Graph().traversal()
      t = g.V()
      future_result_set = client.submitAsync(t.bytecode)
      result_set = future_result_set.result()
      results = result_set.all().result()
      client.close()
      

      Using the DriverRemoteConnection:

      conn = DriverRemoteConnection('ws://localhost:8182/gremlin', 'g')
      g = Graph().traversal().withRemote(conn)
      t = g.V()
      results = t.toList()
      conn.close()
      

      If you have a minute to check out the new driver code that would be great, I welcome feedback and suggestions. If we decide to move forward like this, I will proceed to finish the driver implementation.

        Issue Links

          Activity

          Hide
          davebshow David M. Brown added a comment -

          I've made a few small fixes since I first pushed this code. It would probably be better to see the changes in comparison mode.

          Show
          davebshow David M. Brown added a comment - I've made a few small fixes since I first pushed this code. It would probably be better to see the changes in comparison mode .
          Hide
          spmallette stephen mallette added a comment -

          I gave it a quick look. I see that you moved a couple of public classes around. I assume that is a breaking change. If you feel like some reorganization was needed here, I think you should "deprecate" the old classes and keep them where they are (i'm assuming there is a way to mark code as deprecated in python) then introduce the new ones elsewhere. I think it's good that we have an actual "driver" now for python. It will be good to start to see the python driver begin to get similar features to the java one. Thanks for doing that.

          Show
          spmallette stephen mallette added a comment - I gave it a quick look. I see that you moved a couple of public classes around. I assume that is a breaking change. If you feel like some reorganization was needed here, I think you should "deprecate" the old classes and keep them where they are (i'm assuming there is a way to mark code as deprecated in python) then introduce the new ones elsewhere. I think it's good that we have an actual "driver" now for python. It will be good to start to see the python driver begin to get similar features to the java one. Thanks for doing that.
          Hide
          davebshow David M. Brown added a comment -

          Good point. I moved the DriverRemoteConnection class to a submodule to mimic the file structure of the Java driver, which will break import statements. I could either provide a deprecation warning and leave things as they are, or move the class back to the original spot. Either way would be fine.

          Assuming that there are no objections, I will move the driver code to a TINKERPOP-1599 branch and work on a PR from there.

          Regarding providing "similar" features to the Java driver, what are the priorities here? Things I can think of off the top of my head are:

          • Implementing Cluster and Host to allow for multiple hosts.
          • Allowing configuration from a config file.
          • More advanced pooling/connection logic, like allowing a connection to be borrowed multiple times with a max_in_process setting.
          • Heartbeating.
          Show
          davebshow David M. Brown added a comment - Good point. I moved the DriverRemoteConnection class to a submodule to mimic the file structure of the Java driver, which will break import statements. I could either provide a deprecation warning and leave things as they are, or move the class back to the original spot. Either way would be fine. Assuming that there are no objections, I will move the driver code to a TINKERPOP-1599 branch and work on a PR from there. Regarding providing "similar" features to the Java driver, what are the priorities here? Things I can think of off the top of my head are: Implementing Cluster and Host to allow for multiple hosts. Allowing configuration from a config file. More advanced pooling/connection logic, like allowing a connection to be borrowed multiple times with a max_in_process setting. Heartbeating.
          Hide
          spmallette stephen mallette added a comment -

          All of those are good ideas. From my perspective, I think you generally listed them in order of what I'd like to see first.

          Show
          spmallette stephen mallette added a comment - All of those are good ideas. From my perspective, I think you generally listed them in order of what I'd like to see first.
          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user davebshow opened a pull request:

          https://github.com/apache/tinkerpop/pull/554

          TINKERPOP-1599 implement real gremlin-python driver

          https://issues.apache.org/jira/browse/TINKERPOP-1599

          This PR adds a better driver implementation for gremlin-python:

          • Uses a multi-threaded solution to provide asynchronous I/O
          • Decouples the underlying websocket client implementation from driver code, making it easy to plug in a different client
          • Makes it easy to plug in different protocols, like the Gremlin Server HTTP protocol.
          • Adds simple connection pooling used in concurrent requests, which increases driver performance with slow traversals
          • Improves driver tests by adding `pytest` fixtures and removing unneeded `unittest` code.

          This driver still isn't full featured compared to the Java driver, for example, it doesn't implement `Cluster` to use multiple hosts. But, it is considerably better than the old implementation, and as this was becoming a big PR, I figure we can add more features with subsequent PRs.

          Also note, I know we are in code freeze right now, so I don't expect this to be reviewed/merged for 3.2.4, but I'm busy and I wanted to get this done so I can move on.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/apache/tinkerpop TINKERPOP-1599

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/tinkerpop/pull/554.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #554


          commit b10d6048ced6de08c307b8b9382d67e97e2cb0ef
          Author: davebshow <davebshow@gmail.com>
          Date: 2017-01-30T22:22:36Z

          added code for new driver, updated driver tests


          Show
          githubbot ASF GitHub Bot added a comment - GitHub user davebshow opened a pull request: https://github.com/apache/tinkerpop/pull/554 TINKERPOP-1599 implement real gremlin-python driver https://issues.apache.org/jira/browse/TINKERPOP-1599 This PR adds a better driver implementation for gremlin-python: Uses a multi-threaded solution to provide asynchronous I/O Decouples the underlying websocket client implementation from driver code, making it easy to plug in a different client Makes it easy to plug in different protocols, like the Gremlin Server HTTP protocol. Adds simple connection pooling used in concurrent requests, which increases driver performance with slow traversals Improves driver tests by adding `pytest` fixtures and removing unneeded `unittest` code. This driver still isn't full featured compared to the Java driver, for example, it doesn't implement `Cluster` to use multiple hosts. But, it is considerably better than the old implementation, and as this was becoming a big PR, I figure we can add more features with subsequent PRs. Also note, I know we are in code freeze right now, so I don't expect this to be reviewed/merged for 3.2.4, but I'm busy and I wanted to get this done so I can move on. You can merge this pull request into a Git repository by running: $ git pull https://github.com/apache/tinkerpop TINKERPOP-1599 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/tinkerpop/pull/554.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #554 commit b10d6048ced6de08c307b8b9382d67e97e2cb0ef Author: davebshow <davebshow@gmail.com> Date: 2017-01-30T22:22:36Z added code for new driver, updated driver tests
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user spmallette commented on the issue:

          https://github.com/apache/tinkerpop/pull/554

          @davebshow now that code freeze is lifted i think you should probably rebase this PR. Also, two documentation related items:

          1. This works needs a changelog entry - it was probably better you didn't add that yet since you would have had to move it anyway to 3.2.5.
          2. This change seems sufficiently large and important that it should have some documentation on usage in the reference docs and should likely have an entry in the upgrade docs as well so that users know that there as an actual driver that can be used for python now.

          Show
          githubbot ASF GitHub Bot added a comment - Github user spmallette commented on the issue: https://github.com/apache/tinkerpop/pull/554 @davebshow now that code freeze is lifted i think you should probably rebase this PR. Also, two documentation related items: 1. This works needs a changelog entry - it was probably better you didn't add that yet since you would have had to move it anyway to 3.2.5. 2. This change seems sufficiently large and important that it should have some documentation on usage in the reference docs and should likely have an entry in the upgrade docs as well so that users know that there as an actual driver that can be used for python now.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user davebshow commented on the issue:

          https://github.com/apache/tinkerpop/pull/554

          Yeah I figured we would need to add docs. Where should I add to the reference docs? Maybe in [Gremlin Applications](https://github.com/apache/tinkerpop/blob/master/docs/src/reference/gremlin-applications.asciidoc) after the [Connecting via Java](https://github.com/apache/tinkerpop/blob/master/docs/src/reference/gremlin-applications.asciidoc#connecting-via-java) section?

          Show
          githubbot ASF GitHub Bot added a comment - Github user davebshow commented on the issue: https://github.com/apache/tinkerpop/pull/554 Yeah I figured we would need to add docs. Where should I add to the reference docs? Maybe in [Gremlin Applications] ( https://github.com/apache/tinkerpop/blob/master/docs/src/reference/gremlin-applications.asciidoc ) after the [Connecting via Java] ( https://github.com/apache/tinkerpop/blob/master/docs/src/reference/gremlin-applications.asciidoc#connecting-via-java ) section?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user spmallette commented on the issue:

          https://github.com/apache/tinkerpop/pull/554

          I think that make sense - just add a "Connecting via Python" section perhaps.

          Show
          githubbot ASF GitHub Bot added a comment - Github user spmallette commented on the issue: https://github.com/apache/tinkerpop/pull/554 I think that make sense - just add a "Connecting via Python" section perhaps.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user davebshow commented on the issue:

          https://github.com/apache/tinkerpop/pull/554

          Ok, I have now rebased ad added docs to both reference and upgrading. I also reviewed the old implementation and made a couple small fixes for consistency. I also added standard op processor message serialization. I think this one is getting close to being ready to go.

          Show
          githubbot ASF GitHub Bot added a comment - Github user davebshow commented on the issue: https://github.com/apache/tinkerpop/pull/554 Ok, I have now rebased ad added docs to both reference and upgrading. I also reviewed the old implementation and made a couple small fixes for consistency. I also added standard op processor message serialization. I think this one is getting close to being ready to go.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user spmallette commented on a diff in the pull request:

          https://github.com/apache/tinkerpop/pull/554#discussion_r102272018

          — Diff: CHANGELOG.asciidoc —
          @@ -29,6 +29,11 @@ TinkerPop 3.2.5 (Release Date: NOT OFFICIALLY RELEASED YET)

          • Refactor `SparkContext` handler to support external kill and stop operations.
          • Fixed an optimization bug in `LazyBarrierStrategy` around appending barriers to the end of a `Traversal`.
          • `TraverserIterator` in GremlinServer is smart to try and bulk traversers prior to network I/O.
            +* Improved Gremlin-Python Driver implementation by adding a threaded client with basic connection pooling and support for pluggable websocket clients.
            +
            +Improvements
            +^^^^^^^^^^^^
            +TINKERPOP-1599 implement real gremlin-python driver
              • End diff –

          you don't need to add those manually - they get added on release (we generate a report out of jira for it).

          Show
          githubbot ASF GitHub Bot added a comment - Github user spmallette commented on a diff in the pull request: https://github.com/apache/tinkerpop/pull/554#discussion_r102272018 — Diff: CHANGELOG.asciidoc — @@ -29,6 +29,11 @@ TinkerPop 3.2.5 (Release Date: NOT OFFICIALLY RELEASED YET) Refactor `SparkContext` handler to support external kill and stop operations. Fixed an optimization bug in `LazyBarrierStrategy` around appending barriers to the end of a `Traversal`. `TraverserIterator` in GremlinServer is smart to try and bulk traversers prior to network I/O. +* Improved Gremlin-Python Driver implementation by adding a threaded client with basic connection pooling and support for pluggable websocket clients. + +Improvements +^^^^^^^^^^^^ + TINKERPOP-1599 implement real gremlin-python driver End diff – you don't need to add those manually - they get added on release (we generate a report out of jira for it).
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user davebshow commented on the issue:

          https://github.com/apache/tinkerpop/pull/554

          Well, I've made my last review of this code and pushed a bit of cleanup and a small fix. IMHO, this is ready to be merged. I'll wait to see if there are any more comments or feedback before upvoting.

          Show
          githubbot ASF GitHub Bot added a comment - Github user davebshow commented on the issue: https://github.com/apache/tinkerpop/pull/554 Well, I've made my last review of this code and pushed a bit of cleanup and a small fix. IMHO, this is ready to be merged. I'll wait to see if there are any more comments or feedback before upvoting.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user spmallette commented on the issue:

          https://github.com/apache/tinkerpop/pull/554

          All tests pass with `docker/build.sh -t -n -i`

          VOTE +1

          Show
          githubbot ASF GitHub Bot added a comment - Github user spmallette commented on the issue: https://github.com/apache/tinkerpop/pull/554 All tests pass with `docker/build.sh -t -n -i` VOTE +1
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user davebshow commented on the issue:

          https://github.com/apache/tinkerpop/pull/554

          I think this is ready. Anyone want to be a third reviewer for this?

          VOTE +1

          Show
          githubbot ASF GitHub Bot added a comment - Github user davebshow commented on the issue: https://github.com/apache/tinkerpop/pull/554 I think this is ready. Anyone want to be a third reviewer for this? VOTE +1
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user okram commented on the issue:

          https://github.com/apache/tinkerpop/pull/554

          VOTE +1.

          Show
          githubbot ASF GitHub Bot added a comment - Github user okram commented on the issue: https://github.com/apache/tinkerpop/pull/554 VOTE +1.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user davebshow commented on the issue:

          https://github.com/apache/tinkerpop/pull/554

          Merged.

          Show
          githubbot ASF GitHub Bot added a comment - Github user davebshow commented on the issue: https://github.com/apache/tinkerpop/pull/554 Merged.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user davebshow closed the pull request at:

          https://github.com/apache/tinkerpop/pull/554

          Show
          githubbot ASF GitHub Bot added a comment - Github user davebshow closed the pull request at: https://github.com/apache/tinkerpop/pull/554

            People

            • Assignee:
              davebshow David M. Brown
              Reporter:
              davebshow David M. Brown
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development