TinkerPop / TINKERPOP-2405

gremlinpython: traversal hangs when the connection is established but the server stops responding later



    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.4.6
    • Fix Version/s: 3.5.0, 3.4.9
    • Component/s: python
    • Labels:
    • Environment:
       Ubuntu 18.04, Flask 1.1.1, python 3.8.1, Amazon Neptune, Gremlin Server


      On an HTTP server that connects to Amazon Neptune, I've seen situations where a request just hangs and never returns a response. While investigating this, I found that it hangs right when it is about to query Neptune.

      The problem is that if the connection to Gremlin/Neptune is established and the server then stops responding, the gremlin connection never times out, so the process/thread waits forever for a response that will never come.

      How to reproduce

      1. Start a local gremlin server on the default port 8182
      2. On a terminal, run nc to listen on port 8183 with nc -lk 8183
      3. Run the following python code to connect to the 8183 port:
        from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
        from gremlin_python.process.anonymous_traversal import traversal
        remote_connection = DriverRemoteConnection("ws://", "g")                                               
        g = traversal().withRemote(remote_connection)                                                                                
      4. You will see the connection request in the nc output. The first time, don't do anything: the client will time out, reporting that the connection couldn't be established.
      5. Now repeat the steps, but make nc respond so that the connection is established. The quickest way I found is to manually relay the message to the real gremlin server:
        1. Copy the whole request from nc -l output
        2. On another terminal, open a connection to the gremlin server with nc 8182
        3. Paste the request you copied before to nc 8182 terminal
        4. Copy the gremlin server response and paste into nc -l output
        5. The connection will be established and nc -l will receive some unprintable chars corresponding to g.V().limit(1).toList()
        6. Now, if there is no response from the nc -l process, the python code will hang forever.
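
      The manual relay in step 5 could probably be replaced by a small stand-in server. The following is a minimal sketch, not something from the original report and not tested against gremlinpython: it assumes the client only needs a valid WebSocket opening handshake (RFC 6455) to consider the connection established. The names accept_key, handle and serve are hypothetical, and port 8183 is taken from the steps above.

```python
import base64
import hashlib
import socket
import threading

# Fixed GUID defined by RFC 6455 for computing Sec-WebSocket-Accept.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(client_key):
    """Compute the Sec-WebSocket-Accept value for a Sec-WebSocket-Key."""
    digest = hashlib.sha1((client_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def handle(conn):
    """Complete the WebSocket handshake, then stay silent forever."""
    data = b""
    # Read until the end of the HTTP request headers.
    while b"\r\n\r\n" not in data:
        chunk = conn.recv(4096)
        if not chunk:
            conn.close()
            return
        data += chunk
    headers = {}
    for line in data.decode("latin-1").split("\r\n")[1:]:
        if ":" in line:
            name, value = line.split(":", 1)
            headers[name.strip().lower()] = value.strip()
    key = headers.get("sec-websocket-key")
    if key is None:
        conn.close()
        return
    response = (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        "Sec-WebSocket-Accept: " + accept_key(key) + "\r\n\r\n"
    )
    conn.sendall(response.encode("ascii"))
    # Swallow incoming frames without ever answering, like a stalled server.
    while conn.recv(4096):
        pass
    conn.close()


def serve(host="127.0.0.1", port=8183):
    """Accept connections forever; each one is handshaked and then ignored."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server.bind((host, port))
        server.listen()
        while True:
            conn, _ = server.accept()
            threading.Thread(target=handle, args=(conn,), daemon=True).start()
```

      Calling serve() in a separate terminal and pointing the Python snippet from step 3 at port 8183 should then reach the state described in step 5.6, assuming the client accepts this bare handshake.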

      Possible solution

      As I looked into it, the problem seems to be that the TornadoTransport implementation does not pass any timeout when reading (and writing) messages. So, passing a timeout to self._loop.run_sync can solve the issue, at least by raising an exception when the server does not respond.

      If I change the example above:

      from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
      from gremlin_python.driver.tornado.transport import TornadoTransport                                                         
      from gremlin_python.process.anonymous_traversal import traversal
      class CustomTornadoTransport(TornadoTransport): 
          def read(self): 
              return self._loop.run_sync(lambda: self._ws.read_message(), timeout=5)
      remote_connection = DriverRemoteConnection("ws://", "g", transport_factory=CustomTornadoTransport)
      g = traversal().withRemote(remote_connection)                                                                                

      and repeat the same steps, g.V().limit(1).toList() times out after not getting any response from the server for 5 seconds.

      I'm not sure if there should be any timeout for writing, but it seems it should definitely be set for read operations.
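
      The effect of a read timeout can be illustrated without Tornado or a Gremlin server. The sketch below is only an analogy under stated assumptions: it uses the standard library's asyncio.wait_for in place of Tornado's run_sync(..., timeout=...), and read_message_never_answered is a hypothetical stand-in for a websocket read against a server that has gone silent.

```python
import asyncio


async def read_message_never_answered():
    # Hypothetical stand-in for a websocket read: the event is never set,
    # so this coroutine waits forever, like a read from a silent server.
    await asyncio.Event().wait()


def read_with_timeout(timeout_seconds):
    """Run the read, raising asyncio.TimeoutError if it takes too long."""
    async def bounded():
        return await asyncio.wait_for(
            read_message_never_answered(), timeout=timeout_seconds
        )
    return asyncio.run(bounded())


def try_read(timeout_seconds=5.0):
    """Turn the would-be hang into an error the caller can handle."""
    try:
        read_with_timeout(timeout_seconds)
    except asyncio.TimeoutError:
        return "timed out"
    return "ok"
```

      Without the wait_for bound, read_with_timeout would block indefinitely, which mirrors the hang described above; with it, the caller gets an exception and can close the connection or retry.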




            • Assignee:
              spmallette Stephen Mallette
              gqmelo Guilherme Quentel Melo


              • Created: