Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 3.4.6
Fix Version/s: 3.5.0, 3.4.9
Component/s: python
Labels: None
Environment: Ubuntu 18.04, Flask 1.1.1, Python 3.8.1, Amazon Neptune, Gremlin Server

Description

On an HTTP server that connects to Amazon Neptune, I've seen situations where a request hangs and never returns a response. While investigating, I found that it hangs at the point where it queries Neptune.

The problem is that if the connection to Gremlin/Neptune is established and the server then stops responding, the Gremlin connection never times out, so the process/thread waits forever for a response that will never come.
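
The failure mode does not depend on Gremlin. As a minimal sketch using only the Python standard library (the helper names `start_silent_server` and `read_reply` are made up for this illustration), a peer that accepts the connection and then stays silent blocks a read indefinitely unless the client sets a timeout:

```python
import socket
import threading


def start_silent_server():
    """Listen on an ephemeral local port, accept connections, never reply."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    accepted = []  # hold references so accepted sockets stay open

    def accept_loop():
        try:
            while True:
                conn, _ = server.accept()
                accepted.append(conn)
        except OSError:
            pass  # listening socket was closed

    threading.Thread(target=accept_loop, daemon=True).start()
    return server, server.getsockname()[1]


def read_reply(port, timeout):
    """Send a request and wait for a reply, giving up after `timeout` seconds."""
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.settimeout(timeout)  # with None, recv() would block forever
        client.sendall(b"request")
        try:
            return client.recv(1024)
        except socket.timeout:
            return None  # the server never answered


server, port = start_silent_server()
result = read_reply(port, timeout=0.5)
```

Here `result` is `None` because the read gives up after half a second; without the `settimeout` call the same `recv` would never return.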

How to reproduce

Start a local Gremlin Server on the default port 8182
In a terminal, run nc to listen on port 8183 with nc -lk 8183

Run the following Python code to connect to port 8183:

from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal

remote_connection = DriverRemoteConnection("ws://127.0.0.1:8183/gremlin", "g")
g = traversal().withRemote(remote_connection)
g.V().limit(1).toList()

The connection request appears in the nc output. The first time, do nothing: the client times out with an error saying the connection could not be established.
Now repeat the steps, but make nc respond so that the connection is established. The quickest way I found is to manually relay the request to the real Gremlin Server:
1. Copy the whole request from the nc -lk 8183 output
2. In another terminal, open a connection to the Gremlin Server with nc 127.0.0.1 8182
3. Paste the copied request into the nc 127.0.0.1 8182 terminal
4. Copy the Gremlin Server response and paste it into the nc -lk 8183 terminal
5. The connection is established, and nc -lk 8183 receives some unprintable characters corresponding to g.V().limit(1).toList()
6. Now, as long as the nc -lk 8183 process sends no response, the Python code hangs forever.
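
The manual relay in steps 1 to 4 could probably be replaced by a small stand-in server. The sketch below is an illustration under assumptions, not part of the original report: it uses only the Python standard library, the function names are hypothetical, and it implements just the opening handshake from RFC 6455 before going silent.

```python
import base64
import hashlib
import socket

# Fixed GUID that RFC 6455 defines for computing Sec-WebSocket-Accept.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept_key(client_key):
    """Derive the Sec-WebSocket-Accept value from the client's key."""
    digest = hashlib.sha1((client_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def build_handshake_response(request_bytes):
    """Build the 101 response for a raw HTTP upgrade request."""
    headers = {}
    # Skip the request line, then collect "Name: value" header pairs.
    for line in request_bytes.decode("latin-1").split("\r\n")[1:]:
        if ":" in line:
            name, value = line.split(":", 1)
            headers[name.strip().lower()] = value.strip()
    accept = websocket_accept_key(headers["sec-websocket-key"])
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {accept}\r\n"
        "\r\n"
    ).encode("ascii")


def serve_silently(port):
    """Accept one client, complete the handshake, then never answer."""
    with socket.create_server(("127.0.0.1", port)) as server:
        conn, _ = server.accept()
        with conn:
            request = b""
            while b"\r\n\r\n" not in request:
                chunk = conn.recv(4096)
                if not chunk:
                    return  # client went away before finishing the request
                request += chunk
            conn.sendall(build_handshake_response(request))
            # Read and discard frames without replying until the client leaves.
            while conn.recv(4096):
                pass
```

Calling `serve_silently(8183)` in place of `nc -lk 8183` blocks until one client connects. The client's handshake should then succeed and its first read should hang, as in step 6.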

Possible solution

As far as I can tell, the problem is that the TornadoTransport implementation does not pass any timeout when reading (or writing) messages. Passing a timeout to self._loop.run_sync would address the issue, at least by raising an exception when the server does not respond.

If I change the Python example from the reproduction steps as follows:

from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.driver.tornado.transport import TornadoTransport
from gremlin_python.process.anonymous_traversal import traversal

class CustomTornadoTransport(TornadoTransport):
    def read(self):
        # Give up after 5 seconds instead of waiting forever for a message
        return self._loop.run_sync(lambda: self._ws.read_message(), timeout=5)

remote_connection = DriverRemoteConnection("ws://127.0.0.1:8183/gremlin", "g", transport_factory=CustomTornadoTransport)
g = traversal().withRemote(remote_connection)
g.V().limit(1).toList()

and repeat the same steps, g.V().limit(1).toList() times out after not getting any response from the server for 5 seconds.

I'm not sure if there should be any timeout for writing, but it seems it should definitely be set for read operations.
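
The effect of bounding the read can be seen in isolation. The following sketch uses asyncio from the standard library as a stand-in for Tornado's loop. Note that `run_sync` here is a hypothetical helper written for this example, not Tornado's `IOLoop.run_sync`, and the never-completing coroutine plays the role of `read_message()` on a silent connection.

```python
import asyncio


async def read_message_never_arrives():
    """Stand-in for a read on a connection whose peer has gone silent."""
    await asyncio.Event().wait()  # the event is never set, so this never completes


def run_sync(coro_factory, timeout=None):
    """Run a coroutine to completion on a fresh loop, optionally bounded.

    Hypothetical helper for illustration; with timeout=None it waits forever.
    """
    async def bounded():
        return await asyncio.wait_for(coro_factory(), timeout)

    return asyncio.run(bounded())


try:
    run_sync(read_message_never_arrives, timeout=0.2)
    outcome = "returned"
except asyncio.TimeoutError:
    outcome = "timed out"
```

With a timeout the caller gets an exception it can handle or report; with `timeout=None` the same call would block indefinitely, which matches the hang described in this issue.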

People

Assignee: Stephen Mallette
Reporter: Guilherme Quentel Melo
Votes: 0
Watchers: 3

Dates

Created: 20/Aug/20 16:05
Updated: 02/Sep/20 15:01
Resolved: 02/Sep/20 15:01