Description
On an HTTP server that connects to Amazon Neptune, I've seen some situations where a request just hangs and never returns a response. While investigating this, I found that it hangs at the point where it queries Neptune.
The problem is that if the connection to Gremlin/Neptune is established and the server then stops responding, the Gremlin connection never times out, so the process/thread waits forever for a response that will never come.
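The same failure mode can be illustrated with nothing but Python's standard library: a TCP connection that the peer accepts but never answers. This is a minimal sketch, not gremlin_python code; `silent_server` and `query_with_timeout` are hypothetical helpers. With `timeout=None` the `recv()` call blocks indefinitely, analogous to the hanging query; with a finite timeout it raises `socket.timeout` instead.

```python
import socket
import threading
from typing import Optional


def silent_server(listener: socket.socket, stop: threading.Event) -> None:
    """Accept one connection, read the request, and never send a reply."""
    conn, _ = listener.accept()
    with conn:
        conn.recv(1024)  # consume the client's request...
        stop.wait()      # ...then stay silent until told to stop


def query_with_timeout(port: int, timeout: Optional[float]) -> bytes:
    """Send a request and wait for the reply.

    timeout=None mirrors the reported behaviour: recv() blocks forever.
    A finite timeout makes recv() raise socket.timeout instead.
    """
    with socket.create_connection(("127.0.0.1", port)) as sock:
        sock.settimeout(timeout)
        sock.sendall(b"query")
        return sock.recv(1024)


if __name__ == "__main__":
    listener = socket.create_server(("127.0.0.1", 0))
    stop = threading.Event()
    threading.Thread(target=silent_server, args=(listener, stop), daemon=True).start()
    try:
        query_with_timeout(listener.getsockname()[1], timeout=0.5)
    except socket.timeout:
        print("no response within 0.5 s: the read timed out instead of hanging")
    finally:
        stop.set()
        listener.close()
```

Note that the connection itself succeeds immediately, so a connect timeout alone cannot catch this case; only a timeout on the read does.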
How to reproduce
- Start a local Gremlin Server on the default port, 8182
- In a terminal, run nc to listen on port 8183 with nc -lk 8183
- Run the following Python code to connect to port 8183:
```python
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal

remote_connection = DriverRemoteConnection("ws://127.0.0.1:8183/gremlin", "g")
g = traversal().withRemote(remote_connection)
g.V().limit(1).toList()
```
- You will see the connection request in the nc output. The first time, don't do anything: the client times out with an error saying the connection couldn't be established.
- Now repeat the steps, but make nc respond so that the connection is established. The quickest way I found is to manually relay the messages to the real Gremlin Server:
  - Copy the whole request from the nc -l output
  - In another terminal, open a connection to the Gremlin Server with nc 127.0.0.1 8182
  - Paste the request you copied into the nc 127.0.0.1 8182 terminal
  - Copy the Gremlin Server's response and paste it into the nc -l terminal
- The connection is established, and nc -l receives some unprintable characters corresponding to the g.V().limit(1).toList() request.
- If the nc -l process never responds, the Python code hangs forever.
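As an alternative to relaying the handshake by hand, a small script could play the part of the unresponsive server: it completes the WebSocket opening handshake from RFC 6455 and then reads and discards everything without replying. This is a hypothetical helper (`serve_silently`, `accept_key`, and `read_http_headers` are illustrative names), assuming the client only checks the standard `Upgrade`, `Connection`, and `Sec-WebSocket-Accept` response headers; it has not been verified against every gremlin_python/Tornado version.

```python
import base64
import hashlib
import socket

# Fixed GUID from RFC 6455, used to derive Sec-WebSocket-Accept.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(client_key: str) -> str:
    """Compute Sec-WebSocket-Accept for a client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((client_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def read_http_headers(conn: socket.socket) -> dict:
    """Read the HTTP upgrade request up to the blank line; return lower-cased headers."""
    data = b""
    while b"\r\n\r\n" not in data:
        chunk = conn.recv(4096)
        if not chunk:
            raise ConnectionError("client closed before finishing the handshake")
        data += chunk
    head = data.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    headers = {}
    for line in head.split("\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def serve_silently(listener: socket.socket) -> None:
    """Complete the WebSocket handshake, then swallow every frame without replying."""
    conn, _ = listener.accept()
    with conn:
        headers = read_http_headers(conn)
        response = (
            "HTTP/1.1 101 Switching Protocols\r\n"
            "Upgrade: websocket\r\n"
            "Connection: Upgrade\r\n"
            f"Sec-WebSocket-Accept: {accept_key(headers['sec-websocket-key'])}\r\n"
            "\r\n"
        )
        conn.sendall(response.encode("ascii"))
        # From here on, act like the unresponsive server: read and discard.
        try:
            while conn.recv(4096):
                pass
        except ConnectionError:
            pass  # the client gave up (e.g. after its own timeout) and reset
```

To use it in place of nc -lk 8183, stop nc and call `serve_silently(socket.create_server(("127.0.0.1", 8183)))`; the `DriverRemoteConnection` from the first example should then connect and hang on `toList()` without a real Gremlin Server being involved.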
Possible solution
Looking into it, the problem seems to be that the TornadoTransport implementation does not pass any timeout when reading (or writing) messages. Passing a timeout to self._loop.run_sync could solve the issue, or at least raise an exception when the server does not respond.
If I change the example above as follows:
```python
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.driver.tornado.transport import TornadoTransport
from gremlin_python.process.anonymous_traversal import traversal


class CustomTornadoTransport(TornadoTransport):
    def read(self):
        # Pass a timeout to run_sync so the read gives up (raising a timeout
        # error) if no message arrives within 5 seconds.
        return self._loop.run_sync(lambda: self._ws.read_message(), timeout=5)


remote_connection = DriverRemoteConnection(
    "ws://127.0.0.1:8183/gremlin", "g", transport_factory=CustomTornadoTransport
)
g = traversal().withRemote(remote_connection)
g.V().limit(1).toList()
```
and repeat the same steps, g.V().limit(1).toList() times out after receiving no response from the server for 5 seconds.
I'm not sure whether there should also be a timeout for writing, but one should definitely be set for read operations.
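To illustrate separate read and write timeouts, here is a generic sketch in plain asyncio, not gremlin_python or Tornado code. `TimeoutTransport` is a hypothetical class that, like TornadoTransport, owns a private event loop and exposes blocking `read()`/`write()` calls; `asyncio.wait_for` is swapped in for Tornado's `run_sync(..., timeout=...)`. A timeout of `None` keeps the current wait-forever behaviour.

```python
import asyncio
from typing import Optional


class TimeoutTransport:
    """Hypothetical blocking transport over an asyncio stream with optional timeouts."""

    def __init__(self, read_timeout: Optional[float] = None,
                 write_timeout: Optional[float] = None) -> None:
        self._loop = asyncio.new_event_loop()
        self._read_timeout = read_timeout
        self._write_timeout = write_timeout
        self._reader: Optional[asyncio.StreamReader] = None
        self._writer: Optional[asyncio.StreamWriter] = None

    def _run(self, coro, timeout: Optional[float]):
        # Analogue of run_sync(func, timeout=...): block until the coroutine
        # finishes, or cancel it and raise asyncio.TimeoutError when time runs out.
        return self._loop.run_until_complete(asyncio.wait_for(coro, timeout))

    def connect(self, host: str, port: int) -> None:
        self._reader, self._writer = self._loop.run_until_complete(
            asyncio.open_connection(host, port))

    def write(self, message: bytes) -> None:
        self._writer.write(message)
        # drain() only waits once the write buffer passes its high-water mark.
        self._run(self._writer.drain(), self._write_timeout)

    def read(self) -> bytes:
        return self._run(self._reader.read(4096), self._read_timeout)

    def close(self) -> None:
        self._writer.close()
        self._loop.run_until_complete(self._writer.wait_closed())
        self._loop.close()
```

In this sketch a small write usually completes at once because the data only has to reach the local send buffer, which suggests the read timeout is the one that matters for an unresponsive server; a write timeout only comes into play once the peer stops consuming data and the buffer fills.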