Affects Version/s: 3.4.6
Fix Version/s: None
Environment: Ubuntu 18.04, Flask 1.1.1, Python 3.8.1, Amazon Neptune
In the context of a multi-threaded Flask application, it is currently not possible to close a DriverRemoteConnection, due to two issues. Since our Flask application opens a new connection on every request (because we don't want the trouble of reusing connections across threads), the process eventually runs out of file descriptors.
Given a Gremlin Server running on 127.0.0.1:8182, the following script reproduces the first error:
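A minimal sketch of such a script, assuming gremlinpython 3.4.6 and using plain threads to stand in for Flask's per-request threads:

{code:python}
import threading

from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal


def handle_request():
    # Each "request" opens, uses and closes its own connection,
    # just like our Flask application does.
    remote = DriverRemoteConnection('ws://127.0.0.1:8182/gremlin', 'g')
    g = traversal().withRemote(remote)
    g.V().limit(1).toList()
    remote.close()  # the error below is raised here


threads = [threading.Thread(target=handle_request) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
{code}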
When a thread tries to execute remote_connection.close(), the following error happens:
This happens because TornadoTransport.close() does not close the websocket inside the IO loop.
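If I read the 3.4.6 code right, the transport's close is essentially this (paraphrased from gremlin_python/driver/tornado/transport.py; the exact body may differ slightly):

{code:python}
def close(self):
    # Initiates the websocket close outside the IO loop and then
    # closes the loop immediately, so the close handshake never runs.
    self._ws.close()
    self._loop.close()
{code}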
I can work around that by providing my own transport, which closes the websocket with self._loop.run_sync(lambda: self._ws.close()):
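Something along these lines (the class name is just illustrative), plugged in through the transport_factory argument of DriverRemoteConnection:

{code:python}
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.driver.tornado.transport import TornadoTransport


class ClosingTornadoTransport(TornadoTransport):
    def close(self):
        # Close the websocket inside the IO loop instead of outside it.
        self._loop.run_sync(lambda: self._ws.close())
        self._loop.close()


remote = DriverRemoteConnection(
    'ws://127.0.0.1:8182/gremlin', 'g',
    transport_factory=lambda: ClosingTornadoTransport())
{code}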
Now the connection apparently closes successfully, but if we look at the open connections, we find a bunch of TCP connections in the CLOSE_WAIT state.
For example, using netstat -nt4p | grep 8182 on Linux while the script is still running:
Digging into the code, I found that Tornado does not terminate the connection right away. This is what happens when the websocket is closed:
- It sends a close frame to the server
- It schedules a 5-second timer to abort the connection in case the peer does not close it cleanly
- On the next IO loop iteration, it receives the server's reply closing the connection
- As the connection was closed cleanly, it cancels the timeout and tears down the TCP connection
So, for the websocket to close properly, the loop needs to run again, either for Tornado to receive the server's close message or for the timeout to abort the connection. That never happens, though, because TornadoTransport.close also closes the loop, leaking those connections.
I don't know if that is the best solution, but reading a message from the socket after closing it makes Tornado receive the server's close message and terminate the connection:
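In the custom transport above, that amounts to something like this (read_message is Tornado's WebSocketClientConnection.read_message, which resolves with None once the connection is terminated):

{code:python}
class ClosingTornadoTransport(TornadoTransport):
    def close(self):
        self._loop.run_sync(lambda: self._ws.close())
        # Run the loop once more so Tornado receives the server's
        # close frame (or the 5s timeout aborts the connection).
        self._loop.run_sync(lambda: self._ws.read_message())
        self._loop.close()
{code}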
Now, after running the script with the change above, netstat -nt4p | grep 8182 no longer shows any connections.