Description
There is a race condition in gremlin-python when closing session-based connections: the event loop opened for the session's connection is never closed, so its file descriptors leak. After enough transactions, this leads to `OSError: [Errno 24] Too many open files`.
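To show why an unclosed event loop costs file descriptors, here is a minimal sketch that does not use gremlin-python at all. It assumes the connection's loop is a plain `asyncio` event loop and counts descriptors through `/proc/self/fd` (or `/dev/fd`), so it only works on Linux or macOS. The helper `open_fd_count` is a name I made up for this illustration.

```python
import asyncio
import os


def open_fd_count():
    """Count this process's open file descriptors (Linux/macOS only)."""
    for fd_dir in ("/proc/self/fd", "/dev/fd"):
        if os.path.isdir(fd_dir):
            return len(os.listdir(fd_dir))
    raise RuntimeError("no fd directory available on this platform")


baseline = open_fd_count()

# Each loop holds a selector and a self-pipe socket pair until it is closed.
# Keeping references in a list stops the loops from being garbage collected.
loops = [asyncio.new_event_loop() for _ in range(10)]
leaked = open_fd_count() - baseline

# Closing the loops releases their descriptors again.
for loop in loops:
    loop.close()
after_close = open_fd_count() - baseline

print(f"10 unclosed loops hold {leaked} fds; {after_close} remain after close()")
```

With one loop left open per transaction, the process eventually reaches its open-file limit.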
The problem is entirely contained in these two methods from `gremlin_python/driver/client.py`:
```py
def close(self):
    # prevent the Client from being closed more than once. it raises errors if new jobby jobs
    # get submitted to the executor when it is shutdown
    if self._closed:
        return

    if self._session_enabled:
        self._close_session()  # 1. (see below)
    log.info("Closing Client with url '%s'", self._url)
    while not self._pool.empty():  # 3. (see below)
        conn = self._pool.get(True)
        conn.close()
    self._executor.shutdown()
    self._closed = True

def _close_session(self):
    message = request.RequestMessage(
        processor='session', op='close',
        args={'session': str(self._session)})
    conn = self._pool.get(True)
    return conn.write(message).result()  # 2. (see below)
```
1. `close()` calls `_close_session()`.
2. `.result()` waits for the write to finish, but not for the read. `conn` is not put back into `self._pool` until after the read finishes (in `gremlin_python.driver.connection.Connection._receive()`), yet `_close_session()` returns as soon as the write completes and execution continues at 3.
3. The pool is still empty at this point, so the `while` loop is never entered and the connection is never closed. This leaves the conn's event loop running, never to be closed.
I was able to solve this by modifying `_close_session` as follows:
```py
def _close_session(self):
    message = request.RequestMessage(
        processor='session', op='close',
        args={'session': str(self._session)})
    conn = self._pool.get(True)
    try:
        write_result_set = conn.write(message).result()
        return write_result_set.all().result()  # wait for _receive() to finish
    except protocol.GremlinServerError:
        pass
```
I'm not sure if this is the correct solution, but I wanted to point out the bug.
In the meantime, I wrote a context manager to handle this cleanup for me:
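To illustrate the ordering problem from steps 1-3 without a Gremlin server, here is a toy model. It is only a sketch: `FakeConnection` and `run` are made-up names, the `time.sleep` stands in for the server's reply latency, and the write future simply resolves to the read future, which is a simplification of the real driver's result set.

```python
import queue
import time
from concurrent.futures import Future, ThreadPoolExecutor


class FakeConnection:
    """Stand-in for a driver connection; `closed` models its event loop."""

    def __init__(self, pool, executor):
        self._pool = pool
        self._executor = executor
        self.closed = False

    def write(self):
        # The write completes immediately; the read runs in the background.
        read_future = self._executor.submit(self._receive)
        write_future = Future()
        write_future.set_result(read_future)
        return write_future

    def _receive(self):
        time.sleep(0.1)       # simulated wait for the server's reply
        self._pool.put(self)  # connection returns to the pool only now

    def close(self):
        self.closed = True


def run(wait_for_read):
    """Return True if the connection was closed by the pool-draining loop."""
    pool = queue.Queue()
    with ThreadPoolExecutor(max_workers=1) as executor:
        conn = FakeConnection(pool, executor)
        pool.put(conn)

        # Model of _close_session(): take the connection, send the message.
        taken = pool.get(True)
        read_future = taken.write().result()  # step 2: write is done
        if wait_for_read:
            read_future.result()              # also wait for the read

        # Model of the loop in close(): drain whatever is in the pool.
        while not pool.empty():               # step 3
            pool.get(True).close()
    return conn.closed


print("closed without waiting for the read:", run(wait_for_read=False))
print("closed when waiting for the read:   ", run(wait_for_read=True))
```

Without the wait, the pool is empty when the loop runs and the connection is left open. With the wait, the connection is back in the pool in time to be closed.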
```py
import contextlib
import logging

logger = logging.getLogger(__name__)


@contextlib.contextmanager
def transaction():
    # `g` is an existing traversal source bound to a remote connection
    tx = g.tx()
    gtx = tx.begin()
    try:
        yield tx, gtx
        tx.commit()
    except Exception:
        tx.rollback()
        raise  # re-raise so the caller still sees the failure
    finally:
        # close any connections left in the session client's pool
        pool = tx._session_based_connection._client._pool
        while not pool.empty():
            conn = pool.get(True)
            conn.close()
            logger.info("Closed abandoned session connection")


with transaction() as (tx, gtx):
    foo = gtx.some_traversal().to_list()
    # do something with foo
    gtx.some_other_traversal().iterate()
```
Cheers