If an Azure Cosmos DB Gremlin host goes down and later becomes available again, the TinkerPop 3 Java driver cannot reconnect to it.
Sample code is available at https://github.com/cancure/tinkerpopcosmos
Steps to reproduce:
1) Start the application after entering valid connection details in the remote.yaml file.
2) Send a POST request to http://localhost:8080/query with a valid Gremlin query as the HTTP body, for example: g.V(1).id()
3) Disconnect the computer from the internet.
4) Repeat step 2.
5) Reconnect the computer to the internet.
6) Repeat step 2.
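Step 2 (and steps 4 and 6, which repeat it) can be scripted with the java.net.http client available since Java 11. This is a sketch only: the class name ReproQuery is hypothetical, and sending the query as a text/plain body is an assumption about how the sample application reads the request.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ReproQuery {
    // Endpoint exposed by the sample application, taken from step 2.
    static final String ENDPOINT = "http://localhost:8080/query";

    // Builds the POST request used in steps 2, 4 and 6.
    // Assumption: the sample app treats the raw request body as the Gremlin script,
    // so the query is sent as plain text.
    static HttpRequest buildQueryRequest(String gremlin) {
        return HttpRequest.newBuilder(URI.create(ENDPOINT))
                .header("Content-Type", "text/plain")
                .POST(HttpRequest.BodyPublishers.ofString(gremlin))
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // Sends the query and prints the status code and response body,
        // so the failure after step 5 is visible in the output.
        HttpResponse<String> response =
                client.send(buildQueryRequest("g.V(1).id()"), HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

Running main once per step makes it easy to compare the response before disconnecting, while offline, and after reconnecting.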
From this point onward, every query fails with the following error, even though connectivity has been restored: "java.lang.RuntimeException: java.util.concurrent.TimeoutException: Timed out while waiting for an available host - check the client configuration and connectivity to the server if this message persists".
The issue appears to originate in org.apache.tinkerpop.gremlin.driver.ConnectionPool (ConnectionPool.java). At line 403, the Gremlin script sent as the ping message when trying to reconnect is '' (an empty string in single quotes):
final RequestMessage ping = RequestMessage.build(Tokens.OPS_EVAL).add(Tokens.ARGS_GREMLIN, "''").create();
The Cosmos DB server rejects this script with an error stating that the Gremlin query's grammar is incorrect. Because this error is thrown, the remaining lines of the method are never executed, so the reconnect attempt never completes.
The fix would be to use a ping query that is valid for all supported graph databases.
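The failure mode described above can be modeled with a small, self-contained sketch. It is not the driver's code: FakeServer, STRICT_SERVER and tryReconnect are hypothetical stand-ins that mimic a server rejecting '' and a reconnect routine that marks the host available only after the ping succeeds. g.inject() is used purely as an illustrative script that references the traversal source; whether every supported graph database accepts it is an assumption.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class ReconnectSketch {

    // Hypothetical stand-in for a Gremlin server: eval throws if the script is rejected.
    interface FakeServer {
        void eval(String script);
    }

    // Assumption: "must start with g." only approximates the Cosmos DB grammar check.
    // It rejects a bare string literal such as '' and accepts a traversal on g.
    static final FakeServer STRICT_SERVER = script -> {
        if (!script.startsWith("g.")) {
            throw new IllegalArgumentException("Gremlin query syntax error: " + script);
        }
    };

    // Simplified model of the reconnect step: send the ping script and mark the
    // host available only if the ping succeeds. Any exception skips the rest of
    // the method, leaving the host unavailable for all later queries.
    static boolean tryReconnect(FakeServer server, String pingScript, AtomicBoolean hostAvailable) {
        try {
            server.eval(pingScript);
            hostAvailable.set(true); // the part of the method that never runs when the ping fails
            return true;
        } catch (RuntimeException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        AtomicBoolean available = new AtomicBoolean(false);
        // Ping with the empty-string script used by the driver.
        System.out.println(tryReconnect(STRICT_SERVER, "''", available) + " " + available.get());
        // Ping with a script that references the traversal source.
        System.out.println(tryReconnect(STRICT_SERVER, "g.inject()", available) + " " + available.get());
    }
}
```

Running main prints "false false" and then "true true": with '' the host stays unavailable indefinitely, while a ping the server accepts lets the reconnect complete. Later TinkerPop releases appear to make the ping script configurable through Cluster.Builder.validationRequest(String); check the documentation for the driver version in use before relying on it.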