Details
- Bug
- Status: Closed
- Major
- Resolution: Done
- 3.1.0-incubating
- None
Description
If a gremlin-server is restarted, the client will never reconnect to it.
Start server1
Start server2
Start a client such as the following:
GryoMapper kryo = GryoMapper.build().addRegistry(TitanIoRegistry.INSTANCE).create();
MessageSerializer serializer = new GryoMessageSerializerV1d0(kryo);
Cluster titanCluster = Cluster.build()
        .addContactPoints("54.X.X.X,54.Y.Y.Y".split(","))
        .port(8182)
        .minConnectionPoolSize(5)
        .maxConnectionPoolSize(10)
        .reconnectIntialDelay(1000)
        .reconnectInterval(30000)
        .serializer(serializer)
        .create();
Client client = titanCluster.connect();
client.init();
System.out.println("initialized");
for (int i = 0; i < 200; i++) {
    try {
        long id = System.currentTimeMillis();
        ResultSet results = client.submit("graph.addVertex('a','" + id + "')");
        results.one();
        results = client.submit("g.V().has('a','" + id + "')");
        System.out.println(results.one());
    } catch (Exception e) {
        e.printStackTrace();
    }
    try {
        TimeUnit.SECONDS.sleep(3);
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
}
System.out.println("done");
client.close();
System.exit(0);
}
After the client has performed a couple of query cycles:
Restart server1
Wait 60 seconds, which is long enough for a reconnect to occur
Stop server2
Notice that there are no more successful queries; the client has never reconnected to server1
Start server2
Notice that there are still no successful queries
The method ConnectionPool.addConnectionIfUnderMaximum always returns false because opened >= maxPoolSize; in this particular case opened = 10. I believe the open counter is meant to track the size of the List of connections but is getting out of sync with it. The following diff addresses the problem for this particular case:
diff --git a/gremlin-driver/src/main/java/org/apache/tinkerpop/gremlin/driver/ConnectionPool.java b/gremlin-driver/src/main/java/org/apache/tinkerpop/gremlin/driver/ConnectionPool.java
index 96c151c..81ce81d 100644
--- a/gremlin-driver/src/main/java/org/apache/tinkerpop/gremlin/driver/ConnectionPool.java
+++ b/gremlin-driver/src/main/java/org/apache/tinkerpop/gremlin/driver/ConnectionPool.java
@@ -326,6 +326,7 @@ final class ConnectionPool {
     private void definitelyDestroyConnection(final Connection connection) {
         bin.add(connection);
         connections.remove(connection);
+        open.decrementAndGet();
 
         if (connection.borrowed.get() == 0 && bin.remove(connection))
             connection.closeAsync();
@@ -388,6 +389,8 @@ final class ConnectionPool {
 
         // if the host is unavailable then we should release the connections
         connections.forEach(this::definitelyDestroyConnection);
+        // there are no connections open
+        open.set(0);
 
         // let the load-balancer know that the host is acting poorly
         this.cluster.loadBalancingStrategy().onUnavailable(host);
@@ -413,6 +416,7 @@ final class ConnectionPool {
             this.cluster.loadBalancingStrategy().onAvailable(host);
             return true;
         } catch (Exception ex) {
+            logger.debug("Failed reconnect attempt on {}", host);
             if (connection != null) definitelyDestroyConnection(connection);
             return false;
         }
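To illustrate the suspected failure mode in isolation, here is a minimal sketch of a pool that keeps a separate open counter next to its list of connections. It is a toy model, not the real ConnectionPool: the class ToyPool, its decrementOnDestroy flag, and the destroyAll, liveConnections, and openCounter methods are hypothetical names invented for this example, and only the opened >= maxPoolSize check mirrors the behavior described above.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model (hypothetical, not the gremlin-driver class) of a pool with a separate size counter. */
final class ToyPool {
    private final List<Object> connections = new CopyOnWriteArrayList<>();
    private final AtomicInteger open = new AtomicInteger();
    private final int maxPoolSize;
    private final boolean decrementOnDestroy; // true = counter is kept in sync on destroy

    ToyPool(int maxPoolSize, boolean decrementOnDestroy) {
        this.maxPoolSize = maxPoolSize;
        this.decrementOnDestroy = decrementOnDestroy;
    }

    /** Adds a connection only while the counter is below the maximum. */
    boolean addConnectionIfUnderMaximum() {
        while (true) {
            int opened = open.get();
            if (opened >= maxPoolSize) return false; // the check that keeps failing
            if (open.compareAndSet(opened, opened + 1)) break;
        }
        connections.add(new Object());
        return true;
    }

    /** Simulates the host becoming unavailable: every connection is dropped. */
    void destroyAll() {
        for (Object c : connections) { // iterates over a snapshot, so removal is safe
            connections.remove(c);
            if (decrementOnDestroy) open.decrementAndGet();
        }
    }

    int liveConnections() { return connections.size(); }

    int openCounter() { return open.get(); }

    public static void main(String[] args) {
        for (boolean fixed : new boolean[] {false, true}) {
            ToyPool pool = new ToyPool(10, fixed);
            for (int i = 0; i < 10; i++) pool.addConnectionIfUnderMaximum();
            pool.destroyAll(); // server restart
            boolean reconnected = pool.addConnectionIfUnderMaximum();
            System.out.println("fixed=" + fixed
                    + " counter=" + pool.openCounter()
                    + " live=" + pool.liveConnections()
                    + " reconnected=" + reconnected);
        }
    }
}
```

Without the decrement, the counter stays at 10 while the list is empty, so the toy pool can never add a connection again. With the decrement, the counter returns to 0 and the next add succeeds.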
Issue Links
- is duplicated by
  - TINKERPOP-1079 Initialized client cannot reconnect to gremlin server (Closed)
- is related to
  - TINKERPOP-1125 RoundRobin load balancing always uses the second Host when size = 2 (Closed)