TinkerPop / TINKERPOP-2569

Reconnect to server if Java driver fails to initialize

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.4.11
    • Fix Version/s: None
    • Component/s: driver
    • Labels: None

      Description

      As reported here on SO: https://stackoverflow.com/questions/67586427/how-to-recover-with-a-retry-from-gremlin-nohostavailableexception

      If the host is unavailable at Client initialization, the host is not put in a state where reconnect is possible. Essentially, this test for GremlinServerIntegrateTest should pass:

      @Test
      public void shouldFailOnInitiallyDeadHost() throws Exception {

          // start test with no server
          this.stopServer();

          final Cluster cluster = TestClientFactory.build().create();
          final Client client = cluster.connect();

          try {
              // issue a first request while the server is down
              client.submit("g").all().get(3000, TimeUnit.MILLISECONDS);
              fail("Should throw an exception.");
          } catch (RuntimeException re) {
              // Client would have no active connections to the host, hence it would encounter a timeout
              // trying to find an alive connection to the host.
              assertThat(re.getCause(), instanceOf(NoHostAvailableException.class));

              //
              // should recover when the server comes back
              //

              // restart server
              this.startServer();

              // try a bunch of times to reconnect. on slower systems this may simply take longer...looking at you travis
              for (int ix = 1; ix < 11; ix++) {
                  // the retry interval is 1 second, wait a bit longer
                  TimeUnit.SECONDS.sleep(5);

                  try {
                      final List<Result> results = client.submit("1+1").all().get(3000, TimeUnit.MILLISECONDS);
                      assertEquals(1, results.size());
                      assertEquals(2, results.get(0).getInt());
                      // request succeeded, so the client has reconnected: stop retrying
                      break;
                  } catch (Exception ex) {
                      // only fail once the last attempt has also been unsuccessful
                      if (ix == 10)
                          fail("Should have eventually succeeded");
                  }
              }
          } finally {
              cluster.close();
          }
      }
      

      Note that there is a similar test, shouldFailOnDeadHost(), which first connects to a live host, then kills it and restarts it. That test demonstrates that reconnection works in that situation.

      I thought it might be an easy fix to simply call considerHostUnavailable() in the ConnectionPool constructor in the event of a CompletionException, which should kickstart the reconnect process. The reconnects started firing, but they all failed for some reason. I didn't have time to investigate further than that.

      Currently the only workaround is to recreate the `Client` if this sort of situation occurs.
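
A rough sketch of that workaround, for illustration only. `RecreatingClient` and its `Request` interface are hypothetical names and are not part of gremlin-driver. The helper is written generically (a factory, a closer, and a request callback) so that it does not assume anything about the driver beyond what is described above; with the driver, the factory would presumably be something like `cluster::connect` and the closer `Client::close`. Note that this sketch discards the client on any exception, whereas real code would probably only do so when the cause is a `NoHostAvailableException`.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical helper (not part of gremlin-driver): recreates a client after a failed request. */
final class RecreatingClient<C> {

    /** A request that may throw, e.g. c -> c.submit("1+1").all().get(3, TimeUnit.SECONDS). */
    @FunctionalInterface
    interface Request<C, R> {
        R apply(C client) throws Exception;
    }

    private final Supplier<C> factory;  // creates a fresh client, e.g. cluster::connect (assumption)
    private final Consumer<C> closer;   // releases a discarded client, e.g. Client::close (assumption)
    private C current;                  // the client currently in use, or null if none exists yet

    RecreatingClient(final Supplier<C> factory, final Consumer<C> closer) {
        this.factory = factory;
        this.closer = closer;
    }

    /** Runs the request, discarding and recreating the client after each failed attempt. */
    synchronized <R> R call(final Request<C, R> request,
                            final int maxAttempts,
                            final long backoffMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                if (current == null) {
                    current = factory.get();
                }
                // on success the client is kept and reused by later calls
                return request.apply(current);
            } catch (InterruptedException ie) {
                // do not retry when the calling thread has been interrupted
                Thread.currentThread().interrupt();
                throw ie;
            } catch (Exception ex) {
                // simplification: any failure discards the client so the next attempt starts fresh
                last = ex;
                discard();
            }
            if (attempt < maxAttempts) {
                Thread.sleep(backoffMillis);
            }
        }
        throw last;
    }

    /** Closes and forgets the current client, ignoring errors raised while closing it. */
    private void discard() {
        final C old = current;
        current = null;
        if (old != null) {
            try {
                closer.accept(old);
            } catch (RuntimeException ignored) {
                // the client was already unusable; nothing more to do here
            }
        }
    }
}
```

The client is held in a field rather than created per request, so a healthy client is reused and only a failing one is replaced. That mirrors the workaround described above without changing how the driver itself handles reconnects.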

            People

            • Assignee: Unassigned
            • Reporter: spmallette (Stephen Mallette)