Ignite / IGNITE-23384

Java thin: heartbeat timeout under load

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0
    • Component/s: thin client

    Description

      While running YCSB throughput benchmarks against a cluster with 1 server node, I noticed that the client often fails with a heartbeat timeout:

      2024-09-25 16:19:06:345 [PAYLOAD] 214 sec: 2410491 operations; 0 current ops/sec; est completion in 13 seconds   
      2024-09-25 16:19:07:345 [PAYLOAD] 215 sec: 2410491 operations; 0 current ops/sec; est completion in 13 seconds   
      2024-09-25 16:19:07:345 [PAYLOAD] 215 sec: 2410491 operations; 0 current ops/sec; est completion in 13 seconds   
      Sep 25, 2024 4:19:07 PM org.apache.ignite.internal.logger.IgniteLogger logInternal
      WARNING: Heartbeat timeout, closing the channel [remoteAddress=192.168.210.33:10800]
      Sep 25, 2024 4:19:07 PM org.apache.ignite.internal.logger.IgniteLogger logInternal
      INFO: The timeout worker was interrupted, probably the worker is stopping.
      2024-09-25 16:19:08:345 [PAYLOAD] 216 sec: 2410491 operations; 0 current ops/sec; est completion in 13 seconds   
      2024-09-25 16:19:08:345 [PAYLOAD] 216 sec: 2410491 operations; 0 current ops/sec; est completion in 13 seconds   
      2024-09-25 16:19:09:345 [PAYLOAD] 217 sec: 2410491 operations; 0 current ops/sec; est completion in 13 seconds 
      ...
      Sep 25, 2024 4:19:12 PM org.apache.ignite.internal.logger.IgniteLogger warn
      WARNING: Failed to establish connection to 192.168.210.33:10800: org.apache.ignite.client.IgniteClientConnectionException: IGN-CLIENT-1 TraceId:e8797794-e6f9-495d-bdd5-bf5639b8878e Handshake timeout [endpoint=192.168.210.33:10800]
      java.util.concurrent.CompletionException: org.apache.ignite.client.IgniteClientConnectionException: IGN-CLIENT-1 TraceId:e8797794-e6f9-495d-bdd5-bf5639b8878e Handshake timeout [endpoint=192.168.210.33:10800]
      	at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:314)
      	at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:319)
      	at java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:932)
      	at java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:907)
      	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
      	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
      	at org.apache.ignite.internal.future.timeout.TimeoutWorker.body(TimeoutWorker.java:96)
      	at org.apache.ignite.internal.util.worker.IgniteWorker.run(IgniteWorker.java:108)
      	at java.base/java.lang.Thread.run(Thread.java:829)
      Caused by: org.apache.ignite.client.IgniteClientConnectionException: IGN-CLIENT-1 TraceId:e8797794-e6f9-495d-bdd5-bf5639b8878e Handshake timeout [endpoint=192.168.210.33:10800]
      	at org.apache.ignite.internal.client.TcpClientChannel.lambda$handshakeAsync$7(TcpClientChannel.java:601)
      	at java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:930)
      	... 6 more
      Caused by: java.util.concurrent.TimeoutException
      	... 3 more
      
      • We don't need heartbeats under load; they are only useful when the connection is idle.
      • If a heartbeat request was sent and timed out, but other responses arrived in the meantime, we can ignore the timeout.
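
The two points above can be sketched as a small decision helper. This is only an illustration of the idea, not the actual client code or the committed fix: the class name `HeartbeatPolicy`, its methods, and the millisecond parameters are all hypothetical. It assumes the client records the time of every message received from the server, sends a heartbeat only after a full idle interval, and closes the channel on a heartbeat timeout only if nothing at all arrived after the heartbeat was sent.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical helper (not part of the Ignite API): decides when a heartbeat
 * is worth sending and whether a heartbeat timeout should close the channel.
 */
final class HeartbeatPolicy {
    private final long intervalMillis;
    private final long timeoutMillis;

    /** Time at which the last message of any kind was received from the server. */
    private final AtomicLong lastReceiveMillis;

    HeartbeatPolicy(long intervalMillis, long timeoutMillis, long nowMillis) {
        this.intervalMillis = intervalMillis;
        this.timeoutMillis = timeoutMillis;
        this.lastReceiveMillis = new AtomicLong(nowMillis);
    }

    /** Call for every response or notification read from the socket. */
    void onMessageReceived(long nowMillis) {
        // Keep the latest timestamp even if calls arrive out of order.
        lastReceiveMillis.accumulateAndGet(nowMillis, Math::max);
    }

    /** A heartbeat is needed only when nothing was received for a full interval. */
    boolean shouldSendHeartbeat(long nowMillis) {
        return nowMillis - lastReceiveMillis.get() >= intervalMillis;
    }

    /**
     * The channel is treated as dead only if the heartbeat has been pending
     * for at least the timeout AND the server sent nothing since it was sent.
     */
    boolean shouldCloseOnTimeout(long heartbeatSentMillis, long nowMillis) {
        boolean expired = nowMillis - heartbeatSentMillis >= timeoutMillis;
        boolean serverSilent = lastReceiveMillis.get() < heartbeatSentMillis;
        return expired && serverSilent;
    }
}
```

With this shape, a client under load keeps refreshing the last-receive time, so no heartbeat is scheduled at all, and a heartbeat reply that is delayed behind a backlog of regular responses does not close a channel that is still demonstrably alive.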

            People

              ptupitsyn Pavel Tupitsyn
              ptupitsyn Pavel Tupitsyn
              Igor Sapego

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m