Uploaded image for project: 'Ratis'
  1. Ratis
  2. RATIS-534

Warn that the netty useCacheForAllThreads feature may cause JVM GC pauses

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Minor
    • Resolution: Fixed
    • None
    • 0.4.0
    • gRPC
    • None

    Description

      Due to JVM GC pauses a follower is not able to process the append entries and as a result turns into a candidate. This leads to leader reelection. Below are the logs illustrating the scenario.

      Below are the leader ratis logs. The leader sends heartbeat to follower after every 500ms.

      2019-04-23 18:34:46,360 INFO org.apache.ratis.grpc.server.GrpcLogAppender: 5c400f35-d7ba-4911-8cd3-7ceb48e2ef16:group-80F1E6232C38 APPENDER GrpcLogAppender(5c400f35-d7ba-4911-8cd3-7ceb48e2ef16 -> eaa2a877-0e7f-44f3-a514-12243ce2b925) appendRequest=leaderTerm=2, previousLog=term: 2
      index: 211601
      , leaderCommit=211601, commits: [5c400f35-d7ba-4911-8cd3-7ceb48e2ef16:c211601, 187c2006-dc9e-40c7-b3e7-ca32b0c3ef45:c211601, eaa2a877-0e7f-44f3-a514-12243ce2b925:c211601], entries: []
      2019-04-23 18:34:46,360 INFO org.apache.ratis.grpc.server.GrpcLogAppender: 5c400f35-d7ba-4911-8cd3-7ceb48e2ef16:group-80F1E6232C38 APPENDER GrpcLogAppender(5c400f35-d7ba-4911-8cd3-7ceb48e2ef16 -> eaa2a877-0e7f-44f3-a514-12243ce2b925) wait 500ms
      2019-04-23 18:34:46,862 INFO org.apache.ratis.grpc.server.GrpcLogAppender: 5c400f35-d7ba-4911-8cd3-7ceb48e2ef16:group-80F1E6232C38 APPENDER GrpcLogAppender(5c400f35-d7ba-4911-8cd3-7ceb48e2ef16 -> eaa2a877-0e7f-44f3-a514-12243ce2b925) appendRequest=leaderTerm=2, previousLog=term: 2
      index: 211601
      , leaderCommit=211601, commits: [5c400f35-d7ba-4911-8cd3-7ceb48e2ef16:c211601, 187c2006-dc9e-40c7-b3e7-ca32b0c3ef45:c211601, eaa2a877-0e7f-44f3-a514-12243ce2b925:c211601], entries: []
      2019-04-23 18:34:46,862 INFO org.apache.ratis.grpc.server.GrpcLogAppender: 5c400f35-d7ba-4911-8cd3-7ceb48e2ef16:group-80F1E6232C38 APPENDER GrpcLogAppender(5c400f35-d7ba-4911-8cd3-7ceb48e2ef16 -> eaa2a877-0e7f-44f3-a514-12243ce2b925) wait 500ms
      2019-04-23 18:34:47,362 INFO org.apache.ratis.grpc.server.GrpcLogAppender: 5c400f35-d7ba-4911-8cd3-7ceb48e2ef16:group-80F1E6232C38 APPENDER GrpcLogAppender(5c400f35-d7ba-4911-8cd3-7ceb48e2ef16 -> eaa2a877-0e7f-44f3-a514-12243ce2b925) appendRequest=leaderTerm=2, previousLog=term: 2
      index: 211601
      , leaderCommit=211601, commits: [5c400f35-d7ba-4911-8cd3-7ceb48e2ef16:c211601, 187c2006-dc9e-40c7-b3e7-ca32b0c3ef45:c211601, eaa2a877-0e7f-44f3-a514-12243ce2b925:c211601], entries: []
      2019-04-23 18:34:47,362 INFO org.apache.ratis.grpc.server.GrpcLogAppender: 5c400f35-d7ba-4911-8cd3-7ceb48e2ef16:group-80F1E6232C38 APPENDER GrpcLogAppender(5c400f35-d7ba-4911-8cd3-7ceb48e2ef16 -> eaa2a877-0e7f-44f3-a514-12243ce2b925) wait 500ms
      2019-04-23 18:34:47,862 INFO org.apache.ratis.grpc.server.GrpcLogAppender: 5c400f35-d7ba-4911-8cd3-7ceb48e2ef16:group-80F1E6232C38 APPENDER GrpcLogAppender(5c400f35-d7ba-4911-8cd3-7ceb48e2ef16 -> eaa2a877-0e7f-44f3-a514-12243ce2b925) appendRequest=leaderTerm=2, previousLog=term: 2
      index: 211601
      , leaderCommit=211601, commits: [5c400f35-d7ba-4911-8cd3-7ceb48e2ef16:c211601, 187c2006-dc9e-40c7-b3e7-ca32b0c3ef45:c211601, eaa2a877-0e7f-44f3-a514-12243ce2b925:c211601], entries: []
      2019-04-23 18:34:47,863 INFO org.apache.ratis.grpc.server.GrpcLogAppender: 5c400f35-d7ba-4911-8cd3-7ceb48e2ef16:group-80F1E6232C38 APPENDER GrpcLogAppender(5c400f35-d7ba-4911-8cd3-7ceb48e2ef16 -> eaa2a877-0e7f-44f3-a514-12243ce2b925) wait 500ms
      2019-04-23 18:34:48,363 INFO org.apache.ratis.grpc.server.GrpcLogAppender: 5c400f35-d7ba-4911-8cd3-7ceb48e2ef16:group-80F1E6232C38 APPENDER GrpcLogAppender(5c400f35-d7ba-4911-8cd3-7ceb48e2ef16 -> eaa2a877-0e7f-44f3-a514-12243ce2b925) appendRequest=leaderTerm=2, previousLog=term: 2
      index: 211601
      , leaderCommit=211601, commits: [5c400f35-d7ba-4911-8cd3-7ceb48e2ef16:c211601, 187c2006-dc9e-40c7-b3e7-ca32b0c3ef45:c211601, eaa2a877-0e7f-44f3-a514-12243ce2b925:c211601], entries: []
      2019-04-23 18:34:48,363 INFO org.apache.ratis.grpc.server.GrpcLogAppender: 5c400f35-d7ba-4911-8cd3-7ceb48e2ef16:group-80F1E6232C38 APPENDER GrpcLogAppender(5c400f35-d7ba-4911-8cd3-7ceb48e2ef16 -> eaa2a877-0e7f-44f3-a514-12243ce2b925) wait 500ms
      

      Below are the follower logs showing a GC pause.

      2019-04-23 18:34:46,361 INFO org.apache.ratis.server.impl.RaftServerImpl: eaa2a877-0e7f-44f3-a514-12243ce2b925:group-80F1E6232C38 HEARTBEAT: eaa2a877-0e7f-44f3-a514-12243ce2b925: receive appendEntries(5c400f35-d7ba-4911-8cd3-7ceb48e2ef16, 2, (t:2, i:211601), 211601, false, commits[5c400f35-d7ba-4911-8cd3-7ceb48e2ef16:c211601, 187c2006-dc9e-40c7-b3e7-ca32b0c3ef45:c211601, eaa2a877-0e7f-44f3-a514-12243ce2b925:c211601], entries: []
      2019-04-23 18:34:46,362 INFO org.apache.ratis.server.impl.RaftServerImpl: eaa2a877-0e7f-44f3-a514-12243ce2b925:group-80F1E6232C38 HEARTBEAT: eaa2a877-0e7f-44f3-a514-12243ce2b925: succeeded to handle AppendEntries. Reply: 5c400f35-d7ba-4911-8cd3-7ceb48e2ef16->eaa2a877-0e7f-44f3-a514-12243ce2b925,true,SUCCESS,nextIndex:211602,term:2,followerCommit:211601
      2019-04-23 18:34:48,583 INFO org.apache.ratis.server.impl.FollowerState: eaa2a877-0e7f-44f3-a514-12243ce2b925:group-80F1E6232C38 changes to CANDIDATE, lastRpcTime:2221, electionTimeout:1124ms
      2019-04-23 18:34:48,583 INFO org.apache.ratis.server.impl.RoleInfo: eaa2a877-0e7f-44f3-a514-12243ce2b925: shutdown FollowerState
      2019-04-23 18:34:48,583 INFO org.apache.ratis.server.impl.RaftServerImpl: eaa2a877-0e7f-44f3-a514-12243ce2b925:group-80F1E6232C38 changes role from FOLLOWER to CANDIDATE at term 2 for changeToCandidate
      2019-04-23 18:34:48,583 INFO org.apache.ratis.server.impl.RaftServerImpl: eaa2a877-0e7f-44f3-a514-12243ce2b925:group-80F1E6232C38 HEARTBEAT: eaa2a877-0e7f-44f3-a514-12243ce2b925: receive appendEntries(5c400f35-d7ba-4911-8cd3-7ceb48e2ef16, 2, (t:2, i:211601), 211601, false, commits[5c400f35-d7ba-4911-8cd3-7ceb48e2ef16:c211601, 187c2006-dc9e-40c7-b3e7-ca32b0c3ef45:c211601, eaa2a877-0e7f-44f3-a514-12243ce2b925:c211601], entries: []
      2019-04-23 18:34:48,583 INFO org.apache.hadoop.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1774ms
      GC pool 'PS MarkSweep' had collection(s): count=1 time=1359ms
      GC pool 'PS Scavenge' had collection(s): count=1 time=532ms
      2019-04-23 18:34:48,583 INFO org.apache.ratis.server.impl.RoleInfo: eaa2a877-0e7f-44f3-a514-12243ce2b925: start LeaderElection
      2019-04-23 18:34:48,598 INFO org.apache.ratis.server.impl.RaftServerImpl: eaa2a877-0e7f-44f3-a514-12243ce2b925:group-80F1E6232C38 changes role from CANDIDATE to FOLLOWER at term 2 for appendEntries
      2019-04-23 18:34:48,598 INFO org.apache.ratis.server.impl.RoleInfo: eaa2a877-0e7f-44f3-a514-12243ce2b925: shutdown LeaderElection
      

      There is a GC pause of approx. 2 seconds which causes the leader election to start in the follower.

      Attachments

        1. RATIS-534.001.patch
          5 kB
          Lokesh Jain
        2. RATIS-534.002.patch
          3 kB
          Lokesh Jain
        3. RATIS-534.003.patch
          2 kB
          Lokesh Jain
        4. RATIS-534.004.patch
          2 kB
          Lokesh Jain

        Issue Links

          Activity

            People

              ljain Lokesh Jain
              ljain Lokesh Jain
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: