Ratis / RATIS-878

Infinite restart of GrpcLogAppender

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Component/s: gRPC

    Description

      What's the problem?

      After running hadoop-ozone for 4 days, the datanode leaks memory. A heap dump showed 460710 instances of GrpcLogAppender, but only 6 instances of SenderList, each containing 1-2 instances of GrpcLogAppender. The logs also contain many entries related to LeaderState::restartSender.

      INFO impl.RaftServerImpl: 1665f5ea-ab17-4a0e-af6d-6958efd322fa@group-F64B465F37B5-LeaderState: Restarting GrpcLogAppender for 1665f5ea-ab17-4a0e-af6d-6958efd322fa@group-F64B465F37B5->229cbcc1-a3b2-4383-9c0d-c0f4c28c3d4a (stream: stderr, time: 2020-04-06T03:59:53.37892512Z)

      So many GrpcLogAppender instances did not stop their daemon thread when they were removed from senders.
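
      The leak mechanism can be illustrated with a minimal sketch. It is not Ratis code: the class AppenderLeakSketch, its Appender stand-in, and the method aliveAfterRemoval are hypothetical names used only for illustration. The sketch relies on a general JVM fact: a running thread is a GC root, so an object referenced by a running thread stays on the heap even after it is removed from every list.

```java
import java.util.ArrayList;
import java.util.List;

public class AppenderLeakSketch {
    /** Hypothetical stand-in for GrpcLogAppender: owns one daemon thread. */
    static class Appender {
        private volatile boolean running = true;
        private final Thread daemon;

        Appender() {
            // The lambda reads this.running, so the thread holds a reference
            // to this Appender for as long as the thread is running.
            daemon = new Thread(() -> {
                while (running) {
                    try {
                        Thread.sleep(10);
                    } catch (InterruptedException e) {
                        return;
                    }
                }
            });
            daemon.setDaemon(true);
            daemon.start();
        }

        void stop() {
            running = false;
            daemon.interrupt();
            try {
                daemon.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        boolean isAlive() {
            return daemon.isAlive();
        }
    }

    /** Removes n appenders from a sender list and counts threads still alive. */
    static int aliveAfterRemoval(int n, boolean stopOnRemove) {
        List<Appender> senders = new ArrayList<>();
        List<Appender> observed = new ArrayList<>(); // kept only to inspect the threads
        for (int i = 0; i < n; i++) {
            Appender a = new Appender();
            senders.add(a);
            observed.add(a);
        }
        for (Appender a : observed) {
            senders.remove(a);
            if (stopOnRemove) {
                a.stop();
            }
        }
        int alive = 0;
        for (Appender a : observed) {
            if (a.isAlive()) {
                alive++;
            }
        }
        for (Appender a : observed) {
            a.stop(); // clean up so the sketch itself does not leak threads
        }
        return alive;
    }

    public static void main(String[] args) {
        System.out.println(aliveAfterRemoval(3, false)); // prints 3
        System.out.println(aliveAfterRemoval(3, true));  // prints 0
    }
}
```

      With removal only, every thread is still alive and keeps its appender reachable, which matches the pattern in the heap dump: few entries in the sender list, many appender instances on the heap.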


       
      Why is LeaderState::restartSender called so many times?
      1. As the image shows, when a group is removed, SegmentedRaftLog is closed, and GrpcLogAppender throws an exception when it finds that SegmentedRaftLog is closed. GrpcLogAppender is then restarted, the new GrpcLogAppender throws the same exception because SegmentedRaftLog is still closed, it is restarted again, and so on. The result is an infinite restart of GrpcLogAppender.
      2. In fact, when a group is removed, GrpcLogAppender is stopped first: RaftServerImpl::shutdown -> RoleInfo::shutdownLeaderState -> LeaderState::stop -> LogAppender::stopAppender. SegmentedRaftLog is closed afterwards: RaftServerImpl::shutdown -> ServerState::close ... . Although RoleInfo::shutdownLeaderState is called before ServerState::close, GrpcLogAppender is stopped asynchronously. So the infinite restart happens when GrpcLogAppender actually stops only after SegmentedRaftLog has been closed.
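
      The restart loop described in the two points above can be sketched as a small single-threaded simulation. Everything here is hypothetical and only models the control flow: Log stands in for SegmentedRaftLog, appendOnce for one GrpcLogAppender run, and countRestarts for repeated calls to LeaderState::restartSender. The checkClosed guard is an assumed mitigation shown for contrast, not the fix adopted in Ratis.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class RestartLoopSketch {
    /** Hypothetical stand-in for SegmentedRaftLog: reads fail once it is closed. */
    static class Log {
        private final AtomicBoolean closed = new AtomicBoolean(false);

        void close() {
            closed.set(true);
        }

        boolean isClosed() {
            return closed.get();
        }

        void read() {
            if (closed.get()) {
                throw new IllegalStateException("log is closed");
            }
        }
    }

    /** One appender run: returns false if the appender died with an exception. */
    static boolean appendOnce(Log log) {
        try {
            log.read();
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    /**
     * Models the restart loop. Each failed appender is replaced by a new one.
     * maxRestarts caps the loop so the simulation terminates.
     * Returns the number of restarts performed.
     */
    static int countRestarts(Log log, boolean checkClosed, int maxRestarts) {
        int restarts = 0;
        while (restarts < maxRestarts) {
            if (appendOnce(log)) {
                break; // appender is healthy, nothing to restart
            }
            if (checkClosed && log.isClosed()) {
                break; // assumed guard: do not restart against a closed log
            }
            restarts++; // a replacement appender is created and fails again
        }
        return restarts;
    }

    public static void main(String[] args) {
        Log log = new Log();
        log.close(); // the log closes before the appender has really stopped
        System.out.println(countRestarts(log, false, 1000)); // prints 1000
        System.out.println(countRestarts(log, true, 1000));  // prints 0
    }
}
```

      Without the guard the loop only ends because of the artificial cap, which corresponds to the unbounded restarts seen in the logs. With the guard, a closed log ends the loop immediately.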

      For more details, see RATIS-840.

      Attachments

        1. screenshot-4.png
          429 kB
          runzhiwang
        2. screenshot-3.png
          28 kB
          runzhiwang
        3. screenshot-2.png
          107 kB
          runzhiwang
        4. screenshot-1.png
          107 kB
          runzhiwang

            People

              duongnguyen Duong
              yjxxtd runzhiwang