HBase / HBASE-26029

It is not reliable to use nodeDeleted event to track region server's death


Details

    • Reviewed
    • Introduce a new step in ServerCrashProcedure to move the replication queues of the dead region server to other live region servers, as this is the only reliable way to get the death event of a region server. The old ReplicationTracker related code has all been purged, as it is no longer used.

    Description

      When implementing HBASE-26011, Xin Sun pointed out an interesting scenario: if a region server comes up and goes down between two sync requests, we cannot detect the death of that region server.

      https://github.com/apache/hbase/pull/3405#discussion_r656720923

      This is a valid point, and while thinking about a solution I noticed that the current zk implementation has the same problem. A watcher on zk can only be triggered once, so after zk triggers the watcher and before you set a new one, a region server may come up and go down, and you will miss the nodeDeleted event for that region server.
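
      To make the gap concrete, here is a minimal sketch of the race using a toy in-memory registry. ToyRegistry, listAndWatch and the server names are hypothetical stand-ins invented for illustration; this is not the ZooKeeper client API, it only models the "a watch fires at most once" behaviour described above.

```java
import java.util.Set;
import java.util.TreeSet;

final class OneShotWatchGap {
    /** Toy stand-in for a znode's children; NOT the ZooKeeper client API. */
    static final class ToyRegistry {
        private final Set<String> children = new TreeSet<>();
        private Runnable watch; // null means no watch is currently registered

        /** Lists the children and registers a watch that fires at most once. */
        Set<String> listAndWatch(Runnable w) {
            watch = w;
            return new TreeSet<>(children);
        }

        void create(String name) { children.add(name); fire(); }

        void delete(String name) { children.remove(name); fire(); }

        private void fire() {
            Runnable w = watch;
            watch = null; // one-shot: the watch is cleared before the callback runs
            if (w != null) {
                w.run();
            }
        }
    }

    /** Returns every server name the client ever observed. */
    static Set<String> simulate() {
        ToyRegistry registry = new ToyRegistry();
        Set<String> seenByClient = new TreeSet<>();

        registry.create("rs1");
        seenByClient.addAll(registry.listAndWatch(() -> { }));

        registry.delete("rs1"); // the watch fires once and is now gone
        // Gap: the client has not re-registered its watch yet.
        registry.create("rs2");
        registry.delete("rs2");

        // The client reacts to the single notification and lists again.
        seenByClient.addAll(registry.listAndWatch(() -> { }));
        return seenByClient;
    }

    public static void main(String[] args) {
        System.out.println(simulate()); // prints [rs1]
    }
}
```

      In this toy run the client never learns that rs2 existed, so it cannot report the death of rs2 either.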

      I think the general approach here, which could work for both the master based and the zk based replication tracker, is that we should not rely on the tracker to tell us which region server is dead. Instead, the tracker just provides the list of live region servers, and the upper layer compares this list with the expected list (for replication, the list obtained by listing replicators) to detect the dead region servers.
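
      The comparison proposed above amounts to a set difference. The following is only a sketch under assumptions: DeadServerDetector and findDeadServers are hypothetical names, and servers are represented as plain strings in the "host,port,startcode" form rather than HBase's ServerName class.

```java
import java.util.Set;
import java.util.TreeSet;

final class DeadServerDetector {
    /**
     * Returns the servers that are expected (for replication, those that still
     * own replication queues) but are missing from the live list.
     */
    static Set<String> findDeadServers(Set<String> expected, Set<String> live) {
        Set<String> dead = new TreeSet<>(expected);
        dead.removeAll(live);
        return dead;
    }

    public static void main(String[] args) {
        // Hypothetical data: rs2 restarted, so its old start code 200 is gone.
        Set<String> replicators = Set.of("rs1,16020,100", "rs2,16020,200");
        Set<String> live = Set.of("rs1,16020,100", "rs2,16020,250");
        System.out.println(findDeadServers(replicators, live)); // prints [rs2,16020,200]
    }
}
```

      Because the result is computed from the current state rather than from a stream of events, a missed notification does not lose a dead server: it is still found the next time the two lists are compared.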

          People

            zhangduo Duo Zhang