HBASE-14802: Replaying server crash recovery procedure after a failover causes incorrect handling of dead servers

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.0, 1.2.1, 2.0.0
    • Fix Version/s: 1.2.0, 1.3.0, 2.0.0
    • Component/s: master
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      When a server is added to the dead servers list, the master launches a ServerCrashProcedure for it.
      Each time a server is added to the dead list, a counter, "numProcessing", is incremented; it is decremented when a crash recovery procedure finishes. Because adding a dead server and recovering it are two separate events, the counter can become inconsistent.

      If a master failover occurs in the middle of crash recovery, the numProcessing counter is reset, but the ServerCrashProcedure is replayed by the new master. When the replayed procedure finishes, it decrements a counter that was never incremented on the new master, so the counter goes negative and the master believes dead servers are still being recovered.
      As a result, the balancer stops running after such a failover.
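
      The failure mode described above can be illustrated with a minimal sketch. This is not HBase's actual code: the class and method names (DeadServerTrackerSketch, add, finish, areDeadServersInProgress) are hypothetical, and the "non-zero means in progress" check is an assumption made only to show how a negative count could be read as ongoing recovery.

```java
/** Hypothetical stand-in for the master's dead-server bookkeeping; not HBase's real class. */
final class DeadServerTrackerSketch {
    private int numProcessing = 0;

    /** Models adding a server to the dead servers list. */
    synchronized void add(String serverName) {
        numProcessing++;
    }

    /** Models a crash recovery procedure finishing for a server. */
    synchronized void finish(String serverName) {
        numProcessing--;
    }

    synchronized int getNumProcessing() {
        return numProcessing;
    }

    /** Assumed check: any non-zero count is treated as "recovery still in progress". */
    synchronized boolean areDeadServersInProgress() {
        return numProcessing != 0;
    }
}

final class FailoverReplaySketch {
    /** Simulates a failover in the middle of crash recovery and returns the new master's tracker. */
    static DeadServerTrackerSketch replayAfterFailover() {
        // Old master: the server is marked dead and its recovery procedure starts.
        DeadServerTrackerSketch oldMaster = new DeadServerTrackerSketch();
        oldMaster.add("rs1.example.org");

        // Failover: the new master starts with a fresh counter (the old count is lost).
        DeadServerTrackerSketch newMaster = new DeadServerTrackerSketch();

        // The persisted procedure is replayed and finishes on the new master,
        // decrementing a counter that was never incremented there.
        newMaster.finish("rs1.example.org");
        return newMaster;
    }

    public static void main(String[] args) {
        DeadServerTrackerSketch tracker = replayAfterFailover();
        System.out.println("numProcessing = " + tracker.getNumProcessing());
        System.out.println("deadServersInProgress = " + tracker.areDeadServersInProgress());
    }
}
```

      In this sketch the new master ends with numProcessing at -1 and reports recovery as still in progress, even though no procedure is running. Any component that waits for dead-server processing to finish, such as the balancer in the description, would then wait indefinitely.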

      Attachments

        1. 14802.addendum.branch-1.txt
          2 kB
          Michael Stack
        2. HBASE-14802.patch
          7 kB
          Ashu Pachauri
        3. HBASE-14802-1.patch
          6 kB
          Ashu Pachauri
        4. HBASE-14802-2.patch
          7 kB
          Ashu Pachauri
        5. HBASE-14802-3.patch
          7 kB
          Ashu Pachauri
        6. HBASE-14802-4.patch
          7 kB
          Ashu Pachauri

          People

            Assignee: Ashu Pachauri (ashu210890)
            Reporter: Ashu Pachauri (ashu210890)
