HBase / HBASE-2486

Add simple "anti-entropy" for region assignment


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 0.20.5
    • Fix Version/s: None
    • Component/s: master, regionserver

    Description

      We've seen a number of bugs where a region server thinks it should not be serving a region, but the master and META think it should be. I'd like to propose a very simple way of fixing this issue:

      1) whenever a region server throws a NotServingRegionException, it also records that region's ID in an RS-wide set
      2) when a region server sends a heartbeat, it includes a message for each of these regions (MSG_REPORT_NSRE or some such) and then clears the set
      3) when the master receives MSG_REPORT_NSRE, it does the following checks:
        a) if the region is assigned elsewhere according to META, the NSRE was due to a stale client; ignore
        b) if the region is in transition, ignore
        c) otherwise, we have an inconsistency, and we should take some steps to resolve it (e.g. mark the region unassigned, or exit the master if we are in "paranoid mode")
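      A minimal sketch of steps 1 and 2 on the region server side, assuming regions are identified by plain String names. NsreTracker, recordNsre and drainForHeartbeat are hypothetical names, not HBase APIs; only NotServingRegionException and MSG_REPORT_NSRE come from the proposal above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical region-server-side helper (not an HBase class): remembers which
 * regions this server answered with NotServingRegionException since the last heartbeat.
 */
final class NsreTracker {
  // RS-wide set of region names; concurrent because many handler threads may throw NSREs at once.
  private final Set<String> nsreRegions = ConcurrentHashMap.newKeySet();

  /** Step 1: call wherever the server is about to throw NotServingRegionException. */
  void recordNsre(String regionName) {
    nsreRegions.add(regionName);
  }

  /**
   * Step 2: call while building a heartbeat. Returns one region per MSG_REPORT_NSRE
   * to send, removing each from the set as it is taken.
   */
  List<String> drainForHeartbeat() {
    List<String> toReport = new ArrayList<>();
    Iterator<String> it = nsreRegions.iterator();
    while (it.hasNext()) {
      toReport.add(it.next());
      // Remove entries one at a time rather than copy-then-clear(), so a region
      // recorded concurrently is either reported now or kept for the next heartbeat.
      it.remove();
    }
    return toReport;
  }
}
```

      Draining entry by entry is a design choice of this sketch: a naive "copy the set, then clear()" could discard a region recorded between the copy and the clear, which would hide exactly the inconsistency this issue wants to surface.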

      Whatever we do, we need to make sure that this is loudly logged, and causes unit tests to fail, when it's detected. This should not happen, but when it does, it would be good to recover without addtable.rb, etc.
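      A sketch of the master-side checks from step 3, including loud logging and a "paranoid mode" that fails fast so unit tests notice. NsreReportHandler, Verdict, and modelling META as a Map (region name to server) and regions in transition as a Set are assumptions for illustration; a real master would consult META and its own in-transition state, and throwing IllegalStateException only stands in for exiting the master.

```java
import java.util.Map;
import java.util.Set;
import java.util.logging.Logger;

/** Hypothetical master-side handler for MSG_REPORT_NSRE (not an HBase API). */
final class NsreReportHandler {
  enum Verdict { STALE_CLIENT, IN_TRANSITION, INCONSISTENT }

  private static final Logger LOG = Logger.getLogger(NsreReportHandler.class.getName());

  private final Map<String, String> metaAssignments; // stand-in for META: region -> assigned server
  private final Set<String> regionsInTransition;     // stand-in for the master's in-transition state
  private final boolean paranoid;                    // fail fast instead of repairing

  NsreReportHandler(Map<String, String> metaAssignments, Set<String> regionsInTransition,
                    boolean paranoid) {
    this.metaAssignments = metaAssignments;
    this.regionsInTransition = regionsInTransition;
    this.paranoid = paranoid;
  }

  Verdict onNsreReport(String reportingServer, String regionName) {
    String assignedTo = metaAssignments.get(regionName);
    // a) META places the region on another server: a client used a stale location.
    if (assignedTo != null && !assignedTo.equals(reportingServer)) {
      return Verdict.STALE_CLIENT;
    }
    // b) The region is already being moved, so an NSRE is expected.
    if (regionsInTransition.contains(regionName)) {
      return Verdict.IN_TRANSITION;
    }
    // c) Anything else is an inconsistency: log it loudly, then fail fast or let the caller repair.
    String msg = "INCONSISTENCY: " + reportingServer + " reported NSRE for " + regionName
        + " (META assignment: " + assignedTo + ", not in transition)";
    LOG.severe(msg);
    if (paranoid) {
      throw new IllegalStateException(msg);
    }
    // Caller would now mark the region unassigned so it gets reassigned.
    return Verdict.INCONSISTENT;
  }
}
```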

      Attachments

        1. hbase2486.diff (12 kB, Eugene Koontz)
        2. hbase2486.diff (11 kB, Eugene Koontz)


    People

      Assignee: Unassigned
      Reporter: Todd Lipcon (tlipcon)
      Votes: 0
      Watchers: 7
