Details

    • Reviewed

    Description

      HBASE-13172 was meant to fix a unit test (UT) issue, but it causes other problems that the parent JIRA (HBASE-13605) is attempting to fix.

      However, HBASE-13605 patch v4 uncovers at least two different issues that are, to put it mildly, major design flaws in the AssignmentManager (AM) / RegionServer (RS).

      Regardless of 13605, the issue with 13172 is that we catch ServerNotRunningYetException from isServerReachable() and return false, which then adds the server to the RegionStates.deadServers list. Once it is in that list, we can still assign and unassign regions to the RS after it has started (because regular assignment does not check whether the server is in RegionStates.deadServers). However, after the first assign and unassign, we cannot assign the region again, because the check on lastServer then concludes that the server is dead.
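
The sequence described above can be sketched with a toy model. This is only an illustration, not HBase code: `ToyMaster`, `ToyRegionServer`, `checkServer`, and the local `ServerNotRunningYetException` are hypothetical stand-ins. Placing the catch inside `isServerReachable()` and tracking `lastServer` in a simple map are both assumptions made for the sketch.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: none of these classes are HBase's real AssignmentManager or RegionStates.
class DeadServerSketch {

  /** Local stand-in for HBase's ServerNotRunningYetException. */
  static class ServerNotRunningYetException extends Exception {}

  /** Hypothetical region server that is "still starting" until start() is called. */
  static class ToyRegionServer {
    final String name;
    private boolean started = false;

    ToyRegionServer(String name) { this.name = name; }

    void start() { started = true; }

    void ping() throws ServerNotRunningYetException {
      if (!started) throw new ServerNotRunningYetException();
    }
  }

  /** Hypothetical master-side state, loosely following the description. */
  static class ToyMaster {
    final Set<String> deadServers = new HashSet<>();
    final Map<String, String> lastServer = new HashMap<>();

    boolean isServerReachable(ToyRegionServer rs) {
      try {
        rs.ping();
        return true;
      } catch (ServerNotRunningYetException e) {
        // "Still starting" is collapsed into "unreachable".
        return false;
      }
    }

    void checkServer(ToyRegionServer rs) {
      // An unreachable server is recorded as dead and never removed in this sketch.
      if (!isServerReachable(rs)) deadServers.add(rs.name);
    }

    boolean assign(String region, ToyRegionServer rs) {
      // Only the region's previous host is checked against deadServers,
      // not the target server itself.
      String last = lastServer.get(region);
      if (last != null && deadServers.contains(last)) return false;
      lastServer.put(region, rs.name);
      return true;
    }

    void unassign(String region) {
      // The region closes; lastServer is deliberately kept.
    }
  }

  public static void main(String[] args) {
    ToyRegionServer rs = new ToyRegionServer("rs1");
    ToyMaster master = new ToyMaster();

    master.checkServer(rs); // RS is still starting, so it lands in deadServers
    rs.start();             // RS finishes starting, but stays in deadServers

    System.out.println("first assign:  " + master.assign("region-a", rs));
    master.unassign("region-a");
    System.out.println("second assign: " + master.assign("region-a", rs));
  }
}
```

In this model the first assign succeeds because the region has no previous host yet. The second is refused because `lastServer` now points at a server that was never removed from `deadServers`.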

      It turns out that a proper patch for 13605 is very hard without fixing the rest of the broken AM assumptions (see HBASE-13605, HBASE-13877 and HBASE-13895 for a colorful history). For 1.1.1, I think we should just revert parts of HBASE-13172 for now.

      Attachments

        1. hbase-13937_v1.patch
          2 kB
          Enis Soztutar
        2. hbase-13937_v2.patch
          3 kB
          Enis Soztutar
        3. hbase-13937_v3.patch
          2 kB
          Enis Soztutar
        4. hbase-13937_v3.patch
          2 kB
          Enis Soztutar
        5. hbase-13937_v3-branch-1.1.patch
          2 kB
          Enis Soztutar

          People

            enis Enis Soztutar
            enis Enis Soztutar
