HBASE-6348

Region assignments should only be allowed to edit META hosted on the same cluster.

    Details

    • Type: Task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      We copied hbase file data (root/meta/tables) from one hdfs cluster to another, scrubbed it, and then attempted to start the new cluster. We noticed that META on the original cluster was being modified with server entries from the new cluster.

      It's contrived, but here is how it happened.

      First we copied all the data. Then we "scrubbed" META – we removed all region serverinfo cols that pointed to nodes on the original cluster. When we started the new cluster, it picked a RS to serve ROOT. Since we had scrubbed META, the new cluster's master attempted to assign regions to other region servers on the new cluster. From the code's point of view this all succeeded – zk went through transitions, and according to the master the regions were assigned. However, we started seeing NotServingRegionExceptions on the original cluster.

      The root cause is that ROOT was not scrubbed. The new cluster assigned the copy of ROOT to a new cluster RS. Now, when the new cluster attempted to modify META, it would read the old ROOT's serverinfo pointer and go to the old cluster's regionserver. The old cluster's regionserver just so happened to be still serving META, so the old cluster's META server gladly accepted the assignments that included the new cluster's regionserver names.

      At this point we brought down the new cluster (it was getting killed). Clients on the old cluster would now go to zk, root, meta, and get pointers to the new cluster. NSREs happened. Unhappiness.

      Long story short, we should have some mechanism to make sure that region assignments are only allowed to edit META hosted on the same cluster.
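
      One way such a mechanism could look, as a rough sketch only: the META host remembers which cluster it belongs to and refuses assignment edits from any other cluster. Everything here (ClusterScopedMeta, the cluster id strings, the assign/locationOf methods) is hypothetical and made up for illustration; none of it is existing HBase code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: this class does not exist in HBase.
final class ClusterScopedMeta {
    // Identity of the cluster that hosts this META (assumed to be a simple string).
    private final String clusterId;
    // Region name -> server name, standing in for the serverinfo columns.
    private final Map<String, String> regionToServer = new HashMap<>();

    ClusterScopedMeta(String clusterId) {
        this.clusterId = clusterId;
    }

    /** Records an assignment only if the writer belongs to the same cluster. */
    boolean assign(String writerClusterId, String regionName, String serverName) {
        if (!clusterId.equals(writerClusterId)) {
            return false; // reject edits coming from a foreign cluster
        }
        regionToServer.put(regionName, serverName);
        return true;
    }

    /** Returns the recorded server for a region, or null if none is recorded. */
    String locationOf(String regionName) {
        return regionToServer.get(regionName);
    }
}
```

      With a check like this, the new cluster's master in the scenario above would have had its assignments rejected by the old cluster's META server instead of silently accepted.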

        Activity

        Jean-Daniel Cryans added a comment -

        Wouldn't it be just easier to make sure that .META. is assigned correctly? IIUC this is where the problem happened (HMaster.assignRootAndMeta):

            if (!this.catalogTracker.verifyMetaRegionLocation(timeout)) {
              ServerName currentMetaServer =
                this.catalogTracker.getMetaLocationOrReadLocationFromRoot();
              if (currentMetaServer != null
                  && !currentMetaServer.equals(currentRootServer)) {
                splitLogAndExpireIfOnline(currentMetaServer);
              }
              assignmentManager.assignMeta();
              this.catalogTracker.waitForMeta();
              // Above check waits for general meta availability but this does not
              // guarantee that the transition has completed
              this.assignmentManager.waitForAssignment(HRegionInfo.FIRST_META_REGIONINFO);
              assigned++;
            } else {
              // Region already assigned.  We didnt' assign it.  Add to in-memory state.
              this.assignmentManager.regionOnline(HRegionInfo.FIRST_META_REGIONINFO,
                this.catalogTracker.getMetaLocation());
            }
        

        When the location was verified, it was able to read the old .META. location from ROOT and since the region was still there it was assumed that .META. was correctly assigned. Now what's interesting is this from AM.regionOnline:

              if (isServerOnline(sn)) {
                this.regions.put(regionInfo, sn);
                addToServers(sn, regionInfo);
                this.regions.notifyAll();
              } else {
                LOG.info("The server is not in online servers, ServerName=" + 
                  sn.getServerName() + ", region=" + regionInfo.getEncodedName());
              }
        

        I assume that if you went over the master's log you would find the log message about the server not being online? It seems to me that we should either check if the server belongs to us or backtrack when we fail to set the region online.
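
        To make the first option concrete, here is a minimal sketch of a "does this server belong to us" check, assuming the master can ask for its set of online servers. MetaLocationCheck and both of its methods are hypothetical names, and servers are plain strings to keep the example self-contained; this is not the actual HMaster code.

```java
import java.util.Set;

// Hypothetical sketch: these helpers do not exist in HBase.
final class MetaLocationCheck {
    /** True only when the recorded META host is one of this cluster's online servers. */
    static boolean belongsToThisCluster(String metaServer, Set<String> onlineServers) {
        return metaServer != null && onlineServers.contains(metaServer);
    }

    /**
     * Decides whether META should be reassigned instead of trusted as-is.
     * A verified location is not enough: the host must also be one of ours.
     */
    static boolean mustReassignMeta(boolean locationVerified, String metaServer,
                                    Set<String> onlineServers) {
        return !locationVerified || !belongsToThisCluster(metaServer, onlineServers);
    }
}
```

        In the reported scenario the location verified fine (the old cluster was still serving the region), but the host would not have been in the new master's online servers, so this check would have forced a reassignment.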


          People

          • Assignee:
            Unassigned
            Reporter:
            Jonathan Hsieh
          • Votes:
            0
            Watchers:
            10
