Affects Version/s: None
Fix Version/s: None
We copied HBase file data (root/meta/tables) from one HDFS cluster to another, scrubbed it, and then attempted to start the new cluster. We then noticed that META on the original cluster was being modified with server entries from the new cluster.
It's contrived, but here is how it happened.
First we copied all the data. Then we "scrubbed" META: we removed all region serverinfo cols that pointed to nodes on the original cluster. When we started the new cluster, it picked a RS to serve ROOT. Since we had scrubbed META, the new cluster's master attempted to assign regions to other region servers on the new cluster. From the code's point of view this all succeeded: zk went through its transitions, and according to the master the regions were assigned. However, we started seeing NotServingRegionExceptions on the original cluster.
The root cause is that ROOT was not scrubbed. The new cluster assigned its copy of ROOT to a new-cluster RS. Now, when the new cluster attempted to modify META, it would read the serverinfo pointer in the copied (old) ROOT and go to the old cluster's regionserver. The old cluster's regionserver just so happened to still be serving META, so the old cluster's META server gladly accepted the assignments that included the new cluster's regionserver names.
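To make the missing step concrete, here is a rough sketch of scrubbing that is applied to every copied catalog table, ROOT as well as META. It is a toy model only: the catalog is a plain in-memory map rather than a real HBase table, the class and method names are made up for illustration, and the column names `info:server` and `info:serverstartcode` are an assumption about which cols hold the server location in your version.

```java
import java.util.Map;
import java.util.Set;

// Toy model, not the HBase client API: a catalog table is row -> (column -> value).
public class CatalogScrub {
    // Assumed names of the columns that point at a regionserver.
    static final Set<String> LOCATION_COLUMNS =
            Set.of("info:server", "info:serverstartcode");

    // Removes the location columns from every row of a copied catalog table.
    // Returns the number of cells removed. Row maps must be mutable.
    static int scrub(Map<String, Map<String, String>> catalog) {
        int removed = 0;
        for (Map<String, String> row : catalog.values()) {
            for (String column : LOCATION_COLUMNS) {
                if (row.remove(column) != null) {
                    removed++;
                }
            }
        }
        return removed;
    }
}
```

Running the same routine over the copy of ROOT before starting the new cluster would have left no pointer back to the old cluster's META server.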
At this point we brought down the new cluster (it was getting killed). Clients on the old cluster would now go to zk, root, meta and get pointers to the new cluster's regionservers. NSREs happened. Unhappiness.
Long story short, we should have some mechanism to make sure that region assignments are only allowed to edit the META hosted on the same cluster.
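One possible shape for such a mechanism, sketched under assumptions: each cluster has some unique identifier, every META edit carries the caller's identifier, and the server hosting META rejects edits whose identifier does not match its own. None of the names below are existing HBase classes or methods; this is a standalone model of the check, not a patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a META table that only accepts edits from its own cluster.
public class ClusterGuardedMeta {
    private final String clusterId; // identity of the cluster hosting this META
    private final Map<String, String> regionToServer = new HashMap<>();

    public ClusterGuardedMeta(String clusterId) {
        this.clusterId = clusterId;
    }

    // Records an assignment only if the caller belongs to the same cluster.
    public void assign(String callerClusterId, String region, String server) {
        if (!clusterId.equals(callerClusterId)) {
            throw new IllegalStateException(
                    "Rejected META edit from cluster " + callerClusterId
                    + "; this META belongs to cluster " + clusterId);
        }
        regionToServer.put(region, server);
    }

    public String serverFor(String region) {
        return regionToServer.get(region);
    }
}
```

With a check like this, the new cluster's master would have received an error when its assignment reached the old cluster's META server, instead of silently overwriting the old cluster's region locations.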