On the 'post_region_open', IIRC, a bunch of these tickleOpenings were added because we saw issues... in this case, an update of .META. that went in even though we'd lost ownership of the region.
Implicitly, that means we still have a race condition here; it's just that the probability is quite low.
Stepping back (after looking at the code): in the name of simplifying the region open handling interaction, could we drop the notion that a master can intercede and assign a region elsewhere because the open is proceeding too slowly on a particular region? There would be less noise in the logs and fewer states to deal with.
It would be a huge simplification imho. It's worth trying, I would say. It actually makes sense to do it now, because once the current trunk code is production proven, touching it will be scarier.
Do we fail the open if the following break happens?
Yes, because updateMeta will return false.
Do we have to call zkw.sync(node)? Did we always do that? Are we doing the sync just to read the old znode value? Do we have to? Could we operate with a stale read?
Well, we want to be sure that no one else wrote anything. It's often overkill, because we're going to write the znode immediately after, so the sync will occur anyway during the write, as we check the versions during the write. And it's expensive. So there is likely some room for improvement here as well actually.
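To illustrate why a stale read may be tolerable when the write is version-checked, here is a minimal sketch. It is a toy in-memory model, not the HBase or ZooKeeper code: the class VersionedZnodeSketch and its methods are hypothetical stand-ins, and the state strings are only labels. In real ZooKeeper, setData(path, data, version) rejects a mismatched version with a BadVersionException instead of returning false.

```java
/**
 * Toy in-memory stand-in for a znode (hypothetical, for illustration only):
 * a payload plus a version counter that is bumped on every successful write.
 */
final class VersionedZnodeSketch {
    private String data;
    private int version = 0;

    VersionedZnodeSketch(String initial) {
        this.data = initial;
    }

    synchronized int getVersion() {
        return version;
    }

    synchronized String getData() {
        return data;
    }

    /**
     * Conditional write: applied only if the caller's expected version
     * matches the current one. Returns false on mismatch.
     */
    synchronized boolean setData(String newData, int expectedVersion) {
        if (expectedVersion != version) {
            return false; // someone else wrote since the caller's read
        }
        data = newData;
        version++;
        return true;
    }

    public static void main(String[] args) {
        VersionedZnodeSketch node = new VersionedZnodeSketch("OPENING");

        // The region server reads the version; by the time it writes,
        // this value may be stale.
        int seenByRegionServer = node.getVersion();

        // Meanwhile another writer (say, the master) changes the node.
        boolean masterWrote = node.setData("OFFLINE", node.getVersion());

        // The region server's write carries the stale version and is rejected,
        // so the conflict is caught at write time without a prior sync.
        boolean rsWrote = node.setData("OPENED", seenByRegionServer);

        System.out.println("master write applied: " + masterWrote);
        System.out.println("region server write applied: " + rsWrote);
        System.out.println("final data: " + node.getData());
    }
}
```

The sketch only covers the case where the read is immediately followed by a conditional write; if a decision is made on the read alone, the stale value would go unnoticed.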
HRegion region = this.rsServices.getFromOnlineRegions(encodedName);
We don't actually do anything here. We do a get; if the region is in the list, 'region' will be non-null, and that's it. The variable is set again later but is not read in between. I could raise an error if the region is in the list? Hopefully it won't break anything (famous last words).
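For what raising an error could look like, here is a rough sketch under stated assumptions. OnlineRegionsSketch, markOnline, and checkNotAlreadyOnline are hypothetical names, and regions are modelled as plain strings rather than HRegion; only the getFromOnlineRegions name comes from the line under review.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Toy model of the online-regions list (hypothetical, for illustration only). */
final class OnlineRegionsSketch {
    // encoded region name -> region (a String here; HRegion in the real code)
    private final Map<String, String> onlineRegions = new ConcurrentHashMap<>();

    /** Registers a region as online; returns false if it was already there. */
    boolean markOnline(String encodedName, String region) {
        return onlineRegions.putIfAbsent(encodedName, region) == null;
    }

    /** Same lookup shape as the call under review: null when not online. */
    String getFromOnlineRegions(String encodedName) {
        return onlineRegions.get(encodedName);
    }

    /**
     * Hypothetical guard: instead of ignoring the lookup result, fail the
     * open when the region is already online on this server.
     */
    void checkNotAlreadyOnline(String encodedName) {
        String region = getFromOnlineRegions(encodedName);
        if (region != null) {
            throw new IllegalStateException(
                "Region " + encodedName + " is already online; refusing to open it again");
        }
    }

    public static void main(String[] args) {
        OnlineRegionsSketch services = new OnlineRegionsSketch();
        services.checkNotAlreadyOnline("abc123"); // passes: not online yet
        services.markOnline("abc123", "region-abc123");
        try {
            services.checkNotAlreadyOnline("abc123"); // now rejected
        } catch (IllegalStateException e) {
            System.out.println("open rejected: " + e.getMessage());
        }
    }
}
```

Whether an exception or a failed-open return is the right reaction depends on what the callers expect, which the sketch does not try to settle.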