HConnection#getRegionLocation(table, row, reload=true) and HConnection#getRegionLocation(table, row, reload=false) are not equivalent when the cache is empty: the first will check the table status while the second will not.
As a consequence, the client won't have the same exception if the table is disabled. With reload==true, we will have a DoNotRetryIOException, with a message saying that the table is disabled. With reload==false we will have a NotServingException. It's quite difficult to guess, as it's not mentioned in the javadoc.
Second effect is that the client is going to ZooKeeper to check this table state. In ServerCallable, if the first try is not successful, we will then go all the time to ZK to check this status. So if a region server stops, all its clients will connect to ZK, possibly multiple time if the recovery takes some time. With a few hundreds clients, it's not very nice...
I'm not sure of the solution. A possible improvement in ServerCallable would be to do a reload only at the first retry instead of all of them, but:
- it's not without side effects, even if it's limited
- the real cost is the first try, as it may creates a ZK connection.
Another thing to do would be to limit the reload to the case it makes sense. In locateRegionInMeta there is a test on the exception:(e instanceof RegionOfflineException || e instanceof NoServerForRegionException).
May be this logic could be put in ServerCallable as well, but we need to cover all cases.