The problem raised in the suggestion above can be avoided simply by taking the actual online servers list after getting the logFolders. This ensures that we do not split the logs of any new RS that has checked in.
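To make the ordering concrete, here is a minimal sketch of the idea, not the code in the patch. The class name `LogSplitSelection`, the method `serversToSplit`, and the use of plain strings for server names are all hypothetical; the only point is that the online list is read after the log folders have been listed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical illustration only; none of these names come from the patch.
class LogSplitSelection {

    /**
     * Returns the servers whose logs should be split.
     *
     * @param logFolderServers server names taken from the already-listed log folders
     * @param onlineServers    supplier that returns the live online servers when called
     */
    static List<String> serversToSplit(List<String> logFolderServers,
                                       Supplier<Set<String>> onlineServers) {
        // Read the online list only now, after the log folders are known, so a
        // region server that checked in while we were listing is seen as online.
        Set<String> online = onlineServers.get();

        List<String> toSplit = new ArrayList<>();
        for (String server : logFolderServers) {
            // A log folder with no matching online server is treated as dead.
            if (!online.contains(server)) {
                toSplit.add(server);
            }
        }
        return toSplit;
    }
}
```

With a stale online list captured before the listing, a newly checked-in RS would have a log folder but no online entry, and its logs would be split by mistake.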
In joinCluster(), with the existing code, if a new server has checked in and root/meta has been assigned to it, we may treat it as a dead server because the online servers list was already passed in earlier. Hence the patch fetches the actual online list at that point.
The problem that you have mentioned here:
if Regionserver A with startcode 001 is restarted, and then Regionserver A with startcode 002 is in the onlineServers, but Regionserver A with startcode 001 is in the process by SSH, not in the deadServers
We are trying to avoid this in our current v6 patch by not removing from the dead servers any restarted server that comes up during master initialization. Later, after master initialization, we clear the dead servers that match a current online server with the same host name and port.
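The post-initialization cleanup could look roughly like the sketch below. This is only an illustration under assumptions: server names are taken to be strings of the form `host,port,startcode`, and `DeadServerCleanup`, `hostAndPort` and `clearRestartedDeadServers` are hypothetical names, not methods from the patch.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; none of these names come from the patch.
class DeadServerCleanup {

    // Assumes "host,port,startcode"; drops the trailing startcode.
    static String hostAndPort(String serverName) {
        int idx = serverName.lastIndexOf(',');
        return idx < 0 ? serverName : serverName.substring(0, idx);
    }

    /**
     * Returns the dead servers that remain after removing every entry whose
     * host and port match a currently online server (that is, a restarted server).
     */
    static Set<String> clearRestartedDeadServers(Set<String> deadServers,
                                                 Collection<String> onlineServers) {
        Set<String> onlineHostPorts = new HashSet<>();
        for (String online : onlineServers) {
            onlineHostPorts.add(hostAndPort(online));
        }
        Set<String> remaining = new TreeSet<>();
        for (String dead : deadServers) {
            // Keep only dead servers that have not come back on the same host and port.
            if (!onlineHostPorts.contains(hostAndPort(dead))) {
                remaining.add(dead);
            }
        }
        return remaining;
    }
}
```

In the scenario quoted above, a dead entry for Regionserver A with startcode 001 stays in the dead servers during master initialization and is cleared afterwards, once Regionserver A with startcode 002 is seen online on the same host and port.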
There are other problems during SSH and master initialization that may lead to double assignment or a concurrent modification exception. We will address these in a new JIRA.
Please review the current patch and provide your suggestions.