Here is what I found while analyzing an issue. I am raising this JIRA, and a fix will follow.
TL;DR: the AssignmentManager assumes that the ServerShutdownHandler will assign the region, and the ServerShutdownHandler assumes that the AssignmentManager will assign it. As a result, the region (0d6cf37c18c54c6f4744750c6a7be837) never gets assigned. Below is an analysis of the logs that captures the flow of events.
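To make the mutual deferral concrete, here is a heavily simplified, hypothetical model of the two assumptions. None of the class, field, or method names below are HBase code; they only mirror the behavior described in this report: the AssignmentManager skips assignment while shutdown handling for the region's last known server is still queued, and the ServerShutdownHandler skips regions that are in transition. `replay()` follows the order of events from the logs.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model of the AssignmentManager / ServerShutdownHandler
// mutual deferral. These are NOT HBase classes or methods.
public class MutualDeferralSketch {
    static final String REGION = "0d6cf37c18c54c6f4744750c6a7be837";
    static final String DEAD_SERVER = "dnj1-bcpc-r3n8.example.com,60020,1425598187703";

    final Set<String> regionsInTransition = new HashSet<>();
    final Set<String> serversAwaitingShutdownHandling = new HashSet<>();
    final Set<String> assignedRegions = new HashSet<>();

    // AssignmentManager side: while shutdown handling for the region's last known
    // server is still queued, skip assignment and assume the handler will do it.
    void assignmentManagerAssign(String region, String lastKnownServer) {
        if (serversAwaitingShutdownHandling.contains(lastKnownServer)) {
            return; // defers to the ServerShutdownHandler
        }
        regionsInTransition.remove(region);
        assignedRegions.add(region);
    }

    // ServerShutdownHandler side: skip regions that are in transition and assume
    // the AssignmentManager will assign them while handling the RIT nodes.
    void shutdownHandlerProcess(String deadServer, Set<String> regionsOnServer) {
        serversAwaitingShutdownHandling.remove(deadServer);
        for (String region : regionsOnServer) {
            if (regionsInTransition.contains(region)) {
                continue; // defers to the AssignmentManager
            }
            assignedRegions.add(region);
        }
    }

    // Replays the order of events from the logs; returns whether the region got assigned.
    static boolean replay() {
        MutualDeferralSketch master = new MutualDeferralSketch();
        // Steps 2-3: meta points at a dead server, so shutdown handling is queued.
        master.serversAwaitingShutdownHandling.add(DEAD_SERVER);
        // Steps 4-5: the region is also found in RIT (FAILED_OPEN, then CLOSED).
        master.regionsInTransition.add(REGION);
        // Step 6: the AssignmentManager retries, but shutdown handling is still pending.
        master.assignmentManagerAssign(REGION, DEAD_SERVER);
        // Steps 7-8: shutdown handling runs and skips the region because it is in RIT.
        master.shutdownHandlerProcess(DEAD_SERVER, Set.of(REGION));
        return master.assignedRegions.contains(REGION);
    }

    public static void main(String[] args) {
        System.out.println("region assigned: " + replay());
    }
}
```

Each side's skip is reasonable on its own; the bug only appears because each one relies on the other, so neither path ever assigns the region.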
1. The AssignmentManager initially assigned this region to dnj1-bcpc-r3n8.example.com,60020,1425598187703.
2. When the master restarted, it scanned the meta table to learn about the regions in the cluster. The meta record showed this region as assigned to dnj1-bcpc-r3n8.example.com,60020,1425598187703.
3. However, this server (dnj1-bcpc-r3n8.example.com,60020,1425598187703) was no longer alive, so the AssignmentManager queued a ServerShutdownHandling task for it (the task executes asynchronously):
4. The AssignmentManager then read the regions-in-transition (RIT) znodes from ZooKeeper and found this region there as well:
5. The region was moved to the CLOSED state:
Note the reference to dnj1-bcpc-r3n2.example.com,60020,1425603618259. It means the region had been assigned to dnj1-bcpc-r3n2.example.com,60020,1425603618259, but that regionserver could not open the region for some reason and changed the state in the region's RIT znode in ZooKeeper to RS_ZK_REGION_FAILED_OPEN.
6. The AssignmentManager then tried to assign the region again. However, the assignment did not happen because the ServerShutdownHandling task queued earlier had not yet executed:
7. Eventually the ServerShutdownHandling task executed.
8. However, the ServerShutdownHandling task skipped the region in question because the region was in RIT, and the ServerShutdownHandling task assumes that the AssignmentManager will assign such regions as part of handling the RIT nodes:
9. Later, when the server dnj1-bcpc-r3n2.example.com,60020,1425603618259 died, a ServerShutdownHandling task for it was queued (from the log hbase-hbase-master-dnj1-bcpc-r3n1.log):
10. In RegionStates.java:serverOffline, there is a check on the region's state. Since the region is in the CLOSED state, the following is logged:
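The exact set of states that serverOffline checks is not quoted in this report, so the sketch below ASSUMES an illustrative set (OFFLINE, PENDING_OPEN, OPENING, FAILED_CLOSE) purely to show the shape of the problem: a per-state check that hands only some in-transition regions to shutdown handling will let a region sitting in CLOSED fall through, leaving it with no owner. The enum and method names are hypothetical and are not the real RegionStates code.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a per-state check performed when a server goes offline.
public class ServerOfflineCheckSketch {
    // Illustrative subset of region states; not the real HBase RegionState.State enum.
    enum State { OFFLINE, PENDING_OPEN, OPENING, OPEN, FAILED_CLOSE, CLOSED }

    // ASSUMPTION: the RIT states that get handed to shutdown handling for reassignment.
    static final EnumSet<State> REASSIGN_ON_SERVER_DEATH =
        EnumSet.of(State.OFFLINE, State.PENDING_OPEN, State.OPENING, State.FAILED_CLOSE);

    // Returns the in-transition regions on the dead server that will be reassigned.
    // Regions in any other state (e.g. CLOSED) are only logged and left alone.
    static List<String> regionsToReassign(Map<String, State> ritOnDeadServer) {
        List<String> toReassign = new ArrayList<>();
        for (Map.Entry<String, State> e : ritOnDeadServer.entrySet()) {
            if (REASSIGN_ON_SERVER_DEATH.contains(e.getValue())) {
                toReassign.add(e.getKey());
            } else {
                System.out.println("not reassigning " + e.getKey() + " in state " + e.getValue());
            }
        }
        return toReassign;
    }

    public static void main(String[] args) {
        Map<String, State> rit = new LinkedHashMap<>();
        rit.put("0d6cf37c18c54c6f4744750c6a7be837", State.CLOSED);
        rit.put("someOtherRegion", State.OPENING); // hypothetical region, for contrast
        System.out.println("reassigned: " + regionsToReassign(rit));
    }
}
```

In this model the CLOSED region is reported but never returned for reassignment, which matches the outcome in the logs: neither the earlier ServerShutdownHandling task nor this one picks the region up.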