I was looking at this patch, and while it has certainly reduced the chance of problems, isn't it still possible that a new writer thread could be created:
1. between the kill loop in startBlockRecovery() and the synchronized block
2. between the startBlockRecovery() call and updateBlock() call
I seem to recall reasoning with dhruba that while in theory these could occur from the DN perspective, the circumstances that would have to occur outside the DN could not (once you fixed HDFS-1260 anyway, so that genstamp checks work right in concurrent lease recovery).
What's your take on this? Is it foolproof now (1 & 2 can't happen)? Or what about introducing a state like RUR here? (At least disabling writes to a block while it is under recovery, maybe timing out in case the lease recovery owner dies.)
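
To make the last suggestion concrete, here is a rough sketch of what I mean by an "under recovery" marker with a timeout. This is not HDFS code: RecoveryGate, Marker and all the method names are made up for illustration, and the recoveryId is only assumed to play the role of the new genstamp. The idea is that startBlockRecovery() would set the marker under the same lock that writer creation checks, and updateBlock() would clear it, so both windows 1 and 2 are covered.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch (not HDFS code): blocks writer creation while a block is under recovery. */
class RecoveryGate {
    /** Hypothetical record of which recovery holds a block and when that hold expires. */
    private static final class Marker {
        final long recoveryId;       // assumed to stand in for the recovery genstamp
        final long expiresAtMillis;  // after this time the hold is considered abandoned

        Marker(long recoveryId, long expiresAtMillis) {
            this.recoveryId = recoveryId;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<Long, Marker> underRecovery = new HashMap<>();
    private final long timeoutMillis;

    RecoveryGate(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /**
     * Marks the block as under recovery. Returns false if an unexpired recovery
     * with an equal or higher id already holds the block.
     */
    synchronized boolean startRecovery(long blockId, long recoveryId, long nowMillis) {
        Marker cur = underRecovery.get(blockId);
        if (cur != null && nowMillis < cur.expiresAtMillis && cur.recoveryId >= recoveryId) {
            return false;
        }
        underRecovery.put(blockId, new Marker(recoveryId, nowMillis + timeoutMillis));
        return true;
    }

    /** Writer creation would check this under the same lock before starting a new writer thread. */
    synchronized boolean mayCreateWriter(long blockId, long nowMillis) {
        Marker cur = underRecovery.get(blockId);
        if (cur == null) {
            return true;
        }
        if (nowMillis >= cur.expiresAtMillis) {
            // The recovery owner is presumed dead; drop the stale marker.
            underRecovery.remove(blockId);
            return true;
        }
        return false;
    }

    /** Clears the marker; only the recovery that currently holds the block may do so. */
    synchronized boolean finishRecovery(long blockId, long recoveryId) {
        Marker cur = underRecovery.get(blockId);
        if (cur == null || cur.recoveryId != recoveryId) {
            return false;
        }
        underRecovery.remove(blockId);
        return true;
    }
}
```

Time is passed in explicitly only to keep the sketch easy to reason about; whether a timed-out hold should silently re-enable writes, as it does here, is exactly the part I'm unsure about.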