There is a very simple "fix" to this, i.e. make the "failures" count an instance var on DFSInputStream rather than a local variable in chooseDataNode. This would make the semantics of MAX_BLOCK_ACQUIRE_FAILURES to be a cap on the number of total block acquisition failures for the life of the stream, which is not exactly correct, but it is a fix we could easily get into 0.17. It will yield false negatives for a particularly problematic stream, but for applications like distcp it should be sufficient.
After consulting with Dhruba, the longer-term fix will track failures not using a list of "deadnodes", but rather a map of blocks to a list of deadnodes and- to preserve the retry semantics- a map of blocks to full acquisition failures. Right now, a datanode that fails to serve a block is blacklisted on the stream until there are no replicas available for some block, when the list is cleared. The false negatives this yields require the existing, problematic retry semantics. After confirming this approach with Koji, I'll file a JIRA for the more correct fix and submit the sufficient one for 0.17