We are using some Lucene indexes directly from HDFS, and for quite a long time we were using Hadoop version 0.15.3.
When we tried to upgrade to Hadoop 0.19, index searches started to fail with exceptions like:
2008-11-13 16:50:20,314 WARN [Listener-4]  DFSClient : DFS Read: java.io.IOException: Could not obtain block: blk_5604690829708125511_15489 file=/usr/collarity/data/urls-new/part-00000/20081110-163426/_0.tis
The investigation showed that the root of this issue was that we exceeded the maximum number of xcievers on the data nodes; that was fixed by raising the corresponding configuration setting to 2k.
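For reference, the datanode-side change was along these lines; I'm assuming here that the relevant property is dfs.datanode.max.xcievers (note the spelling), set in hadoop-site.xml on each datanode:

    <!-- hadoop-site.xml on each datanode -->
    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>2048</value>
    </property>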
However, one thing that bothered me was that even after the datanodes recovered from the overload and most of the client servers had been shut down, we still observed errors in the logs of the running servers.
Further investigation showed that the fix for HADOOP-1911 introduced another problem: a DFSInputStream instance may become unusable once the number of failures accumulated over the lifetime of the instance exceeds the configured threshold.
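To illustrate the pattern (this is a simplified sketch with hypothetical names, not the actual DFSClient code; the real counter is compared against a limit analogous to dfs.client.max.block.acquire.failures):

    import java.io.IOException;

    // Simplified illustration of the problem; names are hypothetical.
    class SimplifiedDfsInputStream {
      // Counts block-read failures over the whole lifetime of the stream and,
      // in the buggy version, is never reset.
      private int failures = 0;
      // Threshold, analogous to dfs.client.max.block.acquire.failures.
      private final int maxBlockAcquireFailures;

      SimplifiedDfsInputStream(int maxBlockAcquireFailures) {
        this.maxBlockAcquireFailures = maxBlockAcquireFailures;
      }

      // Called whenever a block read from some datanode fails.
      void onBlockReadFailure() {
        failures++;
      }

      // Called before each attempt to obtain a block.
      void checkFailures(String blockId) throws IOException {
        if (failures >= maxBlockAcquireFailures) {
          // Once the threshold is crossed the stream is effectively dead:
          // every later read fails, even after the datanodes recover.
          throw new IOException("Could not obtain block: " + blockId);
        }
      }
    }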
The fix for this specific issue seems to be trivial: just reset the failure counter before reading the next block (a patch will be attached shortly).
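In terms of the simplified sketch above, the change I have in mind is roughly the following (in the real code this would likely go into the method of DFSInputStream that positions the stream on a new block):

    // Clear the counter whenever the stream moves on to a new block, so that
    // transient failures on earlier blocks don't poison the rest of the read.
    void seekToNewBlock(long targetOffset) throws IOException {
      failures = 0;  // start each block with a clean slate
      // ... locate the block containing targetOffset, choose a datanode,
      // and open the block reader as before ...
    }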
This also seems to be related to HADOOP-3185, but I'm not sure I really understand the necessity of keeping track of failed block accesses in the DFS client.