When a client writes through a pipeline, e.g. Client=>DN1=>DN2=>DN3, and DN2 crashes at some point, the client runs the pipeline recovery process. The failed node DN2 is added to "failed". The client then requests a new DN from the NN, passing "failed" as the exclude list, and replaces DN2 in the pipeline, e.g. Client=>DN1=>DN4=>DN3.
The client keeps running.
After a long time, the client is still writing data to the file and has, of course, gone through many pipelines, e.g. Client=>DN-1=>DN-2=>DN-3.
When DN-2 crashes, it is added to "failed" and the client runs the recovery process as before. It requests a new DN from the NN with "failed", and the NN selects one DN from all DNs except those in "failed". DN2 is still in "failed" and is still excluded, even if it has since restarted.
Why not remove DN2 from "failed" once it has restarted?
Also, why is the collection of failed nodes used in the recovery process not shared with the one used when getting the next block? For example:
private final List<DatanodeInfo> failed = new ArrayList<>();
private final LoadingCache<DatanodeInfo, DatanodeInfo> excludedNodes;
As before, when DN2 crashes, the client recovers the pipeline after a timeout (490s by default in the worst case). When the client finishes writing this block and requests the next block, the NN may return a block whose locations include the failed datanode DN2. When the client then creates a new pipeline for that block, it has to go through a connection timeout (60s by default).
If "failed" and "excludedNodes" were a single collection, the connection timeout would be avoided. Because entries in "excludedNodes" are removed dynamically, this would also avoid the first problem.
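To make the proposal concrete, here is a minimal sketch of a single exclusion collection whose entries expire, so that it could serve both pipeline recovery and next-block allocation. This is not the actual DataStreamer code: the class name ExpiringExcludeSet, its methods, and the injected clock are hypothetical and exist only for illustration. It uses only the JDK.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch (not Hadoop code): one set of excluded nodes whose
 * entries expire after a fixed time-to-live. The idea is that both the
 * recovery path and the next-block request would read from this one set.
 */
class ExpiringExcludeSet<N> {
  // Maps each excluded node to the time (in millis) at which it expires.
  private final Map<N, Long> expiryByNode = new ConcurrentHashMap<>();
  private final long ttlMillis;
  // Injected clock so expiry can be exercised without sleeping.
  private final LongSupplier clock;

  ExpiringExcludeSet(long ttlMillis, LongSupplier clock) {
    this.ttlMillis = ttlMillis;
    this.clock = clock;
  }

  /** Record a node that failed during pipeline setup or recovery. */
  void markFailed(N node) {
    expiryByNode.put(node, clock.getAsLong() + ttlMillis);
  }

  /** Drop expired entries, then return a snapshot of nodes still excluded. */
  Set<N> currentExcludes() {
    long now = clock.getAsLong();
    expiryByNode.values().removeIf(expiry -> expiry <= now);
    return new HashSet<>(expiryByNode.keySet());
  }
}
```

With such a structure, a crashed DN2 would be excluded from the next block request right away, and it would become eligible again after the TTL, for example once it has restarted. The TTL value is an assumption and would need to be chosen in the real client.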