Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 2.0.2-alpha, 3.0.0-alpha1
Description
When a DN marks a disk as bad, it stops using replicas on that disk.
However, a long-running BlockReaderLocal instance will continue to access replicas on the failing disk, because it reads the replica files directly and is never told about the failure.
The in-client BlockReaderLocal needs some way to learn that a disk has been marked as bad so that it can stop reading from that disk.
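One possible shape for this, as a rough sketch only: the reader consults a set of failed volumes before each read and fails fast so the caller can retry on another replica. The names `FailedVolumeRegistry` and `SketchLocalReader` are hypothetical and are not part of the HDFS client API. The sketch also leaves open how the client would learn about failures from the DN, and an in-memory byte array stands in for the on-disk replica.

```java
import java.io.IOException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical client-side view of volumes the DN has marked as bad.
// How this gets populated (RPC, shared memory, polling) is not specified here.
final class FailedVolumeRegistry {
    private final Set<String> failed = ConcurrentHashMap.newKeySet();

    void markFailed(String volumeId) {
        failed.add(volumeId);
    }

    boolean isFailed(String volumeId) {
        return failed.contains(volumeId);
    }
}

// Stand-in for a long-lived local reader; NOT the real BlockReaderLocal.
final class SketchLocalReader {
    private final String volumeId;
    private final byte[] replica; // stands in for the replica file on disk
    private final FailedVolumeRegistry registry;
    private int pos = 0;

    SketchLocalReader(String volumeId, byte[] replica, FailedVolumeRegistry registry) {
        this.volumeId = volumeId;
        this.replica = replica;
        this.registry = registry;
    }

    // Returns the number of bytes read, or -1 at end of replica.
    // Throws if the volume was marked bad, so the caller can pick another replica.
    int read(byte[] buf, int off, int len) throws IOException {
        if (registry.isFailed(volumeId)) {
            throw new IOException("volume " + volumeId + " marked as bad; retry on another replica");
        }
        if (pos >= replica.length) {
            return -1;
        }
        int n = Math.min(len, replica.length - pos);
        System.arraycopy(replica, pos, buf, off, n);
        pos += n;
        return n;
    }
}
```

Checking on every read keeps the reader from riding over a disk ejection, at the cost of one set lookup per call. A real fix would also need to decide what happens to reads already in flight.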
From HDFS-4239:
To rephrase that, a long running BlockReaderLocal will ride over local DN restarts and disk "ejections". We had to drain the RS of all its regions in order to stop it from using the bad disk.