Details
- Type: New Feature
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
As part of looking at using Kerberos, we want to avoid the case where the primary KDC (and the optional secondary, if there is one) goes offline, causing a replication storm as the DataNodes' service tickets time out and they lose the ability to connect to the NameNode. However, this is a specific case of a more general problem: losing too many nodes too quickly. I think we should have an option to go into safe mode if the number of live DataNodes in the cluster drops by more than N%.
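To make the proposed check concrete, here is a minimal sketch of the threshold logic, not an existing HDFS API. The class name `DataNodeLossMonitor`, its method, and the idea of comparing against a fixed baseline count are all assumptions made for illustration; a real implementation would also need to define the time window over which "too quickly" is measured and how the baseline is refreshed.

```java
/**
 * Hypothetical sketch of the proposed option: signal that the NameNode
 * should enter safe mode when the live DataNode count has dropped by
 * more than N% relative to a baseline. Not part of HDFS.
 */
final class DataNodeLossMonitor {
    // The "N" from the proposal, as a percentage in the range [0, 100].
    private final double maxLossPercent;
    // Assumed baseline: the live DataNode count the cluster is compared against.
    private final int baselineLiveNodes;

    DataNodeLossMonitor(double maxLossPercent, int baselineLiveNodes) {
        if (maxLossPercent < 0.0 || maxLossPercent > 100.0) {
            throw new IllegalArgumentException("maxLossPercent must be in [0, 100]");
        }
        if (baselineLiveNodes < 0) {
            throw new IllegalArgumentException("baselineLiveNodes must be >= 0");
        }
        this.maxLossPercent = maxLossPercent;
        this.baselineLiveNodes = baselineLiveNodes;
    }

    /** Returns true when the loss relative to the baseline exceeds N%. */
    boolean shouldEnterSafeMode(int currentLiveNodes) {
        if (baselineLiveNodes == 0) {
            return false; // no baseline yet, so no loss can be measured
        }
        double lossPercent =
            100.0 * (baselineLiveNodes - currentLiveNodes) / baselineLiveNodes;
        return lossPercent > maxLossPercent;
    }
}
```

With N = 20 and a baseline of 100 DataNodes, the check stays false at 80 live nodes (exactly 20% lost) and becomes true at 79. Using a strict "more than" comparison follows the wording of the proposal; whether the boundary should be inclusive is an open design choice.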
Issue Links
- is related to
  - HDFS-1392 Improve namenode scalability by prioritizing datanode heartbeats over block reports (Resolved)
  - ZOOKEEPER-702 GSoC 2010: Failure Detector Model (Open)