Details
- Type: Bug
- Status: Open
- Priority: Critical
- Resolution: Unresolved
Description
In a production cluster, the RM was observed failing to become active and switching state continuously when HDFS is too busy and node labels are configured. This causes very high RM downtime.
Exception from RM logs
Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/mapred/node-labels/nodelabel.mirror.writing could only be replicated to 0 nodes instead of minReplication (=1). There are 7 datanode(s) running and no node(s) are excluded in this operation.
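For context, a setup like the one described would typically have node labels enabled with the file-system store located on HDFS. The fragment below is only a sketch: the property names are the standard YARN ones, but the root directory value is an assumption inferred from the path in the exception, and the actual yarn-site.xml of the affected cluster is not shown in this report.

```xml
<!-- yarn-site.xml (assumed; not taken from the affected cluster) -->
<property>
  <name>yarn.node-labels.enabled</name>
  <value>true</value>
</property>
<property>
  <!-- Assumption: root dir inferred from the file path in the exception above -->
  <name>yarn.node-labels.fs-store.root-dir</name>
  <value>hdfs:///user/mapred/node-labels</value>
</property>
```

With a store like this, the write of nodelabel.mirror.writing seen in the exception goes to HDFS, so a failed HDFS write would surface in the RM as the ServiceStateException above.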
Issue Links
- relates to YARN-4948 Support node labels store in zookeeper (Resolved)