Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
None
Description
If work preserving recover is enabled the NM will not start up if the state store does not initialise. However if the state store becomes unavailable after that for any reason the NM will not go unhealthy.
Since the state store is not available new containers can not be started any more and the NM should become unhealthy:
AMLauncher: Error launching appattempt_1508806289867_268617_000001. Got exception: org.apache.hadoop.yarn.exceptions.YarnException: java.io.IOException: org.iq80.leveldb.DBException: IO error: /dsk/app/var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/028269.log: Read-only file system at o.a.h.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:38) at o.a.h.y.s.n.cm.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:721) ... Caused by: java.io.IOException: org.iq80.leveldb.DBException: IO error: /dsk/app/var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/028269.log: Read-only file system at o.a.h.y.s.n.r.NMLeveldbStateStoreService.storeApplication(NMLeveldbStateStoreService.java:374) at o.a.h.y.s.n.cm.ContainerManagerImpl.startContainerInternal(ContainerManagerImpl.java:848) at o.a.h.y.s.n.cm.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:712)