Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 2.0.2-alpha, 0.23.5
- None
- None
Description
When a local disk becomes full, the node fails every container launched on it because the container cannot localize. Localization tries to create an app-specific directory under each local directory and each log directory. If any of those directory creations fails (for example, for lack of free space), the container fails.
It would be better if the node continued to launch containers using the space available on the other disks, rather than failing every container that tries to launch on the node.
This is somewhat related to YARN-91 but is centered around the disk becoming full rather than the disk failing.
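The requested behavior could be sketched roughly as follows: before creating app-specific directories, filter the configured directories down to those that still report enough free space, and fail only if none remain. This is a minimal illustration, not NodeManager code. The class `UsableDirSelector`, the method `selectUsable`, the `minFreeBytes` threshold, and the example paths are all hypothetical names introduced here. The only real API used is `java.io.File.getUsableSpace()`.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToLongFunction;

// Hypothetical helper (not part of YARN): picks the directories that
// still have room, so a single full disk does not fail the container.
public class UsableDirSelector {

    // Returns the directories whose reported free space is at least
    // minFreeBytes. The freeSpace probe is injectable so the logic can
    // be exercised without real disks.
    public static List<File> selectUsable(List<File> dirs,
                                          long minFreeBytes,
                                          ToLongFunction<File> freeSpace) {
        List<File> usable = new ArrayList<>();
        for (File dir : dirs) {
            if (freeSpace.applyAsLong(dir) >= minFreeBytes) {
                usable.add(dir);
            }
        }
        return usable;
    }

    // Convenience overload that asks the file system directly.
    public static List<File> selectUsable(List<File> dirs, long minFreeBytes) {
        return selectUsable(dirs, minFreeBytes, File::getUsableSpace);
    }

    public static void main(String[] args) {
        // Example paths and threshold are assumptions for illustration.
        List<File> localDirs = new ArrayList<>();
        localDirs.add(new File("/data/1/yarn/local"));
        localDirs.add(new File("/data/2/yarn/local"));

        List<File> usable = selectUsable(localDirs, 64L * 1024 * 1024);
        if (usable.isEmpty()) {
            System.out.println("No local dir has enough free space");
        } else {
            System.out.println("Usable local dirs: " + usable);
        }
    }
}
```

With a filter like this, the container would fail only when every configured directory is below the threshold, instead of whenever any one of them is full.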
Attachments
Issue Links
- blocks
  - YARN-414 [Umbrella] Usability issues in YARN (Open)
- duplicates
  - YARN-778 Failures in container launches due to issues like disk failure are difficult to diagnose (Resolved)
- is duplicated by
  - YARN-1091 All containers localization fails in NM when any one of the configured nm local-dir disk becomes full (Resolved)
  - YARN-1777 Nodemanager fails to detect Full disk and try to launch container (Resolved)
- is related to
  - YARN-522 [Umbrella] Better reporting for crashed/Killed AMs and Containers (Open)
  - YARN-91 DFIP aka 'NodeManager should handle Disk-Failures In Place' (Open)