Details
-
Improvement
-
Status: Resolved
-
Major
-
Resolution: Duplicate
-
None
-
None
-
None
Description
Container exit status from NM is giving indications of reasons for its failure. Some times, it may be because of container launching problems in NM. In a heterogeneous cluster, some machines with weak hardware may cause more failures. It will be better not to launch AMs there more often. Also I would like to clear that container failures because of buggy job should not result in decreasing score.
As mentioned earlier, based on exit status if a scoring mechanism is added for NMs in RM, then NMs with better scores can be given for launching AMs. Thoughts?
Attachments
Issue Links
- is duplicated by
-
YARN-2005 Blacklisting support for scheduling AMs
- Resolved