Details
- Type: Task
- Status: Resolved
- Priority: Minor
- Resolution: Duplicate
Description
When using REEF with Federation, there is a problem with the node reports YARN sends us.
Right after initializing our YARN client library (hadoop-yarn-client-2.4.0), we ask for the RUNNING nodes in the cluster to populate our own Resource Catalog.
YARN replies with the nodes of a seemingly random sub-cluster: sometimes the correct one (where the containers will be placed), sometimes another. As a result, the application fails intermittently.
For example, we populate our resource catalog with nodes in sub-cluster 1, but the allocations are actually made on sub-cluster 2, so we fail.
We need a workaround for this issue, as the YARN folks are not sure when they will have it right.
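One possible shape for such a workaround, as a minimal sketch only: make the catalog tolerant of nodes it has not seen, and register them when an allocation arrives instead of failing. `LenientResourceCatalog`, `Node`, `seed`, and `resolve` are hypothetical names used for illustration. They are not REEF's actual classes, and this is not necessarily the fix that was adopted.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a resource catalog; not the real REEF class. */
final class LenientResourceCatalog {

    /** Minimal node description (assumed fields: node id and host name). */
    public static final class Node {
        public final String id;
        public final String host;

        public Node(String id, String host) {
            this.id = id;
            this.host = host;
        }
    }

    private final Map<String, Node> nodes = new ConcurrentHashMap<>();

    /** Seed the catalog from the initial node report, which may cover the wrong sub-cluster. */
    public void seed(Iterable<Node> reported) {
        for (Node n : reported) {
            nodes.put(n.id, n);
        }
    }

    /**
     * Look up the node an allocation landed on. If the node is unknown
     * (for example, it belongs to another sub-cluster), register it
     * instead of treating the allocation as an error.
     */
    public Node resolve(String nodeId, String host) {
        return nodes.computeIfAbsent(nodeId, id -> new Node(id, host));
    }

    public boolean contains(String nodeId) {
        return nodes.containsKey(nodeId);
    }

    public int size() {
        return nodes.size();
    }
}
```

With this approach, a catalog seeded from sub-cluster 1 would still accept an allocation on a sub-cluster 2 node: `resolve` adds the node on first sight and returns the same entry on later lookups.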
Issue Links
- depends upon: REEF-337 Support REEF on YARN Federation (Open)
- is cloned by: REEF-589 REEF crashes when new nodes are added to the clusters dynamically (Resolved)
- is duplicated by: REEF-589 REEF crashes when new nodes are added to the clusters dynamically (Resolved)
- is related to: YARN-2915 Enable YARN RM scale out via federation using multiple RM's (Resolved)