
REEF-568: Work around the federated YARN node reports problem

    Details

    • Type: Task
    • Status: Resolved
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: 0.13
    • Component/s: None
    • Labels:
      None

      Description

      When trying to use REEF with YARN Federation, there is a problem with the node reports YARN sends us.
      Just after initializing our YARN client library (hadoop-yarn-client-2.4.0), we ask for the RUNNING nodes in the cluster to populate our own Resource Catalog.
      YARN replies with the nodes that belong to a 'random' sub-cluster: sometimes with the nodes in the correct sub-cluster (where the containers will be placed), and sometimes with the nodes of another one. This causes the application to fail nondeterministically.
      For example, we populate our Resource Catalog with the nodes of sub-cluster 1, but the allocations are actually made on sub-cluster 2, so we fail.

      We need a workaround for this issue, as the YARN folks are not sure when they will have the right fix in place.
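
      For reference, the query that triggers the problem looks roughly like this when issued directly against the YARN client API (a minimal sketch using hadoop-yarn-client, not REEF's actual catalog code):

      {code:java}
      import java.util.List;

      import org.apache.hadoop.yarn.api.records.NodeReport;
      import org.apache.hadoop.yarn.api.records.NodeState;
      import org.apache.hadoop.yarn.client.api.YarnClient;
      import org.apache.hadoop.yarn.conf.YarnConfiguration;

      public final class NodeReportProbe {
        public static void main(final String[] args) throws Exception {
          final YarnClient yarnClient = YarnClient.createYarnClient();
          yarnClient.init(new YarnConfiguration());
          yarnClient.start();
          try {
            // On a federated cluster this list may cover only one sub-cluster,
            // and which sub-cluster it covers can change between invocations.
            final List<NodeReport> nodes = yarnClient.getNodeReports(NodeState.RUNNING);
            for (final NodeReport node : nodes) {
              System.out.println(node.getNodeId() + " -> " + node.getCapability());
            }
          } finally {
            yarnClient.stop();
          }
        }
      }
      {code}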

        Issue Links

          Activity

          Markus Weimer added a comment -

          We need to do a work around to the federated YARN container allocation issue.

          Can you please be more specific on the issue we face?

          Ignacio Cano added a comment - edited

          Outdated comment: should be discarded; please refer to the current description...

          When using a federated YARN cluster, every time we try to allocate a container, the underlying YARN implementation allocates one container in each sub-cluster and then returns them all to the AM. If there are two sub-clusters and we request one container, two will be returned to the AM, causing REEF to fail.
          We therefore need a workaround to silently discard those extra containers when using federation. A possible shape for it is sketched below.
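
          One possible shape for such a workaround (a hypothetical sketch with illustrative names, not code from any attached pull request): in the AM's allocation callback, hand over only as many containers as are still outstanding and release the surplus back to the ResourceManager.

          {code:java}
          import java.util.List;

          import org.apache.hadoop.yarn.api.records.Container;
          import org.apache.hadoop.yarn.client.api.async.AMRMClientAsync;

          /**
           * Hypothetical sketch: silently discard the surplus containers that a
           * federated cluster hands back beyond what was actually requested.
           * The field name "outstandingRequests" is illustrative only.
           */
          final class SurplusContainerFilter {

            private final AMRMClientAsync<?> amRmClient;
            private int outstandingRequests; // incremented whenever a container request is submitted

            SurplusContainerFilter(final AMRMClientAsync<?> amRmClient, final int outstandingRequests) {
              this.amRmClient = amRmClient;
              this.outstandingRequests = outstandingRequests;
            }

            /** To be called from AMRMClientAsync.CallbackHandler#onContainersAllocated. */
            synchronized void onContainersAllocated(final List<Container> containers) {
              for (final Container container : containers) {
                if (this.outstandingRequests > 0) {
                  this.outstandingRequests--;
                  // hand the container to the normal REEF evaluator launch path
                } else {
                  // surplus container from another sub-cluster: give it back silently
                  this.amRmClient.releaseAssignedContainer(container.getId());
                }
              }
            }
          }
          {code}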

          Ignacio Cano added a comment -

          Should link to the YARN JIRA issue when they have one.

          Ignacio Cano added a comment -

          This is the only federated YARN JIRA available so far.

          Markus Weimer added a comment -

          Just after initializing our yarn client library (hadoop-yarn-client-2.4.0), we ask for the RUNNING nodes in the cluster to populate our own Resource Catalog.

          Does compiling with newer Hadoop change anything? You can compile REEF via mvn -Dhadoop.version=2.6 clean package and check.

          Ignacio Cano added a comment -

          It will be the same, but I can check it tomorrow.
          It's a YARN bug: they reply incorrectly to our request when federation is enabled.

          Markus Weimer added a comment -

          Wouldn't REEF's behavior also cause issues for dynamic clusters? That is: if an admin were to add a node to a cluster, future allocations might end up on that node, which in turn would crash the REEF Driver, correct? That strikes me as a more severe issue than assumed here, and we should have proper behavior for it.
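
          A sketch of the more tolerant behavior suggested here (purely illustrative; a plain set stands in for REEF's Resource Catalog, and the class and method names are not real REEF APIs):

          {code:java}
          import java.util.Set;
          import java.util.concurrent.ConcurrentHashMap;

          import org.apache.hadoop.yarn.api.records.Container;

          /**
           * Illustrative sketch: a node seen for the first time at allocation time
           * is registered on the fly instead of crashing the Driver. A concurrent
           * set stands in for the real Resource Catalog.
           */
          final class TolerantNodeTracker {

            private final Set<String> knownHosts = ConcurrentHashMap.newKeySet();

            void onContainerAllocated(final Container container) {
              final String host = container.getNodeId().getHost();
              if (this.knownHosts.add(host)) {
                // First time we see this host: it was added after startup (dynamic
                // cluster) or belongs to another sub-cluster (federation).
                System.out.println("Registering previously unknown node: " + host);
              }
              // continue with the normal container handling path
            }
          }
          {code}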

          Ignacio Cano added a comment -

          Yes, it will. I will close this JIRA as a duplicate of a new one that explains the problem, and update the code appropriately.
          I will also close the pull request and create a new one with these changes.

          Ignacio Cano added a comment -

          Closing this one, as the other one will be resolved instead.


            People

            • Assignee:
              Ignacio Cano
              Reporter:
              Ignacio Cano
            • Votes:
              0
              Watchers:
              1

              Dates

              • Created:
                Updated:
                Resolved:
