[FLINK-5542] YARN client incorrectly uses local YARN config to check vcore capacity - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 1.1.4, 1.5.3, 1.6.0, 1.7.0
Fix Version/s: 1.5.5, 1.6.2, 1.7.0
Component/s: Deployment / YARN
Labels:
- pull-request-available

Description

See http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/1-1-4-on-YARN-vcores-change-td11016.html

When using bin/yarn-session.sh, AbstractYarnClusterDescriptor (line 271 in 1.1.4) compares the user's requested number of vcores to the vcores configured in the local node's YARN configuration (loaded by YarnConfiguration from, e.g., yarn-site.xml and yarn-default.xml). This check incorrectly prevents Flink from launching even when the cluster has sufficient vcore capacity.

The comparison is wrong because the application will not necessarily run on the local node. For example, when the yarn-session.sh client is run from the AWS EMR master node, the vcore count there may differ from the vcore count on the core nodes where Flink will actually run.

A reasonable fix would probably be to reuse the logic from "yarn-session.sh -q" (FlinkYarnSessionCli line 550), which knows how to get vcore information from the real worker nodes. Alternatively, perhaps we could remove the check entirely and rely on YARN's scheduler to determine whether sufficient resources exist.
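
To make the difference between the two checks concrete, here is a minimal sketch in plain Java. It is not Flink's actual code: the class name VcoreCheck, both method names, and the vcore numbers are hypothetical, and the per-worker vcore counts are passed in as a plain list rather than fetched from YARN. A real client would presumably obtain those values from the ResourceManager's node reports, as "yarn-session.sh -q" does.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the code in AbstractYarnClusterDescriptor.
public class VcoreCheck {

    // The check described in this report: the requested vcores are compared
    // with the value read from the client's local YARN configuration.
    public static boolean fitsLocalConfig(int requestedVcores, int localConfiguredVcores) {
        return requestedVcores <= localConfiguredVcores;
    }

    // The proposed alternative: the request is accepted if at least one
    // worker node reports enough vcores to host a container of that size.
    public static boolean fitsOnSomeWorker(int requestedVcores, List<Integer> workerVcores) {
        return workerVcores.stream().anyMatch(v -> v >= requestedVcores);
    }

    public static void main(String[] args) {
        int requested = 8;                                        // assumed user request
        int masterNodeVcores = 4;                                 // assumed local (client) config
        List<Integer> coreNodeVcores = Arrays.asList(16, 16, 16); // assumed worker capacities

        // The local check rejects a request that the workers could satisfy.
        System.out.println(fitsLocalConfig(requested, masterNodeVcores));
        System.out.println(fitsOnSomeWorker(requested, coreNodeVcores));
    }
}
```

With these assumed numbers the local-config check rejects the request while the worker-based check accepts it, which is the situation described for the EMR master node above.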

Issue Links

is duplicated by: FLINK-6189 Do not use yarn client config to do sanity check (Closed)

links to: GitHub Pull Request #6775

People

Assignee: Unassigned

Reporter: Shannon Carey

Votes: 0

Watchers: 6

Dates

Created: 17/Jan/17 22:35

Updated: 19/Oct/18 08:16

Resolved: 09/Oct/18 19:37