[YARN-4602] Simple and Scalable Message Service for YARN application - ASF JIRA

Details

Type: Sub-task
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: applications, resourcemanager
Labels: None

Description

In MAPREDUCE-6608 (https://issues.apache.org/jira/browse/MAPREDUCE-6608), we propose supporting work-preserving restart of the MapReduce ApplicationMaster (AM): when the AM fails for some reason, in-flight tasks keep running (or stay pending) until a new AM attempt comes up and continues the job. One prerequisite is that tasks must be able to find out where the new AM attempt has been launched, so that TaskUmbilicalProtocol clients can retry their calls against the new server.
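The retry prerequisite can be illustrated with a minimal Java sketch, assuming some discovery mechanism exists that reports where the current AM attempt is running (providing such a mechanism is what this issue is about). `AmLocator`, `UmbilicalCall`, and `RetryingUmbilicalClient` are hypothetical names used only for illustration; they are not part of Hadoop's `TaskUmbilicalProtocol` or its RPC retry policies.

```java
import java.io.IOException;
import java.net.InetSocketAddress;

/** Hypothetical: reports the address of the current AM attempt (e.g. via a discovery service). */
interface AmLocator {
    InetSocketAddress currentAmAddress() throws IOException;
}

/** Hypothetical: a single RPC invocation against a given AM address. */
interface UmbilicalCall<T> {
    T invoke(InetSocketAddress am) throws IOException;
}

/** Hypothetical client that rediscovers the AM and retries when a call fails. */
final class RetryingUmbilicalClient {
    private final AmLocator locator;
    private final int maxAttempts;
    private final long backoffMillis;

    RetryingUmbilicalClient(AmLocator locator, int maxAttempts, long backoffMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.locator = locator;
        this.maxAttempts = maxAttempts;
        this.backoffMillis = backoffMillis;
    }

    <T> T call(UmbilicalCall<T> rpc) throws IOException, InterruptedException {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                // Resolve the address on every attempt so a restarted AM is picked up.
                InetSocketAddress am = locator.currentAmAddress();
                return rpc.invoke(am);
            } catch (IOException e) {
                // The AM may be down or relocated: remember the error, back off, rediscover.
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis);
                }
            }
        }
        throw last; // non-null: every loop iteration that falls through caught an exception
    }
}
```

Re-resolving the address inside the loop, rather than caching it once, is what lets an in-flight task follow a restarted AM. A real client would also need to bound the total wait time and distinguish retriable errors from fatal ones.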
Other applications running on YARN may have the same need for containers to locate and communicate with their AM. Some applications handle message delivery themselves; for example, long-running services can use the Slider agent to pass messages back and forth. However, this is hard to achieve for vanilla YARN applications, because the Hadoop RPC mechanism is essentially one-way: the client initiates every call and the server can only reply. Although a bidirectional mechanism such as heartbeats (between NM and RM, or between AM and RM) can be built on top of it, it makes less sense to build the same mechanism between an AM and its application containers: the AM would have to handle a massive number of client connections, which could become a new scalability bottleneck and would make state maintenance very complicated. Instead, we need a new messaging mechanism that is simple and scalable.
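To illustrate the scalability concern, here is a minimal Java sketch, assuming a heartbeat-style design in which the AM can reach a container only in the reply to that container's own call. `HeartbeatMailbox` and its methods are hypothetical and do not correspond to any YARN API. Because the AM cannot push, it must buffer messages per container until that container's next heartbeat, so both its in-memory state and its inbound call rate grow with the number of containers.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical AM-side mailbox: messages reach a container only when it heartbeats. */
final class HeartbeatMailbox {
    // One queue per container; this per-client state is what grows with job size.
    private final Map<String, Deque<String>> pending = new HashMap<>();

    /** The AM wants to "push" a message, but over one-way RPC it can only enqueue it. */
    synchronized void send(String containerId, String message) {
        pending.computeIfAbsent(containerId, id -> new ArrayDeque<>()).add(message);
    }

    /** Called when a container heartbeats; drains and returns its queued messages. */
    synchronized List<String> heartbeat(String containerId) {
        Deque<String> queued = pending.remove(containerId);
        if (queued == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(queued);
    }

    /** Number of containers that still have undelivered messages. */
    synchronized int containersWithPendingMessages() {
        return pending.size();
    }
}
```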

Issue Links

is related to

MAPREDUCE-6608 Work Preserving AM Restart for MapReduce (Open)

YARN-4758 Enable discovery of AMs by containers (Open)

People

Assignee: Junping Du

Reporter: Junping Du

Votes: 0

Watchers: 10

Dates

Created: 18/Jan/16 16:01

Updated: 25/Oct/19 20:25