[MAPREDUCE-3186] User jobs are getting hanged if the Resource manager process goes down and comes up while job is getting executed. - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Blocker
Resolution: Fixed
Affects Version/s: 0.23.0
Fix Version/s: 0.23.0
Component/s: mrv2
Labels: test
Environment: linux
Target Version/s: 0.23.0
Hadoop Flags: Reviewed
Release Note:

New Yarn configuration property:

Name: yarn.app.mapreduce.am.scheduler.connection.retries
Description: Number of times AM should retry to contact RM if connection is lost.
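The retry limit described above can be illustrated with a minimal Java sketch. This is not the code from the attached patches: the class `SchedulerRetrySketch`, the `Outcome` enum, and the `heartbeat` method are hypothetical names, and the sketch assumes that the failure count resets after a successful contact and that the AM gives up once the number of consecutive failures exceeds the configured retry count.

```java
import java.util.function.BooleanSupplier;

/**
 * Hypothetical sketch of a bounded retry policy for AM-to-RM contact.
 * Names and semantics are assumptions, not taken from the MAPREDUCE-3186 patch.
 */
public class SchedulerRetrySketch {

    /** What the AM should do after one contact attempt. */
    public enum Outcome { CONTINUE, GIVE_UP }

    // Stands in for the value of
    // yarn.app.mapreduce.am.scheduler.connection.retries.
    private final int maxRetries;

    // Assumption: only consecutive failures count toward the limit.
    private int consecutiveFailures = 0;

    public SchedulerRetrySketch(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    /**
     * Performs one contact attempt.
     *
     * @param contactRm returns true if the RM answered, false if the connection failed
     */
    public Outcome heartbeat(BooleanSupplier contactRm) {
        if (contactRm.getAsBoolean()) {
            consecutiveFailures = 0; // connection restored, start counting afresh
            return Outcome.CONTINUE;
        }
        consecutiveFailures++;
        // The first failure plus maxRetries failed retries exhausts the budget.
        return consecutiveFailures > maxRetries ? Outcome.GIVE_UP : Outcome.CONTINUE;
    }

    public static void main(String[] args) {
        SchedulerRetrySketch am = new SchedulerRetrySketch(2);
        Outcome outcome = Outcome.CONTINUE;
        int attempts = 0;
        // Simulate an RM that never answers.
        while (outcome == Outcome.CONTINUE) {
            outcome = am.heartbeat(() -> false);
            attempts++;
        }
        System.out.println("Gave up after " + attempts + " failed attempts");
    }
}
```

Bounding the retries means an AM that can no longer reach the RM eventually stops instead of waiting indefinitely, which is the hang reported in this issue.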


Description

If the ResourceManager is restarted while a job is running, the job hangs.
The UI continues to show the job as running.
The RM log reports the error "ERROR org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: AppAttemptId doesnt exist in cache appattempt_1318579738195_0004_000001".
On the console, the MRAppMaster and RunJar processes are not killed.

Attachments


MAPREDUCE-3186.v1.txt (27/Oct/11 00:35, 5 kB, Eric Payne)
MAPREDUCE-3186.v2.txt (27/Oct/11 18:24, 10 kB, Eric Payne)
MR3186_v3.txt (28/Oct/11 00:09, 12 kB, Siddharth Seth)

Issue Links

is cloned by

MAPREDUCE-3286 Unit tests for MAPREDUCE-3186 - User jobs are getting hanged if the Resource manager process goes down and comes up while job is getting executed.

Resolved


People

Assignee: Eric Payne

Reporter: Ramgopal N

Votes: 0

Watchers: 7

Dates

Created: 14/Oct/11 08:59

Updated: 25/Jul/18 10:35

Resolved: 28/Oct/11 01:42