[MAPREDUCE-5485] Allow repeating job commit by extending OutputCommitter API - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: 2.1.0-beta
Fix Version/s: 2.8.0, 2.7.3, 3.0.0-alpha1
Component/s: None
Labels: None
Target Version/s: 2.7.3
Hadoop Flags: Reviewed
Release Note:

Previously, an MR job would fail if the AM was restarted for any reason (node failure, etc.) while it was committing the job, regardless of whether the AM had reached its maximum number of attempts.
This improvement adds a new API, isCommitJobRepeatable(), to OutputCommitter. It indicates whether the job's committer can run commitJob again if the previous commit was interrupted by NM/AM failures, etc. An OutputCommitter that supports repeatable job commit (such as FileOutputCommitter with algorithm 2) allows the AM to continue commitJob() after an AM restart, as a new attempt.
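
As a rough illustration of the contract described in this release note, the sketch below models a committer that declares its job commit repeatable. It does not use the Hadoop classes: `SketchCommitter`, `IdempotentCommitter`, and `RepeatableCommitDemo` are hypothetical stand-ins, and the real Hadoop method signature is not reproduced here.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical stand-in for a committer API; this is not the Hadoop class. */
abstract class SketchCommitter {
    /** Publishes pending output files; in a real job this may be interrupted. */
    abstract void commitJob(Set<String> pending, Set<String> published);

    /** Assumed default: an interrupted commit cannot safely be redone. */
    boolean isCommitJobRepeatable() {
        return false;
    }
}

/** Committer whose commit only adds missing files, so rerunning it is harmless. */
class IdempotentCommitter extends SketchCommitter {
    @Override
    void commitJob(Set<String> pending, Set<String> published) {
        // Files that were already published by an earlier run are simply kept.
        published.addAll(pending);
    }

    @Override
    boolean isCommitJobRepeatable() {
        return true;
    }
}

class RepeatableCommitDemo {
    /** Runs the commit, then reruns it to imitate a retry by a new AM attempt. */
    static Set<String> commitTwice(SketchCommitter committer, Set<String> pending) {
        Set<String> published = new TreeSet<>();
        committer.commitJob(pending, published);
        if (committer.isCommitJobRepeatable()) {
            // Simulated second attempt after a restart.
            committer.commitJob(pending, published);
        }
        return published;
    }

    public static void main(String[] args) {
        Set<String> pending = new TreeSet<>(Arrays.asList("part-0", "part-1"));
        System.out.println(commitTwice(new IdempotentCommitter(), pending));
    }
}
```

The point of the sketch is that a committer should only report its commit as repeatable when running the commit a second time leaves the output in the same state as a single successful run.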

Description

The MRAppMaster may crash during job commit, or a NodeManager restart may cause the committing AM to exit because its container expires. In these cases, the job fails.
However, some jobs can redo the commit, so failing the job is unnecessary.
A better choice is to let clients tell the AM whether redoing the commit is allowed.
This idea comes from Jason Lowe's comments in MAPREDUCE-4819.
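
The choice described above can be sketched as a small decision function that a restarted AM attempt might apply. This is only one plausible decision table under stated assumptions, not the logic in the attached patches: `RecoveryAction`, `CommitRecovery`, and the three boolean commit-state flags are hypothetical names introduced for illustration.

```java
/** Hypothetical outcomes for a restarted AM attempt. */
enum RecoveryAction { RUN_NORMALLY, REDO_COMMIT, FINISH_SUCCEEDED, FAIL_JOB }

final class CommitRecovery {
    private CommitRecovery() {
    }

    /**
     * Decides what a new AM attempt does, given what the previous attempt
     * recorded about the commit (assumed to be known to the new attempt).
     */
    static RecoveryAction decide(boolean commitStarted,
                                 boolean commitSucceeded,
                                 boolean commitFailed,
                                 boolean commitRepeatable) {
        if (!commitStarted) {
            return RecoveryAction.RUN_NORMALLY;      // commit never began
        }
        if (commitSucceeded) {
            return RecoveryAction.FINISH_SUCCEEDED;  // nothing left to redo
        }
        if (commitFailed) {
            return RecoveryAction.FAIL_JOB;          // commit itself reported failure
        }
        // Commit was interrupted: redo it only if the committer allows that.
        return commitRepeatable ? RecoveryAction.REDO_COMMIT : RecoveryAction.FAIL_JOB;
    }

    public static void main(String[] args) {
        // Interrupted commit with a repeatable committer.
        System.out.println(decide(true, false, false, true));
        // Interrupted commit with a non-repeatable committer.
        System.out.println(decide(true, false, false, false));
    }
}
```

Under these assumptions, only the interrupted-commit case changes with the new flag; the other cases behave the same whether or not the committer is repeatable.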

Attachments

MAPREDUCE-5485-v5-branch-2.7.patch, 16/Nov/15 22:41, 33 kB, Junping Du
MAPREDUCE-5485-v5.patch, 11/Nov/15 15:27, 33 kB, Junping Du
MAPREDUCE-5485-v4.patch, 10/Nov/15 17:49, 32 kB, Junping Du
MAPREDUCE-5485-v4.1.patch, 11/Nov/15 11:00, 32 kB, Junping Du
MAPREDUCE-5485-v3.patch, 09/Nov/15 23:34, 24 kB, Junping Du
MAPREDUCE-5485-v3.1.patch, 10/Nov/15 01:30, 24 kB, Junping Du
MAPREDUCE-5485-v2.patch, 09/Nov/15 17:14, 33 kB, Junping Du
MAPREDUCE-5485-v1.patch, 04/Nov/15 17:22, 29 kB, Junping Du
MAPREDUCE-5485-demo-2.patch, 29/Oct/15 18:50, 18 kB, Junping Du
MAPREDUCE-5485-demo.patch, 22/Oct/15 16:53, 14 kB, Junping Du

Issue Links

breaks
MAPREDUCE-6555 TestMRAppMaster fails on trunk (Resolved)
MAPREDUCE-6595 Fix findbugs warnings in OutputCommitter and FileOutputCommitter (Resolved)

is depended upon by
MAPREDUCE-6608 Work Preserving AM Restart for MapReduce (Open)

is duplicated by
MAPREDUCE-6437 Add retry on some connection exception on job commit phase (Resolved)

is related to
MAPREDUCE-6545 Test committer.commitJob() behavior during committing when MR AM get failed. (Open)

relates to
MAPREDUCE-6478 Add an option to skip cleanupJob stage or ignore cleanup failure during commitJob(). (Resolved)
MAPREDUCE-4815 Speed up FileOutputCommitter#commitJob for many output files (Closed)

People

Assignee: Junping Du
Reporter: Nemon Lou
Votes: 0
Watchers: 15

Dates

Created: 28/Aug/13 13:07
Updated: 25/Oct/19 20:27
Resolved: 17/Nov/15 01:14