[TEZ-4110] Make Tez fail fast when DFS quota is exceeded - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 0.9.0, 0.8.4, 0.9.2
Fix Version/s: 0.10.3
Component/s: None
Labels:
None
Environment:

hadoop 2.9, hive 2.3, tez

Description

This ticket aims to add a feature to Tez similar to MAPREDUCE-7148.

Make a Tez job fail fast when the DFS quota limit is reached.

Background: we run Hive jobs with a per-job DFS quota (3 TB). When a job hits the quota, the task that hit it fails, and there are a few task retries before the job actually fails. The retries are not helpful because the job will fail anyway. In one of the worse cases, a job had a single reduce task writing more than 3 TB to HDFS over 20 hours; the reduce task exceeded the quota and retried 4 times before the job finally failed, consuming a lot of resources unnecessarily. This ticket aims to let a job fail fast when it writes too much data to the DFS and exceeds the DFS quota.
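The requested behaviour can be illustrated with a minimal, self-contained sketch. It is not the patch attached to this ticket and uses no Tez or Hadoop classes: `QuotaFailFastSketch`, `StorageCapacityExceededException`, `Decision`, `isQuotaExceeded` and `decide` are hypothetical names, and the exception is only a stand-in for the `ClusterStorageCapacityExceededException` named in the related HADOOP-16777 issue. The idea, under those assumptions, is to inspect the cause chain of a task failure and skip the remaining attempts when the root problem is an exceeded quota.

```java
import java.io.IOException;

/** Illustrative sketch only: every name here is hypothetical, not a Tez or Hadoop API. */
class QuotaFailFastSketch {

    /** Stand-in for Hadoop's ClusterStorageCapacityExceededException (assumption). */
    static class StorageCapacityExceededException extends IOException {
        StorageCapacityExceededException(String message) {
            super(message);
        }
    }

    /** What the scheduler should do after a task attempt fails. */
    enum Decision { RETRY, FAIL_TASK, FAIL_FAST }

    /** Walks the cause chain (bounded, to guard against cycles) looking for a quota failure. */
    static boolean isQuotaExceeded(Throwable error) {
        int depth = 0;
        for (Throwable t = error; t != null && depth < 32; t = t.getCause(), depth++) {
            if (t instanceof StorageCapacityExceededException) {
                return true;
            }
        }
        return false;
    }

    /**
     * Quota failures are fatal immediately, because a retry would write the same
     * data and hit the same limit. Other failures keep the usual retry budget.
     */
    static Decision decide(Throwable error, int attempt, int maxAttempts) {
        if (isQuotaExceeded(error)) {
            return Decision.FAIL_FAST;
        }
        return attempt < maxAttempts ? Decision.RETRY : Decision.FAIL_TASK;
    }

    public static void main(String[] args) {
        Throwable quota = new RuntimeException("write failed",
                new StorageCapacityExceededException("DFS quota exceeded"));
        Throwable transientError = new IOException("connection reset");

        System.out.println("quota failure, attempt 1 of 4: " + decide(quota, 1, 4));
        System.out.println("transient failure, attempt 1 of 4: " + decide(transientError, 1, 4));
    }
}
```

With a retry budget of 4 attempts, as in the 20-hour reduce task described above, the quota case is classified as fatal on the first failed attempt instead of repeating the same write.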

Attachments

With-Patch-Output.rtf, 48 kB, 14/Oct/23 22:06, Ayush Saxena
Without-Patch-Output.rtf, 178 kB, 14/Oct/23 22:06, Ayush Saxena

Issue Links

is related to

MAPREDUCE-7148 Fast fail jobs when exceeds dfs quota limitation (Resolved)

HADOOP-16777 Add Tez to LimitedPrivate of ClusterStorageCapacityExceededException (Resolved)

links to

GitHub Pull Request #313

People

Assignee: Ayush Saxena

Reporter: Wang Yan

Votes: 0

Watchers: 3

Dates

Created: 23/Dec/19 03:59

Updated: 16/Oct/23 08:09

Resolved: 16/Oct/23 07:50

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 1h 40m