[TEZ-3479] DAG AM does not schedule any more containers in corner cases - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 0.7.1
Fix Version/s: None
Component/s: None
Labels: None

Description

Env: 3 node AWS cluster with data residing in S3. Tez version is 0.7.

Some workloads (e.g. Q29 in TPCDS) generate so much data that tasks start throwing "No space available" errors on the local disks. The DAG should fail after enough retries, and most of the time it does. Occasionally (roughly once in 20-30 runs), however, the DAG AM hangs and does not schedule any more containers for the failed task attempts. Will attach the logs shortly.

Attachments

application_1476667862449_0031_not_complete.1.log.tar.gz
18/Oct/16 23:03
4.76 MB
Rajesh Balamohan

People

Assignee: Unassigned

Reporter: Rajesh Balamohan

Votes: 0

Watchers: 3

Dates

Created: 18/Oct/16 22:59

Updated: 27/Oct/16 18:25