Project: Hadoop Map/Reduce
Issue: MAPREDUCE-5928

Deadlock allocating containers for mappers and reducers


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Environment: Hadoop 2.4.0 (as packaged by Hortonworks in HDP 2.1.2)

Description

      I have a small cluster consisting of 8 desktop-class systems (1 master + 7 workers).
      Because these systems have little memory, I configured YARN as follows:

      yarn.nodemanager.resource.memory-mb = 2200
      yarn.scheduler.minimum-allocation-mb = 250

      On my client I set:

      mapreduce.map.memory.mb = 512
      mapreduce.reduce.memory.mb = 512
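
      For reference, a minimal sketch of how these can be set from client code
      (the class and job names are placeholders, not my actual job):

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.mapreduce.Job;

        public class MemorySettingsSketch {
          public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Ask YARN for 512 MB containers for both map and reduce tasks.
            conf.setInt("mapreduce.map.memory.mb", 512);
            conf.setInt("mapreduce.reduce.memory.mb", 512);
            Job job = Job.getInstance(conf, "memory-settings-sketch");
            // ... set mapper/reducer classes, paths, then job.waitForCompletion(true) ...
          }
        }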

      I then ran a job with 27 mappers and 32 reducers.
      After a while I saw this deadlock occur:

      • All nodes had been filled to their maximum capacity with reducers.
      • 1 mapper was waiting for a container slot to start in.

      I tried killing reducer attempts, but that didn't help: new reducer attempts simply took over the freed container.
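
      Some rough math on why the cluster fills up completely (this assumes the
      CapacityScheduler, which normalizes every request up to a multiple of
      yarn.scheduler.minimum-allocation-mb; the class name is just for
      illustration):

        public class ContainerMathSketch {
          public static void main(String[] args) {
            int nodeMb = 2200;     // yarn.nodemanager.resource.memory-mb
            int minAllocMb = 250;  // yarn.scheduler.minimum-allocation-mb
            int requestMb = 512;   // mapreduce.{map,reduce}.memory.mb
            int workerNodes = 7;

            // Assumed normalization: 512 MB is rounded up to 750 MB.
            int normalizedMb = ((requestMb + minAllocMb - 1) / minAllocMb) * minAllocMb;
            int perNode = nodeMb / normalizedMb;      // 2200 / 750 = 2
            int clusterWide = workerNodes * perNode;  // 7 * 2 = 14

            System.out.println("normalized container size: " + normalizedMb + " MB");
            System.out.println("containers per node: " + perNode);
            System.out.println("containers cluster-wide: " + clusterWide);
          }
        }

      With at most ~14 concurrent containers (part of which the MRAppMaster
      itself occupies) and 32 reducers eligible to run, every container a
      killed reducer frees is immediately handed to another pending reducer,
      so the waiting mapper never gets scheduled.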

      Workaround:
      I set the following property from my job (the default is 0.05, i.e.
      reducers may be started once 5% of the mappers have completed):

      mapreduce.job.reduce.slowstart.completedmaps = 0.99f
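
      A minimal sketch of applying this from client code (class and job names
      are placeholders):

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.mapreduce.Job;

        public class SlowstartWorkaroundSketch {
          public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Hold reducers back until 99% of maps have finished, so they can
            // no longer occupy every container while maps are still waiting.
            conf.setFloat("mapreduce.job.reduce.slowstart.completedmaps", 0.99f);
            Job job = Job.getInstance(conf, "slowstart-workaround-sketch");
            // ... rest of the job setup as usual ...
          }
        }

      For a job that uses ToolRunner, the same value can be passed without a
      code change, e.g. (jar and class names are placeholders):

        hadoop jar myjob.jar MyJob -Dmapreduce.job.reduce.slowstart.completedmaps=0.99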

Attachments

    1. MR job stuck in deadlock.png.jpg (62 kB, Niels Basjes)
    2. Cluster fully loaded.png.jpg (141 kB, Niels Basjes)
    3. AM-MR-syslog - Cleaned.txt.gz (420 kB, Niels Basjes)


People

    Assignee: Unassigned
    Reporter: Niels Basjes (nielsbasjes)
    Votes: 0
    Watchers: 10
