[YARN-6728] Job will run slow when the performance of defaultFs degrades and log-aggregation is enabled. - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 2.7.1
Fix Version/s: None
Component/s: nodemanager, yarn
Labels: None
Environment: CentOS 7.1 hadoop-2.7.1
Target Version/s: 2.7.5
Description

In our cluster, I found that many map tasks stay in the "NEW" state for several minutes. Here is the log for one such container:

[2017-06-13T18:21:23.068+08:00] [INFO] containermanager.application.ApplicationImpl.transition(ApplicationImpl.java 304) [AsyncDispatcher event handler] : Adding container_1495632926847_2459604_01_000011 to application application_1495632926847_2459604
[2017-06-13T18:23:08.715+08:00] [INFO] containermanager.container.ContainerImpl.handle(ContainerImpl.java 1137) [AsyncDispatcher event handler] : Container container_1495632926847_2459604_01_000011 transitioned from NEW to LOCALIZING
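
The delay between these two log lines can be computed directly from their timestamps. The following is only a small sketch; the class and method names are hypothetical and are not part of Hadoop:

```java
import java.time.Duration;
import java.time.OffsetDateTime;

public class TransitionGap {
    // Returns the elapsed time between two ISO-8601 timestamps
    // in the form used by the log lines above.
    public static Duration gap(String from, String to) {
        return Duration.between(OffsetDateTime.parse(from), OffsetDateTime.parse(to));
    }

    public static void main(String[] args) {
        Duration d = gap("2017-06-13T18:21:23.068+08:00",
                         "2017-06-13T18:23:08.715+08:00");
        // Time the container spent between "Adding container" and NEW -> LOCALIZING.
        System.out.println(d.toMillis() + " ms"); // prints "105647 ms"
    }
}
```

For this container the gap is about 105.6 seconds.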

I then searched the log between 18:21:23.068 and 18:23:08.715 and found that some events dispatched by the AsyncDispatcher were handled slowly because their handlers access the defaultFs. Our cluster has grown to 4k nodes, which has increased the load on the defaultFs. (Note: log-aggregation is enabled.)

A container that runs in the nodemanager invokes initApp(), which then invokes verifyAndCreateRemoteLogDir and creates the remote log directory. These operations access the defaultFs, so when the defaultFs is slow the container is stuck here, and the application runs slowly.
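
The effect described above can be illustrated with a simplified model. This is a sketch under stated assumptions, not the NodeManager code: a single-threaded executor stands in for the AsyncDispatcher event handler thread, `slowRemoteMkdir` is a hypothetical stand-in for the defaultFs calls, and the event names are made up. The `offloadRemoteCall` flag shows one possible way to keep the dispatcher responsive; it is not necessarily what the attached patch does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class DispatcherStallSketch {
    // Hypothetical stand-in for a remote call against a slow defaultFs.
    static void slowRemoteMkdir(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Queues two events on one dispatcher thread and returns the order
    // in which their work completed.
    public static List<String> run(boolean offloadRemoteCall, long remoteMillis)
            throws InterruptedException {
        List<String> completed = new CopyOnWriteArrayList<>();
        ExecutorService dispatcher = Executors.newSingleThreadExecutor();
        ExecutorService remoteIo = Executors.newSingleThreadExecutor();
        CountDownLatch done = new CountDownLatch(2);

        Runnable remoteWork = () -> {
            slowRemoteMkdir(remoteMillis);
            completed.add("remote-log-dir-ready");
            done.countDown();
        };

        // First event: its handler needs the remote log dir.
        dispatcher.execute(() -> {
            if (offloadRemoteCall) {
                remoteIo.execute(remoteWork); // dispatcher thread returns at once
            } else {
                remoteWork.run();             // dispatcher thread blocks on the remote call
            }
        });
        // Second event: unrelated, but queued behind the first one.
        dispatcher.execute(() -> {
            completed.add("later-event-handled");
            done.countDown();
        });

        done.await(remoteMillis + 5000, TimeUnit.MILLISECONDS);
        dispatcher.shutdown();
        remoteIo.shutdown();
        return new ArrayList<>(completed);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(false, 300)); // [remote-log-dir-ready, later-event-handled]
        System.out.println(run(true, 300));
    }
}
```

In the blocking case the later event cannot be handled until the remote call returns, which matches the stall seen in the log. With the remote call moved off the dispatcher thread, the later event is normally handled first.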

Attachments

YARN-6728.patch.00_branch-2.7
23/Jun/17 09:39
4 kB
Chenyu Zheng

People

Assignee: Unassigned

Reporter: Chenyu Zheng

Votes: 0

Watchers: 4

Dates

Created: 22/Jun/17 09:22

Updated: 06/Aug/17 08:40

Time Tracking

Estimated: Not Specified

Remaining: Not Specified

Logged: Not Specified