[MAPREDUCE-6858] HistoryFileManager thrashing due to high volume jobs - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: jobhistoryserver
Labels: None

Description

The JobHistoryServer (JHS) log shows that it tried to move the same *.jhist file twice, and the second move fails with a FileNotFoundException.

1. JHS scans the "done_intermediate" dir for files to process and adds them to a thread pool.
2. The thread pool starts processing these files to move them to the "done" dir.
3. JHS scans "done_intermediate" again for files to process and adds them to the thread pool.
If there are enough jobs that the thread pool can't keep up with the scanning interval, the same files get added twice (or more). If this keeps compounding, jobs pile up, go unprocessed for quite some time, and produce lots of FileNotFoundExceptions.
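The sequence above can be reproduced in miniature. The following is only a sketch, not the HistoryFileManager code: the class name MoveScanSketch, the in-memory set standing in for "done_intermediate", and the "in-flight" guard are all assumptions made for illustration. A latch holds every move task until two scans have completed, which mimics a pool that cannot drain its queue within one scan interval.

```java
import java.util.ArrayList;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the scan/move cycle; not taken from Hadoop.
class MoveScanSketch {

    /** Returns {moved, notFound} after two scans that both finish before any move runs. */
    static int[] run(boolean dedupe, int fileCount) {
        // Stand-in for the files sitting in done_intermediate.
        Set<String> intermediate = ConcurrentHashMap.newKeySet();
        for (int i = 0; i < fileCount; i++) {
            intermediate.add("job_" + i + ".jhist");
        }
        // Assumed guard: files that are already queued or being moved.
        Set<String> inFlight = ConcurrentHashMap.newKeySet();
        AtomicInteger moved = new AtomicInteger();
        AtomicInteger notFound = new AtomicInteger();
        CountDownLatch gate = new CountDownLatch(1);            // keeps the pool "behind"
        ExecutorService pool = Executors.newFixedThreadPool(3); // 3 threads, as in the report
        try {
            for (int scan = 0; scan < 2; scan++) {
                for (String file : new ArrayList<>(intermediate)) {
                    if (dedupe && !inFlight.add(file)) {
                        continue; // already queued by an earlier scan
                    }
                    pool.execute(() -> {
                        try {
                            gate.await();
                            // The first task for a file "moves" it; a duplicate finds it gone.
                            if (intermediate.remove(file)) {
                                moved.incrementAndGet();
                            } else {
                                notFound.incrementAndGet();
                            }
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        } finally {
                            inFlight.remove(file);
                        }
                    });
                }
            }
            gate.countDown(); // let the pool catch up
            pool.shutdown();
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("pool did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
        return new int[] {moved.get(), notFound.get()};
    }

    public static void main(String[] args) {
        int[] plain = run(false, 10);
        int[] guarded = run(true, 10);
        System.out.println("without guard: moved=" + plain[0] + " notFound=" + plain[1]);
        System.out.println("with guard:    moved=" + guarded[0] + " notFound=" + guarded[1]);
    }
}
```

Without the guard, every file is queued twice and the second task for each file fails, which corresponds to the FileNotFoundException in the JHS log. Tracking in-flight files is one possible mitigation in addition to tuning the two settings; whether it suits the real code path is not established here.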

By default, it looks like the thread pool has only 3 threads (mapreduce.jobhistory.move.thread-count) and the scan interval is 3 minutes (mapreduce.jobhistory.move.interval-ms). Perhaps we should increase these?
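As a rough way to reason about these two settings, the sketch below estimates how many files the pool can move within one scan interval. The per-file move time is a made-up figure for illustration, not a measurement, and the class and method names are hypothetical.

```java
// Back-of-the-envelope estimate; perFileMs is an assumed value, not measured.
class MoveCapacitySketch {

    /** Files the pool can finish in one scan interval if each move takes perFileMs. */
    static long capacityPerInterval(int threads, long intervalMs, long perFileMs) {
        // Each thread completes floor(intervalMs / perFileMs) moves per interval.
        return threads * (intervalMs / perFileMs);
    }

    public static void main(String[] args) {
        int threads = 3;           // mapreduce.jobhistory.move.thread-count default per the report
        long intervalMs = 180_000; // mapreduce.jobhistory.move.interval-ms, i.e. 3 minutes
        long perFileMs = 500;      // hypothetical cost of moving one job's files
        System.out.println(capacityPerInterval(threads, intervalMs, perFileMs));
    }
}
```

Under these assumed numbers, the pool falls behind once more files than this arrive between two scans, at which point the next scan re-queues files that are still waiting.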

People

Assignee: Unassigned

Reporter: Yufei Gu

Votes: 0

Watchers: 3

Dates

Created: 07/Mar/17 19:12

Updated: 08/Mar/17 19:40