[YARN-10427] Duplicate Job IDs in SLS output - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.0.0, 3.3.0, 3.2.1, 3.4.0
Fix Version/s: 3.4.0
Component/s: scheduler-load-simulator
Labels:
- pull-request-available
Environment:

I ran the attached inputs on my MacBook Pro, using Hadoop compiled from the latest trunk (as of commit 139a43e98e). I also tested against the 3.2.1 and 3.3.0 release branches.

Target Version/s: 3.4.0
Hadoop Flags: Reviewed

Description

Hello, I'm hoping someone can help me resolve or understand some issues I've been having with the YARN Scheduler Load Simulator (SLS). I've been experimenting with SLS at work for several months, as we're trying to build a simulation model that characterizes our enterprise Hadoop infrastructure for future capacity planning. While verifying and validating the SLS output, I've encountered a number of issues, including runtime exceptions and bad output. This issue focuses on the bad output. In all my simulation runs, the jobruntime.csv output seems to have one or more of the following problems: no output, duplicate job IDs, and/or missing job IDs.

Because of where I work, I'm unable to provide the exact inputs I typically use, but I can reproduce the duplicate Job IDs problem using some simplified inputs and configuration files, which I've attached along with the output I obtained.

The command I used to run the simulation:

./runsls.sh --tracetype=SLS --tracelocation=./inputsls.json --output-dir=sls-run-1 --print-simulation --track-jobs=job_1,job_2,job_3,job_4,job_5,job_6,job_7,job_8,job_9,job_10
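
A minimal sketch of how the output of a run like this could be checked for duplicate and missing job IDs. It assumes that jobruntime.csv starts with a header row and carries the job ID in its first column; neither is confirmed in this issue, and the function name `check_job_ids` is hypothetical.

```python
import csv
from collections import Counter


def check_job_ids(csv_path, expected_ids):
    """Return (duplicates, missing) for the job IDs found in a CSV file.

    Assumption: the first row is a header and the first column of every
    following row holds the job ID.
    """
    with open(csv_path, newline="") as f:
        rows = list(csv.reader(f))

    # Count how often each job ID appears, skipping the header and empty rows.
    seen = Counter(row[0].strip() for row in rows[1:] if row)

    # IDs written more than once, and expected IDs that never appear.
    duplicates = sorted(job for job, count in seen.items() if count > 1)
    missing = sorted(set(expected_ids) - set(seen))
    return duplicates, missing
```

For the command above, calling `check_job_ids("sls-run-1/jobruntime.csv", [f"job_{i}" for i in range(1, 11)])` would compare the output against the ten tracked jobs, assuming the CSV is written directly under the `--output-dir` directory.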

Can anyone help me understand what would cause the duplicate Job IDs in the output? Is this a bug in Hadoop or a problem with my inputs? Thanks in advance.

PS: This is the first issue I've ever opened, so please be kind if I've missed something or am not understanding something obvious about the way Hadoop works. I'll gladly follow up with more info as requested.

Attachments

- fair-scheduler.xml (04/Sep/20 16:07, 2 kB, Drew Merrill)
- inputsls.json (04/Sep/20 16:09, 6 kB, Drew Merrill)
- jobruntime.csv (24/Nov/20 04:51, 1 kB, Drew Merrill)
- jobruntime.csv (04/Sep/20 16:09, 1 kB, Drew Merrill)
- mapred-site.xml (04/Sep/20 16:07, 0.9 kB, Drew Merrill)
- sls-runner.xml (04/Sep/20 16:08, 2 kB, Drew Merrill)
- YARN-10427.001.patch (22/Dec/20 15:54, 16 kB, Szilard Nemeth)
- YARN-10427.002.patch (22/Dec/20 16:25, 2 kB, Szilard Nemeth)
- YARN-10427.003.patch (22/Dec/20 16:35, 1 kB, Szilard Nemeth)
- YARN-10427.004.patch (16/Dec/21 11:04, 2 kB, Szilard Nemeth)
- YARN-10427-sls-scriptsandlogs.tar.gz (22/Dec/20 15:54, 4.49 MB, Szilard Nemeth)
- yarn-site.xml (04/Sep/20 16:08, 3 kB, Drew Merrill)

Issue Links

links to

GitHub Pull Request #3809

People

Assignee: Szilard Nemeth

Reporter: Drew Merrill

Votes: 0

Watchers: 6

Dates

Created: 04/Sep/20 16:16

Updated: 16/Dec/21 23:37

Resolved: 16/Dec/21 23:37

Time Tracking

Estimated:

Not Specified

Remaining:

Logged:

40m