[MAPREDUCE-7309] Improve performance of reading resource request for mapper/reducers from config - ASF JIRA


Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.0.0, 3.1.0, 3.2.0, 3.3.0
Fix Version/s: 3.2.2, 3.4.0, 3.3.1
Component/s: applicationmaster
Labels: None
Hadoop Flags: Reviewed

Description

This issue could affect all releases that include YARN-6927.

We repeatedly run a regex match when reading mapper/reducer resource requests from the config. With a large config file and a large number of splits, this can take a long time.

We observed the AM taking hours to parse the config with 200k+ splits and a large config file (hundreds of KB).

The problematic part is this:

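```java
  private void populateResourceCapability(TaskType taskType) {
    String resourceTypePrefix =
        getResourceTypePrefix(taskType);
    boolean memorySet = false;
    boolean cpuVcoresSet = false;

    if (resourceTypePrefix != null) {
      // Hot spot: this call regex-scans every key of the job config.
      List<ResourceInformation> resourceRequests =
          ResourceUtils.getRequestedResourcesFromConfig(conf,
              resourceTypePrefix);
      // ... remainder of this block is elided in the excerpt
    }
    // ... remainder of the method is elided in the excerpt
  }
```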

Inside ResourceUtils.getRequestedResourcesFromConfig(), we call Configuration.getValByRegex(), which iterates over every property key in the MapReduce job configuration (jobconf.xml). If the job config is large (e.g. because the job is part of an MR pipeline and an earlier job populated its config), the same regex match runs over every property again and again. This is unnecessary, because all mappers share the same config, and so do all reducers.
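To make the cost concrete, here is a minimal, self-contained sketch that does not use Hadoop. The method getValByRegex below is a hypothetical stand-in, only assumed to behave like Configuration.getValByRegex() in that it tests every key against the regex on each call. The class name, the synthetic config and the key prefix mapreduce.map.resource. are illustrative assumptions. The sketch counts how many keys get tested when the lookup is repeated once per task.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexScanSketch {
    // Total number of keys tested against a regex (single-threaded demo only).
    static long keysScanned = 0;

    // Hypothetical stand-in for Configuration.getValByRegex(): every call
    // walks the full property map and tests each key against the regex.
    static Map<String, String> getValByRegex(Map<String, String> props, String regex) {
        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher("");
        Map<String, String> result = new HashMap<>();
        for (Map.Entry<String, String> e : props.entrySet()) {
            keysScanned++;
            if (m.reset(e.getKey()).find()) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    // Synthetic job config: two resource keys plus many unrelated keys,
    // e.g. left behind by earlier jobs in a pipeline (assumed shape).
    static Map<String, String> buildConf(int unrelatedKeys) {
        Map<String, String> props = new TreeMap<>();
        props.put("mapreduce.map.resource.memory-mb", "2048");
        props.put("mapreduce.map.resource.vcores", "2");
        for (int i = 0; i < unrelatedKeys; i++) {
            props.put("pipeline.stage" + i + ".setting", "value" + i);
        }
        return props;
    }

    // Uncached path: one full regex scan per task, returns keys tested.
    static long scanPerTask(Map<String, String> props, int tasks) {
        keysScanned = 0;
        String regex = "^" + Pattern.quote("mapreduce.map.resource.") + "[^.]+$";
        for (int t = 0; t < tasks; t++) {
            getValByRegex(props, regex);
        }
        return keysScanned;
    }

    public static void main(String[] args) {
        Map<String, String> props = buildConf(5_000);
        // 5,002 keys x 1,000 tasks
        System.out.println("keys scanned: " + scanPerTask(props, 1_000)); // prints "keys scanned: 5002000"
    }
}
```

The number of regex tests grows as (number of config keys) x (number of tasks), so a config with thousands of keys combined with 200k+ splits adds up quickly.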

We should cache the pre-configured resource requests instead of re-parsing them for every task.
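One way to cache is to memoize the parsed result per task type. The sketch below is an illustration under stated assumptions, not the committed patch: CachedResourceRequests, its TaskType enum and the two prefixes are hypothetical, and it assumes the job configuration does not change after the first lookup (consistent with all mappers, and all reducers, sharing the same config).

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CachedResourceRequests {
    // Hypothetical task types; Hadoop has its own TaskType enum.
    enum TaskType { MAP, REDUCE }

    private final Map<String, String> props;
    // At most one cached entry per task type, filled lazily on first use.
    private final Map<TaskType, Map<String, String>> cache = new EnumMap<>(TaskType.class);
    private int scans = 0; // number of full regex scans performed

    CachedResourceRequests(Map<String, String> props) {
        this.props = props;
    }

    // Assumed key prefixes for mapper and reducer resource requests.
    static String prefixFor(TaskType type) {
        return type == TaskType.MAP ? "mapreduce.map.resource." : "mapreduce.reduce.resource.";
    }

    // The expensive part: tests every config key against the regex.
    private Map<String, String> scan(TaskType type) {
        scans++;
        Pattern p = Pattern.compile("^" + Pattern.quote(prefixFor(type)) + "[^.]+$");
        Matcher m = p.matcher("");
        Map<String, String> result = new HashMap<>();
        for (Map.Entry<String, String> e : props.entrySet()) {
            if (m.reset(e.getKey()).find()) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return Collections.unmodifiableMap(result);
    }

    // All tasks of a type share one cached, read-only result.
    synchronized Map<String, String> resourceRequests(TaskType type) {
        return cache.computeIfAbsent(type, this::scan);
    }

    synchronized int scanCount() {
        return scans;
    }

    public static void main(String[] args) {
        Map<String, String> props = new TreeMap<>();
        props.put("mapreduce.map.resource.memory-mb", "2048");
        props.put("mapreduce.reduce.resource.memory-mb", "4096");
        for (int i = 0; i < 5_000; i++) {
            props.put("pipeline.stage" + i + ".setting", "value" + i);
        }
        CachedResourceRequests requests = new CachedResourceRequests(props);
        for (int t = 0; t < 1_000; t++) {
            requests.resourceRequests(TaskType.MAP);
        }
        for (int t = 0; t < 500; t++) {
            requests.resourceRequests(TaskType.REDUCE);
        }
        System.out.println("regex scans: " + requests.scanCount()); // prints "regex scans: 2"
    }
}
```

Here 1,500 lookups trigger only two regex scans. The synchronized accessor keeps the lazy initialization safe if several threads request resources at once; computing both entries eagerly into an immutable map would work equally well.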

Attachments


- MAPREDUCE-7309.001.patch (7 kB, Wangda Tan, 20/Nov/20 19:57)
- MAPREDUCE-7309.002.patch (8 kB, Wangda Tan, 20/Nov/20 23:00)
- MAPREDUCE-7309-003.patch (4 kB, Peter Bacsko, 22/Nov/20 22:01)
- MAPREDUCE-7309-004.patch (4 kB, Peter Bacsko, 24/Nov/20 09:27)
- MAPREDUCE-7309-005.patch (4 kB, Peter Bacsko, 24/Nov/20 13:15)
- MAPREDUCE-7309-branch-3.1-001.patch (4 kB, Peter Bacsko, 24/Nov/20 16:17)
- MAPREDUCE-7309-branch-3.2-001.patch (4 kB, Peter Bacsko, 24/Nov/20 20:48)
- MAPREDUCE-7309-branch-3.3-001.patch (4 kB, Peter Bacsko, 24/Nov/20 22:38)
Peter Bacsko


People

Assignee: Peter Bacsko
Reporter: Wangda Tan
Votes: 0
Watchers: 6

Dates

Created: 20/Nov/20 19:57
Updated: 10/Jun/21 08:10
Resolved: 25/Nov/20 10:48