Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.0.0, 3.1.0, 3.2.0, 3.3.0
None
Reviewed
Description
This issue could affect all releases that include YARN-6927.
When the AM reads the mapper/reducer resource requests from the configuration, it runs a regex match repeatedly. With a large config file and a large number of splits, this can take a long time.
We have seen the AM take hours to parse the config for a job with 200k+ splits and a large config file (hundreds of KB).
The problematic part is this:
    private void populateResourceCapability(TaskType taskType) {
        String resourceTypePrefix = getResourceTypePrefix(taskType);
        boolean memorySet = false;
        boolean cpuVcoresSet = false;
        if (resourceTypePrefix != null) {
            List<ResourceInformation> resourceRequests =
                ResourceUtils.getRequestedResourcesFromConfig(conf, resourceTypePrefix);
            // ... (rest of the method omitted)
        }
    }
Inside ResourceUtils.getRequestedResourcesFromConfig(), we call Configuration.getValByRegex(), which iterates over every property key in the MapReduce job configuration (jobconf.xml) and matches each one against a regex. The job config can be large, for example when the job is part of an MR pipeline and an earlier job populated its config. In that case the regex match runs against all properties again and again, once for every task, so the total cost grows with the number of splits multiplied by the number of properties. This repeated work is unnecessary: all mappers share the same resource configuration, and so do all reducers.
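To make the cost pattern concrete, here is a minimal, self-contained sketch that models the lookup as a regex scan over every configuration key, repeated once per task. This is a simplified, hypothetical model and not Hadoop's actual Configuration.getValByRegex() code. The class RegexScanDemo, the method requestedResources, the matchAttempts counter, and the key names ("pipeline.stage.N", "demo.map.resource.memory") are all illustrative assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical, simplified model of a regex-based config lookup (not Hadoop code).
class RegexScanDemo {
    // Total number of regex match attempts across all calls.
    static long matchAttempts = 0;

    // Scans EVERY key on each call and keeps the ones that start with the prefix.
    static Map<String, String> requestedResources(Map<String, String> conf, String prefix) {
        Pattern pattern = Pattern.compile("^" + Pattern.quote(prefix) + ".+");
        Map<String, String> result = new HashMap<>();
        for (Map.Entry<String, String> entry : conf.entrySet()) {
            matchAttempts++;
            if (pattern.matcher(entry.getKey()).matches()) {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Many unrelated properties, e.g. inherited from earlier jobs in a pipeline.
        for (int i = 0; i < 5_000; i++) {
            conf.put("pipeline.stage." + i, "value" + i);
        }
        conf.put("demo.map.resource.memory", "2048");

        // One lookup per task, as when every task re-reads its resource request.
        int tasks = 1_000;
        for (int t = 0; t < tasks; t++) {
            requestedResources(conf, "demo.map.resource.");
        }
        System.out.println(matchAttempts); // 5001000 (1,000 tasks x 5,001 keys)
    }
}
```

Every call repeats the full scan even though the answer never changes for a given prefix. That is why caching the result per task type removes almost all of the work.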
We should cache the pre-configured resource requests properly, for example once per task type, instead of re-parsing them for every task.
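One way to implement such a cache is sketched below. It is a minimal sketch that assumes the resource requests depend only on the task type and that the configuration does not change once the AM has started. TaskKind, ResourceRequestCache, loadCount, and the loader function are hypothetical names, and the fix that was actually committed to Hadoop may be structured differently.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for the map/reduce task types.
enum TaskKind { MAP, REDUCE }

// Hypothetical cache: parses resource requests once per task type, then reuses them.
final class ResourceRequestCache<R> {
    private final Map<TaskKind, List<R>> cache = new EnumMap<>(TaskKind.class);
    private final Function<TaskKind, List<R>> loader; // the expensive config parse
    private int loads = 0;

    ResourceRequestCache(Function<TaskKind, List<R>> loader) {
        this.loader = loader;
    }

    // synchronized so the cache stays consistent if called from multiple threads
    synchronized List<R> get(TaskKind kind) {
        List<R> cached = cache.get(kind);
        if (cached == null) {
            cached = List.copyOf(loader.apply(kind)); // immutable snapshot, safe to share
            loads++;
            cache.put(kind, cached);
        }
        return cached;
    }

    // Number of times the expensive loader actually ran.
    synchronized int loadCount() {
        return loads;
    }
}

class CacheDemo {
    public static void main(String[] args) {
        ResourceRequestCache<String> cache = new ResourceRequestCache<>(
            kind -> List.of(kind == TaskKind.MAP ? "memory=2048" : "memory=4096"));
        for (int i = 0; i < 200_000; i++) {
            cache.get(TaskKind.MAP); // only the first call parses
        }
        cache.get(TaskKind.REDUCE);
        System.out.println(cache.loadCount()); // 2
    }
}
```

Keying the cache on the task type follows from the observation that all mappers share one resource configuration and all reducers share another. A ConcurrentHashMap with computeIfAbsent would be an equally reasonable choice. The sketch uses an EnumMap guarded by synchronized for simplicity.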