  1. Hadoop Map/Reduce
  2. MAPREDUCE-7309

Improve performance of reading resource request for mapper/reducers from config

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0, 3.1.0, 3.2.0, 3.3.0
    • Fix Version/s: 3.2.2, 3.4.0, 3.1.5, 3.3.1
    • Component/s: applicationmaster
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      This issue could affect all releases that include YARN-6927.

      Basically, we repeatedly run a regex match when reading the mapper/reducer resource requests from the config. With a large config file and a large number of splits, this can take a long time.

      We saw the AM take hours to parse the config for a job with 200k+ splits and a large config file (hundreds of KB).

      The problematic part is this:

        private void populateResourceCapability(TaskType taskType) {
          String resourceTypePrefix =
              getResourceTypePrefix(taskType);
          boolean memorySet = false;
          boolean cpuVcoresSet = false;

          if (resourceTypePrefix != null) {
            // This lookup calls Configuration.getValByRegex(), which tests a
            // regex against every property key in the job configuration.
            List<ResourceInformation> resourceRequests =
                ResourceUtils.getRequestedResourcesFromConfig(conf,
                    resourceTypePrefix);
            // (excerpt ends here)

      Inside ResourceUtils.getRequestedResourcesFromConfig(), we call Configuration.getValByRegex(), which goes through all property keys that come from the MapReduce job configuration (jobconf.xml). If the job config is large (e.g. because the job is part of an MR pipeline and the config was populated by an earlier job), this runs a regex match against every property, over and over again. That is unnecessary, because all mappers share one resource configuration and all reducers share another.

      We should do proper caching for pre-configured resource requests.
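
The caching idea can be sketched as follows. This is only an illustration under stated assumptions, not the code from the attached patches: the class ResourceRequestCacheSketch, its methods, and the plain Map standing in for the job configuration are all hypothetical, and the property keys are sample values. The sketch scans every key with a regex once per prefix and serves later lookups from a map.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.regex.Pattern;

/** Hypothetical sketch: cache the result of a regex scan per resource prefix. */
class ResourceRequestCacheSketch {
  // Stand-in for the job configuration (assumption: a plain key/value map).
  private final Map<String, String> conf;
  // One cached result per prefix, e.g. one for mappers and one for reducers.
  private final Map<String, List<String>> cache = new ConcurrentHashMap<>();
  // Counts full scans over the configuration, for demonstration only.
  private final AtomicInteger scanCount = new AtomicInteger();

  ResourceRequestCacheSketch(Map<String, String> conf) {
    this.conf = conf;
  }

  /** Expensive path: tests the regex against every key in the configuration. */
  private List<String> scanByPrefix(String prefix) {
    scanCount.incrementAndGet();
    Pattern pattern = Pattern.compile(Pattern.quote(prefix) + "[^.]+");
    List<String> matches = new ArrayList<>();
    for (Map.Entry<String, String> entry : conf.entrySet()) {
      if (pattern.matcher(entry.getKey()).matches()) {
        matches.add(entry.getKey() + "=" + entry.getValue());
      }
    }
    Collections.sort(matches);
    return Collections.unmodifiableList(matches);
  }

  /** Cached path: the scan runs only on the first request for a prefix. */
  List<String> getRequestedResources(String prefix) {
    return cache.computeIfAbsent(prefix, this::scanByPrefix);
  }

  int getScanCount() {
    return scanCount.get();
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    for (int i = 0; i < 1000; i++) {
      conf.put("pipeline.property." + i, "value");  // unrelated keys
    }
    conf.put("mapreduce.map.resource.memory-mb", "2048");
    conf.put("mapreduce.map.resource.vcores", "2");

    ResourceRequestCacheSketch sketch = new ResourceRequestCacheSketch(conf);
    // Simulate one lookup per map task.
    for (int task = 0; task < 200_000; task++) {
      sketch.getRequestedResources("mapreduce.map.resource.");
    }
    System.out.println(sketch.getRequestedResources("mapreduce.map.resource."));
    System.out.println("full scans: " + sketch.getScanCount());  // prints "full scans: 1"
  }
}
```

Without the cache, the loop above would scan all 1,002 keys 200,000 times. With it, the scan runs once per prefix, which is safe only under the assumption stated in this issue: the resource settings are identical for every task of the same type.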

        Attachments

        1. MAPREDUCE-7309.001.patch
          7 kB
          Wangda Tan
        2. MAPREDUCE-7309.002.patch
          8 kB
          Wangda Tan
        3. MAPREDUCE-7309-003.patch
          4 kB
          Peter Bacsko
        4. MAPREDUCE-7309-004.patch
          4 kB
          Peter Bacsko
        5. MAPREDUCE-7309-005.patch
          4 kB
          Peter Bacsko
        6. MAPREDUCE-7309-branch-3.1-001.patch
          4 kB
          Peter Bacsko
        7. MAPREDUCE-7309-branch-3.2-001.patch
          4 kB
          Peter Bacsko
        8. MAPREDUCE-7309-branch-3.3-001.patch
          4 kB
          Peter Bacsko

            People

            • Assignee:
              Peter Bacsko (pbacsko)
            • Reporter:
              Wangda Tan (wangda)
            • Votes:
              0
            • Watchers:
              6

              Dates

              • Created:
                Updated:
                Resolved: