Hadoop Map/Reduce / MAPREDUCE-7309

Improve performance of reading resource request for mapper/reducers from config

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0, 3.1.0, 3.2.0, 3.3.0
    • Fix Version/s: 3.2.2, 3.4.0, 3.3.1
    • Component/s: applicationmaster
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      This issue could affect all releases that include YARN-6927.

      Basically, we run a regex match repeatedly when we read the mapper/reducer resource requests from the config. With a large config file and a large number of splits, this can take a long time.

      We have seen the AM take hours to parse the config for a job with 200k+ splits and a large config file (hundreds of KB).

      The problematic part is this:

        private void populateResourceCapability(TaskType taskType) {
          String resourceTypePrefix =
              getResourceTypePrefix(taskType);
          boolean memorySet = false;
          boolean cpuVcoresSet = false;
      
          if (resourceTypePrefix != null) {
            List<ResourceInformation> resourceRequests =
                ResourceUtils.getRequestedResourcesFromConfig(conf,
                    resourceTypePrefix);
      

      Inside ResourceUtils.getRequestedResourcesFromConfig(), we call Configuration.getValByRegex(), which iterates over every property key in the MapReduce job configuration (jobconf.xml). If the job config is large (e.g. because the job is part of an MR pipeline and the config was populated by an earlier job), the regex match is rerun against all properties over and over again. This is unnecessary, because all mappers share one config and all reducers share one config.
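      To illustrate why the repeated scan is expensive, here is a minimal sketch that uses a plain Map as a stand-in for the job configuration. It is not Hadoop code: the class RegexScanSketch, its getValByRegex() helper, the keysExamined counter, the property names, and the prefix pattern are all assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical stand-in for a regex scan over all config keys; not Hadoop code.
class RegexScanSketch {
  // Counts how many keys have been tested against a regex so far.
  static long keysExamined = 0;

  // Returns every property whose key matches the regex. Each call walks the
  // whole map, so the cost is proportional to the total number of properties,
  // not to the number of matching ones.
  static Map<String, String> getValByRegex(Map<String, String> props,
      String regex) {
    Pattern pattern = Pattern.compile(regex);
    Map<String, String> result = new HashMap<>();
    for (Map.Entry<String, String> entry : props.entrySet()) {
      keysExamined++;
      if (pattern.matcher(entry.getKey()).find()) {
        result.put(entry.getKey(), entry.getValue());
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    // Filler properties, e.g. left behind by an earlier job in a pipeline.
    for (int i = 0; i < 1000; i++) {
      props.put("pipeline.filler.key." + i, "v");
    }
    // One illustrative resource request key.
    props.put("mapreduce.map.resource.gpu", "2");

    // Illustrative prefix pattern (an assumption for this sketch).
    String regex = "^" + Pattern.quote("mapreduce.map.resource.") + "[^.]+$";

    // Repeating the same scan once per task multiplies the work.
    int tasks = 500;
    for (int t = 0; t < tasks; t++) {
      getValByRegex(props, regex);
    }
    System.out.println("keys examined: " + keysExamined);
  }
}
```

      In this sketch the total work grows as (number of scans) x (number of properties), even though every scan returns the same single entry.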

      We should do proper caching for pre-configured resource requests.
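      One possible shape for such a cache is sketched below. This is not the attached patch: the class ResourceRequestCacheSketch, its local TaskType enum, the loader function, and the loadCount() helper are hypothetical names chosen for illustration. The idea is simply to run the expensive lookup once per task type and reuse the result afterwards.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical per-task-type cache; a sketch, not the actual fix.
class ResourceRequestCacheSketch {
  // Local stand-in for the task type used in the issue description.
  enum TaskType { MAP, REDUCE }

  private final Map<TaskType, Map<String, String>> cache =
      new EnumMap<>(TaskType.class);
  // The expensive lookup, e.g. a regex scan over the whole job config.
  private final Function<TaskType, Map<String, String>> loader;
  private int loads = 0;

  ResourceRequestCacheSketch(Function<TaskType, Map<String, String>> loader) {
    this.loader = loader;
  }

  // Runs the loader at most once per task type; later calls reuse the result.
  synchronized Map<String, String> get(TaskType type) {
    Map<String, String> cached = cache.get(type);
    if (cached == null) {
      // Copy and wrap so callers cannot modify the shared cached value.
      cached = Collections.unmodifiableMap(new HashMap<>(loader.apply(type)));
      cache.put(type, cached);
      loads++;
    }
    return cached;
  }

  // Number of times the expensive loader has actually been invoked.
  synchronized int loadCount() {
    return loads;
  }

  public static void main(String[] args) {
    ResourceRequestCacheSketch cache = new ResourceRequestCacheSketch(type -> {
      Map<String, String> requests = new HashMap<>();
      requests.put("example-resource", type == TaskType.MAP ? "1" : "2");
      return requests;
    });
    // Simulate many tasks of the same type asking for their resource request.
    for (int i = 0; i < 200_000; i++) {
      cache.get(TaskType.MAP);
    }
    System.out.println("loader calls: " + cache.loadCount());
  }
}
```

      The sketch assumes the job config does not change while the AM is running, which is what makes it safe to reuse the first result for all tasks of a given type.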

      Attachments

        1. MAPREDUCE-7309-branch-3.3-001.patch
          4 kB
          Peter Bacsko
        2. MAPREDUCE-7309-branch-3.2-001.patch
          4 kB
          Peter Bacsko
        3. MAPREDUCE-7309-branch-3.1-001.patch
          4 kB
          Peter Bacsko
        4. MAPREDUCE-7309-005.patch
          4 kB
          Peter Bacsko
        5. MAPREDUCE-7309-004.patch
          4 kB
          Peter Bacsko
        6. MAPREDUCE-7309-003.patch
          4 kB
          Peter Bacsko
        7. MAPREDUCE-7309.002.patch
          8 kB
          Wangda Tan
        8. MAPREDUCE-7309.001.patch
          7 kB
          Wangda Tan

          People

            Assignee: Peter Bacsko (pbacsko)
            Reporter: Wangda Tan (wangda)