Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-1533

Reduce or remove usage of String.format() usage in CapacityTaskScheduler.updateQSIObjects and Counters.makeEscapedString()

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.1
    • Fix Version/s: 0.22.0
    • Component/s: jobtracker
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Incremental enhancements to the JobTracker to optimize heartbeat handling.

      Description

      When short jobs are executed in hadoop with OutOfBandHeardBeat=true, JT executes heartBeat() method heavily. This internally makes a call to CapacityTaskScheduler.updateQSIObjects().

      CapacityTaskScheduler.updateQSIObjects(), internally calls String.format() for setting the job scheduling information. Based on the datastructure size of "jobQueuesManager" and "queueInfoMap", the number of times String.format() gets executed becomes very high. String.format() internally does pattern matching which turns to be out very heavy (This was revealed while profiling JT. Almost 57% of time was spent in CapacityScheduler.assignTasks(), out of which String.format() took 46%.

      Would it be possible to do String.format() only at the time of invoking JobInProgress.getSchedulingInfo?. This might reduce the pressure on JT while processing heartbeats.

      1. mapreduce-1533-v1.4.patch
        15 kB
        Amar Kamat
      2. ASF.LICENSE.NOT.GRANTED--MAPREDUCE-1533-and-others-20100413.1.txt
        30 kB
        Vinod Kumar Vavilapalli
      3. MAPREDUCE-1533-and-others-20100413.bugfix.txt
        2 kB
        Arun C Murthy
      4. mapreduce-1533-v1.8.patch
        27 kB
        Amar Kamat
      5. mapreduce-1533--2010-05-10a.patch
        36 kB
        Dick King
      6. mapreduce-1533--2010-05-21.patch
        40 kB
        Dick King
      7. mapreduce-1533--2010-05-21a.patch
        39 kB
        Dick King
      8. mapreduce-1533--2010-05-24.patch
        39 kB
        Dick King

        Issue Links

          Activity

          Hide
          Liyin Liang added a comment -

          The patch has a small problem.
          I think line257-259 of Counters.java should be:
          for(String subcounter: subcountersArray)

          { builder.append(subcounter); }

          instead of:
          for(Counter counter: subcounters.values())

          { builder.append(counter.makeEscapedCompactString()); }
          Show
          Liyin Liang added a comment - The patch has a small problem. I think line257-259 of Counters.java should be: for(String subcounter: subcountersArray) { builder.append(subcounter); } instead of: for(Counter counter: subcounters.values()) { builder.append(counter.makeEscapedCompactString()); }
          Hide
          Chris Douglas added a comment -

          +1

          I committed this. Thanks, Amar and Dick!

          In the future, this format for method signatures and calls is not consistent with the guidelines:

          +  static String getJobQueueSchedInfo
          +    (int numMapsRunningForThisJob, 
          
          Show
          Chris Douglas added a comment - +1 I committed this. Thanks, Amar and Dick! In the future, this format for method signatures and calls is not consistent with the guidelines : + static String getJobQueueSchedInfo + (int numMapsRunningForThisJob,
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12445363/mapreduce-1533--2010-05-24.patch
          against trunk revision 947758.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/204/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/204/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/204/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/204/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12445363/mapreduce-1533--2010-05-24.patch against trunk revision 947758. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/204/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/204/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/204/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/204/console This message is automatically generated.
          Hide
          Dick King added a comment -

          slight rework of previous patch to fit changes to Trunk since I downloaded trunk to write the former

          Show
          Dick King added a comment - slight rework of previous patch to fit changes to Trunk since I downloaded trunk to write the former
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12445219/mapreduce-1533--2010-05-21.patch
          against trunk revision 947112.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/544/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12445219/mapreduce-1533--2010-05-21.patch against trunk revision 947112. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/544/console This message is automatically generated.
          Hide
          Dick King added a comment -

          This new patch incorporates the minor items suggested in the previous comment.

          Show
          Dick King added a comment - This new patch incorporates the minor items suggested in the previous comment.
          Hide
          Amareshwari Sriramadasu added a comment -

          In Counters.java:
          + builder.append(escape(getName()));
          + builder.append(escape(getDisplayName()));
          The above changes should use the already escaped Strings escapedName, escapedDispName, as in earlier patch.

          Minor: Indentation for the if check in JobTracker.java needs a space after 'if'

          Show
          Amareshwari Sriramadasu added a comment - In Counters.java: + builder.append(escape(getName())); + builder.append(escape(getDisplayName())); The above changes should use the already escaped Strings escapedName, escapedDispName, as in earlier patch. Minor: Indentation for the if check in JobTracker.java needs a space after 'if'
          Hide
          Dick King added a comment -

          The two tests that fail, TestTaskTrackerLocalization and TestDelegationTokenRenewal , fail in trunk too.

          Show
          Dick King added a comment - The two tests that fail, TestTaskTrackerLocalization and TestDelegationTokenRenewal , fail in trunk too.
          Hide
          Dick King added a comment -

          I went ahead and added guards for the occurrences in JobInProgress .

          Low risk ... what's not to like?

          Show
          Dick King added a comment - I went ahead and added guards for the occurrences in JobInProgress . Low risk ... what's not to like?
          Hide
          Amareshwari Sriramadasu added a comment -

          The main aim of this jira was to fix few performance bottlenecks in the heartbeat processing. Maybe we can fix these bugs/improvements in a separate jira?

          Amar, the ones in JobInProgress are in heartbeat itself. Then, they should be fixed in this jira.

          Show
          Amareshwari Sriramadasu added a comment - The main aim of this jira was to fix few performance bottlenecks in the heartbeat processing. Maybe we can fix these bugs/improvements in a separate jira? Amar, the ones in JobInProgress are in heartbeat itself. Then, they should be fixed in this jira.
          Hide
          Amar Kamat added a comment -

          JobTracker.checkMemoryRequirements has a debug log without checking for isDebugEnabled(). Though it is not under JobTracker's heartbeat path, it is holding JobTracker lock for the debug log.

          found a couple of unprotected debug logs statements in JobInProgress also.

          The main aim of this jira was to fix few performance bottlenecks in the heartbeat processing. Maybe we can fix these bugs/improvements in a separate jira?

          Show
          Amar Kamat added a comment - JobTracker.checkMemoryRequirements has a debug log without checking for isDebugEnabled(). Though it is not under JobTracker's heartbeat path, it is holding JobTracker lock for the debug log. found a couple of unprotected debug logs statements in JobInProgress also. The main aim of this jira was to fix few performance bottlenecks in the heartbeat processing. Maybe we can fix these bugs/improvements in a separate jira?
          Hide
          Vinod Kumar Vavilapalli added a comment -

          I found a couple of unprotected debug logs statements in JobInProgress also.

          Show
          Vinod Kumar Vavilapalli added a comment - I found a couple of unprotected debug logs statements in JobInProgress also.
          Hide
          Amareshwari Sriramadasu added a comment -

          One minor comment on the patch:
          JobTracker.checkMemoryRequirements has a debug log without checking for isDebugEnabled(). Though it is not under JobTracker's heartbeat path, it is holding JobTracker lock for the debug log.

          Other changes look good.

          Show
          Amareshwari Sriramadasu added a comment - One minor comment on the patch: JobTracker.checkMemoryRequirements has a debug log without checking for isDebugEnabled(). Though it is not under JobTracker's heartbeat path, it is holding JobTracker lock for the debug log. Other changes look good.
          Hide
          Amar Kamat added a comment -

          Attaching a patch for trunk that does the following

          • Replaces String.format() with StringBuilder in CapacityScheduler
          • Guards debug statements with LOG.isDebugEnabled()
          • Optimizations Counters.makeEscapedCompactString()

          test-patch and ant tests passed.

          Show
          Amar Kamat added a comment - Attaching a patch for trunk that does the following Replaces String.format() with StringBuilder in CapacityScheduler Guards debug statements with LOG.isDebugEnabled() Optimizations Counters.makeEscapedCompactString() test-patch and ant tests passed.
          Hide
          Rajesh Balamohan added a comment -

          I am on vacation from 23-Apr - 17-May. I do not have internet access during this time. Please check with my manager sriguru@ for any urgent issues.

          ~Rajesh.B

          Show
          Rajesh Balamohan added a comment - I am on vacation from 23-Apr - 17-May. I do not have internet access during this time. Please check with my manager sriguru@ for any urgent issues. ~Rajesh.B
          Hide
          Vinod Kumar Vavilapalli added a comment -

          Modifying title to include the fix for String.format() in Counters also.

          Show
          Vinod Kumar Vavilapalli added a comment - Modifying title to include the fix for String.format() in Counters also.
          Hide
          Arun C Murthy added a comment -

          Missed to clarify that I'm uploading on Vinod's behalf. Thanks Vinod!

          Show
          Arun C Murthy added a comment - Missed to clarify that I'm uploading on Vinod's behalf. Thanks Vinod!
          Hide
          Arun C Murthy added a comment -

          Addendum to previous patch to fix a minor error in Counters' format.

          Show
          Arun C Murthy added a comment - Addendum to previous patch to fix a minor error in Counters' format.
          Hide
          Vinod Kumar Vavilapalli added a comment -

          I am not able to grant permissions for the patch to apache.

          I filed INFRA-2613.

          Show
          Vinod Kumar Vavilapalli added a comment - I am not able to grant permissions for the patch to apache. I filed INFRA-2613 .
          Hide
          Vinod Kumar Vavilapalli added a comment -

          I am not able to grant permissions for the patch to apache.

          Something got mixed up when bringing back JIRA up after the recent security breach? Anyone who can grant permissions? Or is it a general problem warranting an INFRA ticket?

          Show
          Vinod Kumar Vavilapalli added a comment - I am not able to grant permissions for the patch to apache. Something got mixed up when bringing back JIRA up after the recent security breach? Anyone who can grant permissions? Or is it a general problem warranting an INFRA ticket?
          Hide
          Vinod Kumar Vavilapalli added a comment -

          Patch for Yahoo! distribution. Uploading it on behalf of Amar. Not for commit to any of the apache svn branches.

          Show
          Vinod Kumar Vavilapalli added a comment - Patch for Yahoo! distribution. Uploading it on behalf of Amar. Not for commit to any of the apache svn branches.
          Hide
          Hemanth Yamijala added a comment -

          As of now its a push model where every change in the scheduler's state results into an info string which gets pushed to all the jobs. Shouldn't it be a pull model wherein the jobs pull the data from the scheduler whenever required?

          Yes, this seems a sensible direction to me. Given that the view requests come very infrequently as compared to heartbeats, this is a more optimal design.

          But for now we can keep it simple and solve the problem at hand by using StringBuilder. Thoughts?

          I am fine with this proposal.

          Show
          Hemanth Yamijala added a comment - As of now its a push model where every change in the scheduler's state results into an info string which gets pushed to all the jobs. Shouldn't it be a pull model wherein the jobs pull the data from the scheduler whenever required? Yes, this seems a sensible direction to me. Given that the view requests come very infrequently as compared to heartbeats, this is a more optimal design. But for now we can keep it simple and solve the problem at hand by using StringBuilder. Thoughts? I am fine with this proposal.
          Hide
          Amar Kamat added a comment -

          Benchmark results comparing StringBuilder with String.format :
          1) StringBuilder took 1.261 secs for generating 1,000,000 strings
          2) String.format took 9.126 sec for generating 1,000,000 strings

          So assuming that there are 400 heartbeat calls made per sec, we have ~2.5 ms per heartbeat time. Assuming that there are not more than 100 jobs running at a given time, we have
          1) StringBuilder taking 0.1261 ms for generating 100 strings
          2) String.format taking 0.9126 ms for generating 100 strings

          Thus String.format takes 36% (i.e 0.9126/2.5) whereas StringBuilder takes 5% (i.e 0.1261/2.5) of the total heartbeat processing time.

          Show
          Amar Kamat added a comment - Benchmark results comparing StringBuilder with String.format : 1) StringBuilder took 1.261 secs for generating 1,000,000 strings 2) String.format took 9.126 sec for generating 1,000,000 strings So assuming that there are 400 heartbeat calls made per sec, we have ~2.5 ms per heartbeat time. Assuming that there are not more than 100 jobs running at a given time, we have 1) StringBuilder taking 0.1261 ms for generating 100 strings 2) String.format taking 0.9126 ms for generating 100 strings Thus String.format takes 36% (i.e 0.9126/2.5) whereas StringBuilder takes 5% (i.e 0.1261/2.5) of the total heartbeat processing time.
          Hide
          Amar Kamat added a comment -

          How about using StringBuilder instead of String.format? The problem lies in the way how scheduling info is managed. As of now its a push model where every change in the scheduler's state results into an info string which gets pushed to all the jobs. Shouldn't it be a pull model wherein the jobs pull the data from the scheduler whenever required? Roughly ~100 hearbeat calls are made in a sec and in every hearbeat, the scheduler's state can potentially change resulting into an info string being pushed. That is, most of the times the info gets over-written before getting consumed making the pull model a good fit for this case. But for now we can keep it simple and solve the problem at hand by using StringBuilder. Thoughts?

          Show
          Amar Kamat added a comment - How about using StringBuilder instead of String.format ? The problem lies in the way how scheduling info is managed. As of now its a push model where every change in the scheduler's state results into an info string which gets pushed to all the jobs. Shouldn't it be a pull model wherein the jobs pull the data from the scheduler whenever required? Roughly ~100 hearbeat calls are made in a sec and in every hearbeat, the scheduler's state can potentially change resulting into an info string being pushed. That is, most of the times the info gets over-written before getting consumed making the pull model a good fit for this case. But for now we can keep it simple and solve the problem at hand by using StringBuilder. Thoughts?
          Hide
          Hemanth Yamijala added a comment -

          Few comments on the patch:

          • Move TaskSchedulingContext.JOB_SCHEDULING_INFO_FORMAT_STRING to JobSchedulingInfoHolder
          • QueueSchedulingContext does not seem to be the right place to define JobSchedulingInfoHolder. The container class manages data related to queues, and not a specific Job. Maybe it should be a separate class.
          • By making JobSchedulingInfoHolder.toString to be called outside heartbeats, we are addressing the core problem in the issue. But there still exist a few calls to this method from within a JobTracker lock - like CLI APIs like getAllJobs, etc., though they occur very infrequently as compared to heartbeats. With that context, should we implement a more optimized version of toString, maybe using StringBuilder (as was suggested elsewhere).
          • The changes in JobQueue.updateStatsOnRunningJob and TaskDataView.getSlotsOccupied can be avoided, but the intent of the change can still be met, by changing the algorithm in TaskDataView.getSlotsPerTask. I am giving this idea based on preliminary patches in MAPREDUCE-1354. There, we optimized getNumSlotsPer {Map|Reduce}

            to be unsynchronized, by making the corresponding variables volatile. Hence, getSlotsPerTask can now be implemented as:

            int getSlotsPerTask(JobInProgress job) {
              return job.getNumSlotsPerMap();
            }
            

            and likewise for reduces.
            This makes it fewer changes to the patch.

          • I would suggest a few documentation changes to better document the contract of the scheduling info.
            • Document that getSchedulingInfo returns a stringified representation of the job scheduling info set in setJobSchedulingInfo.
            • Document that getJobSchedulingInfo will return the stringified representation of the job scheduling info on the Client, but the actual object on the server. (Note that we are deserializing the stringified representation of the scheduling info on the client, not the actual object itself)
            • Document the intent of setJobSchedulingInfo - i.e. it is for optimization of heartbeats and allows lazy construction of the stringified representation on a need basis. This is an important design choice to capture, I think.
          • Do we need mapred.JobStatus. {get|set}

            JobSchedulingInfo ? Can we not define them only in mapreduce.JobStatus ?

          • I don't think we need the cast in JobStatus.readFields casting the scheduling info string to an Object, because this is allowed anyway.
          • Given we are going to store the stringified representation of the scheduling info on the client, should we retain the name of the JobStatus variable as schedulingInfo only ?
          Show
          Hemanth Yamijala added a comment - Few comments on the patch: Move TaskSchedulingContext.JOB_SCHEDULING_INFO_FORMAT_STRING to JobSchedulingInfoHolder QueueSchedulingContext does not seem to be the right place to define JobSchedulingInfoHolder. The container class manages data related to queues, and not a specific Job. Maybe it should be a separate class. By making JobSchedulingInfoHolder.toString to be called outside heartbeats, we are addressing the core problem in the issue. But there still exist a few calls to this method from within a JobTracker lock - like CLI APIs like getAllJobs, etc., though they occur very infrequently as compared to heartbeats. With that context, should we implement a more optimized version of toString, maybe using StringBuilder (as was suggested elsewhere). The changes in JobQueue.updateStatsOnRunningJob and TaskDataView.getSlotsOccupied can be avoided, but the intent of the change can still be met, by changing the algorithm in TaskDataView.getSlotsPerTask. I am giving this idea based on preliminary patches in MAPREDUCE-1354 . There, we optimized getNumSlotsPer {Map|Reduce} to be unsynchronized, by making the corresponding variables volatile. Hence, getSlotsPerTask can now be implemented as: int getSlotsPerTask(JobInProgress job) { return job.getNumSlotsPerMap(); } and likewise for reduces. This makes it fewer changes to the patch. I would suggest a few documentation changes to better document the contract of the scheduling info. Document that getSchedulingInfo returns a stringified representation of the job scheduling info set in setJobSchedulingInfo. Document that getJobSchedulingInfo will return the stringified representation of the job scheduling info on the Client, but the actual object on the server. (Note that we are deserializing the stringified representation of the scheduling info on the client, not the actual object itself) Document the intent of setJobSchedulingInfo - i.e. it is for optimization of heartbeats and allows lazy construction of the stringified representation on a need basis. This is an important design choice to capture, I think. Do we need mapred.JobStatus. {get|set} JobSchedulingInfo ? Can we not define them only in mapreduce.JobStatus ? I don't think we need the cast in JobStatus.readFields casting the scheduling info string to an Object, because this is allowed anyway. Given we are going to store the stringified representation of the scheduling info on the client, should we retain the name of the JobStatus variable as schedulingInfo only ?
          Hide
          Amar Kamat added a comment -

          Attaching a patch for review. Instead of doing an in-line String.format(), a wrapper object is passed with all the necessary values to construct the scheduling-info string. The scheduling info is created lazily when toString() method is invoked on the wrapper object.

          Show
          Amar Kamat added a comment - Attaching a patch for review. Instead of doing an in-line String.format(), a wrapper object is passed with all the necessary values to construct the scheduling-info string. The scheduling info is created lazily when toString() method is invoked on the wrapper object.
          Hide
          Rajesh Balamohan added a comment -

          For the benchmark I ran with short jobs, there were 692 items in jobQueuesManager.getRunningJobQueue(qsi.queueName).

          This essentially means that, for every heartbeat processing String.format() has to be invoked 692 times. This might be the reason why it turned to be expensive.

          Show
          Rajesh Balamohan added a comment - For the benchmark I ran with short jobs, there were 692 items in jobQueuesManager.getRunningJobQueue(qsi.queueName). This essentially means that, for every heartbeat processing String.format() has to be invoked 692 times. This might be the reason why it turned to be expensive.

            People

            • Assignee:
              Dick King
              Reporter:
              Rajesh Balamohan
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development