Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.19.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Introduced a new configuration parameter, mapred.max.tasks.per.job, to specify the maximum number of tasks per job.

      Description

      We have seen instances where a user submitted a job with many thousands of mappers. The JobTracker was running with a 3GB heap, but that was still not enough to prevent memory thrashing from garbage collection; effectively, the JobTracker was unable to serve jobs and had to be restarted.

      One simple proposal would be to limit the maximum number of tasks per job. This could be a configurable parameter. Are there other things that eat huge globs of memory in the JobTracker?

      1. maxSplits.patch
        2 kB
        dhruba borthakur
      2. maxSplits10.patch
        6 kB
        dhruba borthakur
      3. maxSplits2.patch
        2 kB
        dhruba borthakur
      4. maxSplits3.patch
        4 kB
        dhruba borthakur
      5. maxSplits4.patch
        4 kB
        dhruba borthakur
      6. maxSplits5.patch
        4 kB
        dhruba borthakur
      7. maxSplits6.patch
        5 kB
        dhruba borthakur
      8. maxSplits7.patch
        15 kB
        dhruba borthakur
      9. maxSplits8.patch
        7 kB
        dhruba borthakur
      10. maxSplits9.patch
        6 kB
        dhruba borthakur

        Issue Links

          Activity

          Devaraj Das added a comment -

           Dhruba, which version of Hadoop did you see this behavior with? In 0.18 there is a fix for one such memory hog - HADOOP-3670. There is a discussion there on using the profiler and tuning the GC parameters.
           Also, could you please minimize the number of completed jobs kept in memory per user by specifying a very low value for mapred.jobtracker.completeuserjobs.maximum (it defaults to 100 jobs per user). HADOOP-3150 will further help reduce the amount of memory consumed by the JT, since it removes the task promotion queue.

           Could you please give some more details - like the number of tasks the job had, how many such jobs you could run before the JT started to exhibit the problem, etc.?
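           The tuning suggested above could be expressed in hadoop-site.xml along these lines (a sketch only; the value 20 is an illustrative choice, not a recommendation from this thread):

           ```xml
           <!-- Sketch: cap the completed jobs kept in JobTracker memory per user.
                The value shown is illustrative only. -->
           <property>
             <name>mapred.jobtracker.completeuserjobs.maximum</name>
             <value>20</value>
           </property>
           ```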

          dhruba borthakur added a comment -

           Hi Devaraj, thanks for the info. The problem we saw was with 0.17.1. The number of completed jobs in memory has been reduced from the default of 100 to 20. When the problem occurred, there were about 400 total jobs in the JT (running + completed + failed). I do not know how many jobs had been run by the JobTracker since it was last started. This particular job had 60,000 mappers. About half of these tasks had finished before the problem became acute.

          I have the GC log enabled via -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -Xloggc:/var/hadoop/logs/jobtracker1.gc.log

          This log (as well as jconsole) showed that JT was busy running full GC.

           I agree that moving to later releases will help mitigate this problem to a certain extent, but as a system administrator, I would like to set an upper limit on the number of tasks for a single job. This is not a cure-all, but it could serve as a guardrail to prevent non-conforming jobs from running. This should be complementary to all the JIRAs you mentioned, shouldn't it? Do you see a downside in this approach?

          dhruba borthakur added a comment -

           Configure the maximum number of splits that a job can have. This helps prevent rogue jobs from permanently stalling the JobTracker. This patch is for review purposes only.

          Devaraj Das added a comment -

           Sorry Dhruba, I somehow missed the JIRA mail with your earlier comment.

           I think we should do this in the server (if at all we want to do it), since this should be an admin-configured param. Also, a proper warning should be given in response to the user if the JT discards a job because of too many splits. Maybe we should maintain a global count of the tasks across all jobs as well, and if that is exceeded, not accept any job until some job(s) complete.

           By the way, configuring mapred.jobtracker.completeuserjobs.maximum to a lower value should help your case.

          dhruba borthakur added a comment -

           Hi Devaraj, excellent ideas. I will work on your feedback and upload a new patch soon. Thanks for the quick feedback!

          dhruba borthakur added a comment -

           This patch fails a job if the JobTracker detects that the number of splits for the job exceeds a configured upper limit. Please let me know of any gotchas this patch might have.

          I like the idea of having an upper limit of all tasks for the entire jobtracker. But I have not yet implemented it. Do you think I should implement it as part of this patch?

          BTW, in our cluster, we have set mapred.jobtracker.completeuserjobs.maximum to 20 already.

          Devaraj Das added a comment -

          I think you should handle the overall #tasks case as well.

          dhruba borthakur added a comment -

          There are two new configuration parameters

           1. mapred.max.tasks.per.jobtracker sets the maximum number of tasks that a JobTracker can keep in memory.

           2. mapred.max.tasks.per.job sets the maximum number of map/reduce tasks that a job can have.

          By default, these limits are switched off.
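           As a sketch, the two parameters could be set in hadoop-site.xml like this (the values shown are illustrative only; leaving the properties unset keeps the limits switched off):

           ```xml
           <!-- Illustrative values only; by default both limits are off. -->
           <property>
             <name>mapred.max.tasks.per.job</name>
             <value>100000</value>
           </property>
           <property>
             <name>mapred.max.tasks.per.jobtracker</name>
             <value>500000</value>
           </property>
           ```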

          Amar Kamat added a comment -

           A few comments:
           1) If the job fails in init(), the JobTracker invokes JobInProgress.kill(). So ideally you should simply throw an exception if the limit is crossed.
           2) The API totalNumTasks() is not used anywhere and can be removed.
           3) The count returned by totalNumTasks() will be erroneous, as it also takes into consideration jobs which exceed the limit. numMapTasks should be changed only when the job passes these limit checks.
           4) You could also check the limit in the JobInProgress constructor, where numMapTasks/numReduceTasks are obtained from the jobConf. This might help us catch the violation early and return a useful message to the user about why the job was not accepted.
           5) Since reduce tasks also add to numTasks, I think (numMapTasks + numReduceTasks) is a better metric.
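           The check suggested in points 1) and 5) could be sketched as a standalone class like this (class and field names here are invented for illustration; this is not the patch's actual code):

           ```java
           // Self-contained sketch of a per-job task-count limit check.
           // Names (TaskLimitCheck, maxTasksPerJob) are illustrative, not from the patch.
           public class TaskLimitCheck {
               private final int maxTasksPerJob;   // a negative value means the limit is off

               public TaskLimitCheck(int maxTasksPerJob) {
                   this.maxTasksPerJob = maxTasksPerJob;
               }

               /**
                * Throws if (maps + reduces) exceeds the configured limit.
                * Simply throwing lets the JobTracker's existing kill/cleanup path run,
                * per comment 1) above.
                */
               public void checkLimit(int numMapTasks, int numReduceTasks)
                       throws java.io.IOException {
                   int totalTasks = numMapTasks + numReduceTasks;  // metric from comment 5)
                   if (maxTasksPerJob >= 0 && totalTasks > maxTasksPerJob) {
                       throw new java.io.IOException("The number of tasks for this job ("
                           + totalTasks + ") exceeds the configured limit ("
                           + maxTasksPerJob + ")");
                   }
               }
           }
           ```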

          dhruba borthakur added a comment -

           Hi Amar, thanks for your comments.

          >1. If the job fails on init(), JobTracker invokes JobInProgress.kill(). So ideally you should simply throw an exception if the limit is crossed

           Can you please explain which portion of the code you are referring to here?

          >2. The api totalNumTasks() is not used anywhere and can be removed.
           This API is used by JobInProgress.initTasks. This method computes the number of tasks that are needed by this job.

           Regarding 3 and 4, I agree with you that it would be better to check these limits in the constructor of JobInProgress. But the number of splits for the current job is not yet available when the constructor is invoked. That's the reason I do these checks in initTasks. Does that make sense?

           Regarding point 5, my latest patch has this fix.

          dhruba borthakur added a comment -

          Incorporated all of Amar's suggestions.

          Amar Kamat added a comment -

          Can you pl explain which potion of code you are referring to here?

           Look at how the job is inited. If init fails, there is a cleanup process associated with it. So simply throwing an exception would work, and there is no need to explicitly set the job state and finish time.

          This API is used by JobInProgress.initTasks. This method computes the number of tasks that is needed by this job.

           Oops! I missed that. But it's still flawed, as I mentioned in comment #3. Please check.

          Regarding 3 and 4 i agree with you that it is better if I can check these limits in the constructor of JobInProgress. ....

           I just checked, and it seems that the job client never overwrites the number of maps to be spawned. Since the num-maps passed by the user is just a hint to the job client while calculating the splits, this information is of no use to the JobTracker, and hence the job client can overwrite the num-maps parameter before uploading the job.xml to the DFS. With this, a job that should fail will fail fast (i.e. in the constructor itself), and the user will be informed as to why the job failed.

           Comment #3 just states that totalNumTasks() will also count tasks from non-running (i.e. killed/completed/failed) jobs. So totalNumTasks() should only take RUNNING jobs into consideration while calculating total tasks.

          dhruba borthakur added a comment -

           Thanks, Amar, for your comments. I purposely kept totalNumTasks() counting all tasks (failed, completed, etc.) because I am interested in limiting memory usage on the JT. All these tasks will occupy some memory in the JT, won't they?

          Amar Kamat added a comment -

          All these tasks will occupy some memory in the JT, isn't it?

           But not the jobs that failed because they exceeded the limit. Remember that you are setting numMaps before throwing the exception, and hence those tasks will also be counted. The problem with this is that any job submitted after the limit-exceeding job might not pass this test even when it should. You could simply reset the value to 0 before throwing the exception.

          Vinod Kumar Vavilapalli added a comment -

           Why can't org.apache.hadoop.mapred.LimitTasksPerJobTaskScheduler, along with the mapred.jobtracker.completeuserjobs.maximum limit, suffice? I agree that we still need a global limit over all the tasks, but maybe that can be built into a scheduler too.

          dhruba borthakur added a comment -

           Thanks to Amar for your comments. I am attaching a new patch that looks at the allocated tasks for each job and checks them against the specified limit. Amar: can you please review this latest patch? Thanks.

           @Vinod: the proposed mapred.max.tasks.per.jobtracker is used to limit the memory usage of the JobTracker. We need to count how many tasks (failed, completed, running, etc.) are resident in memory. I believe that mapred.jobtracker.completeuserjobs.maximum does not satisfy that requirement. I do not know much about org.apache.hadoop.mapred.LimitTasksPerJobTaskScheduler; maybe Amar can comment on that.

          Vinod Kumar Vavilapalli added a comment -

           Ok, I guess I was wrong. LimitTasksPerJobTaskScheduler limits the tasks running per job in the cluster. It doesn't help prevent a JobTracker stall. But this still looks like something that should be handled by a scheduler to me - perhaps the scheduler should bail out while initializing a job with a huge number of tasks.

          dhruba borthakur added a comment -

           I think the goal is to avoid making the JobTracker allocate memory for the tasks of a job that would push the task count beyond a certain limit. If the limit check is in the scheduler, all schedulers have to implement it. Also, by the time the scheduler gets to look at the task, the JT might already have allocated some memory for task-related status.

          I am assuming that it is better to check this in the JT rather than in the scheduler, but please let me know if my assumption seems invalid.

          Amar Kamat added a comment -

          But still, this looks something that should be handled by a scheduler

           The scheduler should be least concerned about the JobTracker's memory. It is just pluggable code that decides what runs next. Whether or not the JobTracker can support something is better decided by the JobTracker itself.

          Amar Kamat added a comment -

           A few comments:
           1) JobInProgress.totalAllocatedTasks() should also consider nonLocalMaps and nonLocalRunningMaps. Applications like random-writer use these structures.
           2) There is an extra '-' diff in JobTracker.java.
           3) You might need to synchronize the totalAllocatedTasks() API in both places. Consider a case where job1 is in the init stage while job2 is newly submitted. Assume both cannot run in parallel on the JobTracker, job1 has not yet seen its splits, and job2, which is getting constructed, checks totalAllocatedTasks(). In such a case job2 will pass the check and move on to init; job1 will create its cache and continue, while job2 will fail in init. Also, there is no guarantee that inits are sequential. They might happen in parallel, as that is up to the scheduler. So all the jobs that call totalTasks might see some stale value and hence might end up expanding themselves.
           4) Please add a test case.

          dhruba borthakur added a comment -

           I have a question regarding item 1 above. totalAllocatedTasks currently uses runningMapCache, nonRunningMapCache, runningReduces and nonRunningReduces.

           Over and above these four, should it also use nonLocalMaps and nonLocalRunningMaps? I thought that nonLocalRunningMaps were already contained in runningMapCache. Please let me know if this is not true.

          dhruba borthakur added a comment -

           I like the idea of making JobInProgress.totalAllocatedTasks a synchronized method. But making JobTracker.totalAllocatedTasks a synchronized method might be bad, because it violates the locking hierarchy, right? I am assuming that one cannot acquire the JobTracker lock if one already holds the JobInProgress lock. Do we really need to make JobTracker.totalAllocatedTasks a synchronized method?

          Amar Kamat added a comment -

          I have a question regarding item 1 above.

           They are mutually exclusive. For data-local tasks, runningMapCache, nonRunningMapCache, runningReduces and nonRunningReduces are used. For non-data-local tasks, nonLocalMaps and nonLocalRunningMaps are used.

          it violates locking hierarchy,

           Yes. One thing you could do is keep a global count of all allocated tasks in the JobTracker. Jobs being constructed will check the value before bailing out. Once a job inits, it updates the count. Any access/update to the count should be guarded. Since the count is updated only on passing the init checks, we can be sure that limit-exceeding jobs never get inited. So something like:

          In init :
            1. Get the count lock
            2. Check if the count + self-tasks > limit
                2.1 If yes then throw an exception
                2.2 else update the count and release the lock
          

           Since the number of allocated tasks never changes once the job is inited, there is no point in iterating over the jobs every time. Once the job is removed from the JobTracker (see RetireJobs), update the count to reflect the change.
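           The steps above could be sketched as a standalone class like this (a self-contained illustration with invented names; the actual patch wires this into the JobTracker):

           ```java
           // Sketch of a guarded global task counter, per the init steps above.
           // Class and method names (TaskCounter, admit, release) are invented.
           public class TaskCounter {
               private final int limit;          // global task limit; negative disables the check
               private int allocatedTasks = 0;   // guarded by this object's intrinsic lock

               public TaskCounter(int limit) {
                   this.limit = limit;
               }

               /** Called from job init: check-and-update atomically, or throw (step 2). */
               public synchronized void admit(int jobTasks) throws java.io.IOException {
                   if (limit >= 0 && allocatedTasks + jobTasks > limit) {
                       throw new java.io.IOException(
                           "Global task limit " + limit + " would be exceeded");
                   }
                   allocatedTasks += jobTasks;   // updated only after the check passes
               }

               /** Called when a job is retired (cf. RetireJobs) to release its tasks. */
               public synchronized void release(int jobTasks) {
                   allocatedTasks -= jobTasks;
               }

               public synchronized int allocated() {
                   return allocatedTasks;
               }
           }
           ```

           Making both the check and the update happen under one lock avoids the stale-value race described in the earlier review comments.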

          dhruba borthakur added a comment -

           Incorporated locking changes to protect the global counter of the number of allocated tasks in the JobTracker. Also wrote a unit test.

          dhruba borthakur added a comment -

           Submitting the patch to trigger Hadoop QA test results.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12389601/maxSplits7.patch
          against trunk revision 692597.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3195/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3195/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3195/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3195/console

          This message is automatically generated.

          Amar Kamat added a comment -

          More comments
           1) Computing allocated tasks based on the internal data structures of JobInProgress is incorrect, as their counts will go missing: garbageCollect() for the job frees up the memory taken by the caches. Note that garbageCollect() will be invoked before retireJob().

           I think we should use numMapTasks and numReduceTasks instead of the sizes of the caches. Also, reset the values of numMapTasks/numReduceTasks if the limit is crossed. If someone introduces partial expansion of tasks, we can think about it later.

2) I am not sure if changing the log level of one class is sufficient. It would be nice if we could set the log level of the complete framework in the testcase. But then again, is it required?

3) I did not check the test case completely, but it should take care of the following cases:

          • submit multiple jobs such that all of them should be accommodated. Both while the previous ones are running and also when the previous jobs are cleaned up.
• submit some small jobs and then a large job that exceeds the limit. Submit a small job after the limit-crossing job and make sure that it gets accepted. The reason is to test that the cleanup is done and that a large job being submitted and rejected has no side effects.
          • submit two jobs back to back such that job1.totalTasks() + job2.totalTasks() > limit. Hence the first one should be accepted and the next one should be rejected.
          • anything else?

4) I don't see where mapred.jobtracker.completeuserjobs.maximum is set in the test case. By default it will be 100, so the test case might never exercise the cleanup process.

          Owen O'Malley added a comment -

          I think the global limits are too complex to be worthwhile without a better model of the memory. I am ok with max maps and reduces. I would have preferred having an empty/undefined value mean unlimited.

          -1 except for the very simple per job limits.

          Amar Kamat added a comment -

          Owen,
I think Dhruba's concern here is that many small/average-sized jobs can collectively overload the JobTracker; see here. Capping individual jobs might not help, as all the jobs will accumulate in the JT's memory and bring it down. I think some combination of local capping, global capping, and smart scheduling/initialization might help. But I agree that in the long term we need to model the memory better; for now, simple heuristics might work.

          dhruba borthakur added a comment -

I like Owen's idea of putting a limit on the maximum number of tasks for a job only. This is the first step in preventing bad things from occurring in a cluster. I would like to defer implementing the global limits until later. Does that sound reasonable?

          dhruba borthakur added a comment -

          Hi Amar, do you agree with Owen's view (and mine too) that the global limits are too complex to be worthwhile? Also, it might not even be feasible for an admin to determine what the global limit should be. If you agree, then the attached maxSplits2.patch is the code we need.

          Devaraj Das added a comment -

+1 on having the simple thing first (I think I had suggested the global limits earlier but hadn't realized that they would be so complicated).

          dhruba borthakur added a comment -

          Fail a job if the number of tasks for a single job exceeds a configured maximum.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12390777/maxSplits8.patch
          against trunk revision 698385.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3356/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3356/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3356/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3356/console

          This message is automatically generated.

          Amar Kamat added a comment -

          +1 for global limits.

          Amar Kamat added a comment -

          +1 for global limits.

          I meant fixed limit per job.

          Amar Kamat added a comment -

          Few comments :

          Main :

1. There is no need to do the cleanup in initTasks(). Simply throwing an exception from initTasks() should do the cleanup and kill/fail the job. There seems to be a bug in the framework, and I have opened HADOOP-4261 to address it.
          2. mapred.max.tasks.per.job should be set at the jobtracker level and not at the job level. It should be something like an expert-final parameter

          Testcase :
          1. Remove the commented assert statement
          2. Most of the imports are not needed.
          3. Rewrite the test to reflect the changes

          Arun C Murthy added a comment -

Cancelling patch while Amar's comments are being accommodated...

          Minor nit: the variable maxSplits in JobInProgress should probably be renamed to 'maxTasks' - it's misleading.
          I'm not super excited about using 0 as the default value for mapred.max.tasks.per.job - this has come up before and I guess we need to come up with a way of specifying 'UNLIMITED' in our configuration files.

          Hemanth Yamijala added a comment -

We use -1 in HADOOP-3759, Long.MAX_VALUE in HADOOP-657, and 0 or null in a few places. Somehow, a value that can never legitimately occur seems like a good choice to specify 'UNLIMITED', 'DON'T CARE', 'TURNED OFF', or whatever. -1 is one such value.

          dhruba borthakur added a comment -

          Incorporated all review comments except one.

If the limit check fails, the JT raises an exception after marking the job as failed. Amar had commented that it is not required to mark the job status as "failed" before raising the exception. But I have seen that unless this status is set, a job that is expected to fail does not fail, and the unit test then fails.

          Amar Kamat added a comment -

But I have seen that unless this status is set, a job that is expected to fail does not fail, and the unit test then fails.

          HADOOP-4261 will fix this.

          dhruba borthakur added a comment -

Hi Amar, I marked this jira as being blocked by HADOOP-4261. Can you please review this patch whenever you have some time?

          Amar Kamat added a comment -

          Few comments
          1)

          int maxTasks = conf.getInt("mapred.jobtracker.maxtasks.per.job", -1);
          

          This will change per job as conf is configured per job. Hence all the jobs should get this value from the jobtracker. Something like

-----JobTracker-------------
int getMaxTasksPerJob() {
  return conf.getInt("mapred.jobtracker.maxtasks.per.job", -1);
}
-----JobInProgress---------
int maxTasks = jobtracker.getMaxTasksPerJob();
          

          2)

          if (maxTasks != -1 && numMapTasks + numReduceTasks > maxTasks) {
          

What if I pass -2? So the check should be

          if (maxTasks > 0 && numMapTasks + numReduceTasks > maxTasks) {
          

          3)

          if (maxTasks != -1 && numMapTasks + numReduceTasks > maxTasks) {
                long now = System.currentTimeMillis();
                this.finishTime = now;
                status.setRunState(JobStatus.FAILED);
                JobHistory.JobInfo.logFailed(profile.getJobID(), now, 0, 0);
                // Special case because the Job is not queued
                JobEndNotifier.registerNotification(this.getJobConf(), this.getStatus());
                throw new IOException(
                          "The number of tasks for this job " + 
                          (numMapTasks + numReduceTasks) +
                          " exceeds the configured limit " + maxTasks);
          }
          

          can be written as

          if (maxTasks > 0 && numMapTasks + numReduceTasks > maxTasks) {
                throw new IOException(
                          "The number of tasks for this job " + 
                          (numMapTasks + numReduceTasks) +
                          " exceeds the configured limit " + maxTasks);
           }
          

Note that everything you have done here manually should already be done by the job itself upon failure, i.e.:

          • setting the runstate to FAILED
          • setting the finish time
          • logging failure to history and hence closing the file.

          4) One minor nit. mapred.jobtracker.maxtasks.per.job could be mapred.job.tasks.maximum

The test case looks fine. Please check with everyone whether passing -1 for UNLIMITED is fine. Also, what if 0 is passed? Should we bail out immediately, or should we consider any non-positive number as UNLIMITED?

          dhruba borthakur added a comment -

          Incorporated most of Amar's comments. I left the name of the config parameter as it was earlier.

          dhruba borthakur added a comment -

Also, any value of the config parameter less than or equal to zero is treated as INVALID; in that case, the limit checking is not triggered.
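As a minimal sketch of the guard semantics described above (the class and method names here are illustrative, not the actual JobInProgress code): the check only fires for a positive limit, so zero or any negative value leaves jobs unlimited.

```java
// Illustrative sketch of the per-job task-limit guard discussed in this
// thread; MaxTasksCheck and exceedsLimit are hypothetical names.
public class MaxTasksCheck {

    // Returns true when the job should be rejected. A non-positive
    // maxTasks (the "invalid" values mentioned above) disables the check.
    static boolean exceedsLimit(int numMapTasks, int numReduceTasks,
                                int maxTasks) {
        return maxTasks > 0 && numMapTasks + numReduceTasks > maxTasks;
    }

    public static void main(String[] args) {
        System.out.println(exceedsLimit(60, 50, 100)); // 110 > 100: reject
        System.out.println(exceedsLimit(60, 40, 100)); // exactly at limit: accept
        System.out.println(exceedsLimit(60, 50, 0));   // limit disabled
        System.out.println(exceedsLimit(60, 50, -1));  // limit disabled
    }
}
```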

          Amar Kamat added a comment -

          +1.
The reason I proposed the new name is that it is easier to relate: mapred.job.tasks.maximum would mean that there is an attribute maximum for a component tasks of an entity job under the mapred umbrella. But I am okay with the current name too.

          dhruba borthakur added a comment -

          Thanks Amar for reviewing it. I am marking it for 0.19 because this limit is very necessary for clusters that have permanent JobTrackers (not using HOD). Otherwise a single erroneous job could swamp the entire cluster. The fix is very low-risk. I am proposing that this fix gets into 0.19 branch.
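For reference, an admin would enable this limit in the JobTracker's configuration. A sketch of such a fragment is below; note the thread uses both mapred.max.tasks.per.job and mapred.jobtracker.maxtasks.per.job as the parameter name, and the value 100000 is purely illustrative, not a recommendation.

```xml
<!-- Illustrative hadoop-site.xml fragment on the JobTracker node.
     Jobs whose map + reduce task count exceeds this value are failed
     at submission; a value <= 0 disables the check. -->
<property>
  <name>mapred.max.tasks.per.job</name>
  <value>100000</value>
</property>
```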

          dhruba borthakur added a comment -

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          dhruba borthakur added a comment -

          I just committed this.

          Hudson added a comment -
Integrated in Hadoop-trunk #630 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/630/ )

            People

            • Assignee:
              dhruba borthakur
              Reporter:
              dhruba borthakur
• Votes:
  0
  Watchers:
  4

              Dates

              • Created:
                Updated:
                Resolved:
