Hadoop Common: HADOOP-657

Free temporary space should be modelled better

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.17.0
    • Fix Version/s: 0.19.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Currently, there is a configurable size that must be free for a task tracker to accept a new task. However, that isn't a very good model of what the task is likely to take. I'd like to propose:

      Map tasks: totalInputSize * conf.getFloat("map.output.growth.factor", 1.0) / numMaps
      Reduce tasks: totalInputSize * 2 * conf.getFloat("map.output.growth.factor", 1.0) / numReduces

      where totalInputSize is the size of all the maps inputs for the given job.

      To start a new task,
      newTaskAllocation + (sum over running tasks of (1.0 - done) * allocation) <=
      free disk * conf.getFloat("mapred.max.scratch.allocation", 0.90);

      So in English, we will model the expected sizes of tasks and only run tasks that should leave us a 10% margin. With:
      map.output.growth.factor – the relative size of the transient data relative to the map inputs
      mapred.max.scratch.allocation – the maximum amount of our disk we want to allocate to tasks.
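
      The proposed model can be sketched in Java. All class and method names below are illustrative only, not actual Hadoop APIs; the formulas are the ones from the description, with the admission check requiring that new plus outstanding allocations fit within the scratch budget.

      ```java
      // Illustrative sketch of the proposed disk-space model.
      // Class and method names are hypothetical, not from any patch.
      public class SpaceEstimator {

          // Map tasks: totalInputSize * growthFactor / numMaps
          static long mapTaskAllocation(long totalInputSize, int numMaps, double growthFactor) {
              return (long) (totalInputSize * growthFactor / numMaps);
          }

          // Reduce tasks: totalInputSize * 2 * growthFactor / numReduces
          static long reduceTaskAllocation(long totalInputSize, int numReduces, double growthFactor) {
              return (long) (totalInputSize * 2 * growthFactor / numReduces);
          }

          // A new task may start only if its allocation, plus the unfinished
          // portion of running tasks' allocations, fits within the fraction of
          // free disk we are willing to hand to tasks (0.90 leaves a 10% margin).
          static boolean canStartTask(long newTaskAllocation, long runningRemaining,
                                      long freeDisk, double maxScratchFraction) {
              return newTaskAllocation + runningRemaining <= (long) (freeDisk * maxScratchFraction);
          }

          public static void main(String[] args) {
              long tenGB = 10L * 1024 * 1024 * 1024;
              // Ten maps over 10 GB of input at growth factor 1.0: 1 GiB each.
              long perMap = mapTaskAllocation(tenGB, 10, 1.0);
              System.out.println(perMap);
              // 1 GiB fits in 90% of a 2 GiB free disk.
              System.out.println(canStartTask(perMap, 0, 2L * 1024 * 1024 * 1024, 0.90));
          }
      }
      ```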

      1. diskspaceest.patch
        18 kB
        Ari Rabkin
      2. diskspaceest_v2.patch
        18 kB
        Ari Rabkin
      3. diskspaceest_v3.patch
        18 kB
        Ari Rabkin
      4. diskspaceest_v4.patch
        18 kB
        Ari Rabkin
      5. clean_spaceest.patch
        17 kB
        Ari Rabkin
      6. spaceest_717.patch
        17 kB
        Ari Rabkin

        Issue Links

          Activity

          Owen O'Malley created issue -
          Arun C Murthy made changes -
          Field Original Value New Value
          Assignee Owen O'Malley [ owen.omalley ] Arun C Murthy [ acmurthy ]
          Arun C Murthy added a comment -

          Looks like Owen's got it nailed pretty well... does anyone want to weigh in on this one?

          What do you guys think is a reasonable default for "map.output.growth.factor" ? 1.0? Do we need more leeway?

          Arun C Murthy added a comment -

          Current flow relevant to this discussion:

          TaskTracker.offerService() -> TaskTracker.checkForNewTasks() -> if (TaskTracker.enoughFreeSpace()) then poll/startNewTask

          We could put the above checks (in fact we can do better by checking if we have assigned fileSplit's size * conf.getFloat("map.output.growth.factor", 1.0)) in TaskTracker.enoughFreeSpace()...

          ... alternatively, we could make '(sum over running tasks of (1.0 - done) * allocation)' part of TaskTrackerStatus, i.e. an 'availableDiskSpace' member, and check that 'sufficient' free space is available on the tasktracker before assigning it the task in JobInProgress.findNewTask. This ensures that a task isn't allocated in the first place to a tasktracker that can't handle it.

          What do you guys think? Am I missing out on something which prevents option #2 from working?
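
          Option #2 above can be sketched as follows. These are hypothetical field and method names for illustration, not the actual TaskTrackerStatus or JobInProgress APIs: the tracker reports how much scratch space is not yet spoken for, and the JobTracker refuses to assign a task that won't fit.

          ```java
          // Hypothetical sketch of option #2: TaskTrackerStatus carries an
          // 'availableDiskSpace' figure and the JobTracker consults it before
          // assigning a task. Names are illustrative only.
          class TaskTrackerStatus {
              long freeDiskSpace;      // physical free space reported by the tracker
              long committedToRunning; // sum over running tasks of (1.0 - done) * allocation

              long availableDiskSpace() {
                  return freeDiskSpace - committedToRunning;
              }
          }

          class JobInProgress {
              // Refuse to hand out a task to a tracker that cannot hold it.
              boolean canAssign(TaskTrackerStatus tt, long estimatedTaskSpace) {
                  return tt.availableDiskSpace() >= estimatedTaskSpace;
              }
          }
          ```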

          Owen O'Malley added a comment -

          Option #2, putting the "unallocated" disk space into the TaskTrackerStatus, should work well.

          Doug Cutting added a comment -

          We should also log something when tasks are not accepted due to space limitations. At present I think nothing is displayed in this case.

          Arun C Murthy added a comment -

          It would be very useful to include the 'unallocated disk space' in the global and per-host JSPs so as to provide an easy way for operators to diagnose if and when tasks can't be allocated due to lack of disk space... I think it should be a part of this same bug.

          Doug Cutting added a comment -

          > include the 'unallocated disk space' in the global and per-host jsps

          Yes, I agree this is another metric that should be displayed in the central web UI. It should be reported through the metrics API, as discussed in HADOOP-481. The MapReduce framework should implement a MetricsContext, and system (and perhaps user) code can use this to route statistics to central locations. I don't think we want to fix that as a part of this issue, but I also don't think we should hack in an alternate mechanism just for this statistic.

          Arun C Murthy added a comment -

          I see the value in Doug's suggestion... e.g. at some point in the future we might also put in metrics like CPU load, VM stats, etc., and this would let the JobTracker make 'smarter' decisions about which tasks to assign to which TaskTrackers, i.e. CPU-bound tasks to IO-laden TTs and vice versa.

          I do agree that it might be a very futuristic scenario, but the point is to keep the infrastructure robust when we can...

          Owen O'Malley added a comment -

          Actually, I'd rather have the unallocated disk space in the heartbeat, because when HADOOP-639 is implemented, the JobTracker should be given fresh information to decide which tasks should be launched there.

          Doug Cutting added a comment -

          > I'd rather have the unallocated disk space in the heartbeat [ ...]

          I agree, but I think we should use a general mechanism to route metrics to the jobtracker through heartbeats, rather than hack things in one-by-one.

          Arun C Murthy added a comment -

          Taking things forward, it looks like both Owen and Doug agree that we need to send metrics through the heartbeat...

          ... given this, we are looking at implementing a MapReduceMetricsContext which sends the heartbeat (with metrics) via RPC to the JT, with TaskTracker.offerService() becoming a callback for the timer. Does that make sense, or do folks prefer something different?

          Arun C Murthy added a comment -

          It also necessitates a 'Writable' MetricsRecordImpl (for RPC) and some APIs for 'reading' the metrics, i.e. getMetric/getTag APIs which the JobTracker can use to retrieve information.

          Arun C Murthy made changes -
          Assignee Arun C Murthy [ acmurthy ] Ari Rabkin [ asrabkin ]
          Ari Rabkin added a comment -

          Here's my proposed fix:

          1) Add a "free space on compute node" field to TaskTrackerStatus. This is the real physical space available, plus the sum of (commitment - reservation) for each running map task.

          2) Add a "space used by this task" and "space reserved for task" to TaskStatus as well.

          3) Add a "space to reserve" to either Task or MapTask. This is computed by the JobTracker, and used by the TaskTracker.

          4) Create a new ResourceConsumptionEstimator class, and have an instance of that type for each JobInProgress. This will have, at a minimum, reportCompletedMapTask(MapTaskStatus t) and estimateSpaceForMapTask(MapTask mt). The implementation would probably be a thread that processes asynchronously, and updates an atomic value that'll be either the estimated space requirement, or else the estimated ratio between input size and output size. Until sufficiently many maps have completed (10%, say) the size estimate would just be the size of each map's input. Afterwards, we'll take the 75th percentile of the measured blowup in task size.

          5) Modify obtainNewMapTask to return null if the space available on the given task tracker is less than the estimate of available space.

          6) To avoid deadlocks if there are multiple jobs contending for space, abort the job if too many trackers are rejected as having insufficient space.


          Thoughts?
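
          Step 4 above could be sketched roughly like this. All names are hypothetical (this is not the class from any attached patch): until 10% of the job's maps have completed, a map's scratch need is estimated as its input size; afterwards, the input size is scaled by the 75th percentile of the observed output-to-input blowup ratios.

          ```java
          import java.util.ArrayList;
          import java.util.Collections;
          import java.util.List;

          // Hypothetical sketch of the per-job estimator described in step 4.
          class ResourceConsumptionEstimator {
              private final int totalMaps;
              private final List<Double> blowups = new ArrayList<>();

              ResourceConsumptionEstimator(int totalMaps) {
                  this.totalMaps = totalMaps;
              }

              // Record the observed output/input ratio of a finished map.
              synchronized void reportCompletedMapTask(long inputSize, long outputSize) {
                  if (inputSize > 0) {
                      blowups.add((double) outputSize / inputSize);
                  }
              }

              // Estimate scratch space for a map with the given input size.
              synchronized long estimateSpaceForMapTask(long inputSize) {
                  if (blowups.size() < totalMaps / 10) {
                      return inputSize; // too few samples: assume a 1x blowup
                  }
                  List<Double> sorted = new ArrayList<>(blowups);
                  Collections.sort(sorted);
                  // Approximate 75th percentile of measured blowups.
                  int idx = (int) Math.min(sorted.size() - 1, Math.ceil(0.75 * sorted.size()));
                  return (long) (inputSize * sorted.get(idx));
              }
          }
          ```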

          Ari Rabkin added a comment -

          Here's a stab at solving the issue.

          I've tested this locally, and it doesn't break anything when space IS available. I haven't yet tested this in the low-disk-space case.

          Ari Rabkin made changes -
          Affects Version/s 0.7.2 [ 12312118 ]
          Status Open [ 1 ] Patch Available [ 10002 ]
          Affects Version/s 0.17.0 [ 12312913 ]
          Ari Rabkin added a comment -

          The associated patch.

          Ari Rabkin made changes -
          Attachment diskspaceest.patch [ 12382976 ]
          Devaraj Das added a comment -

          4) Create a new ResourceConsumptionEstimator class, and have an instance of that type for each JobInProgress. This will have, at a minimum, reportCompletedMapTask(MapTaskStatus t) and estimateSpaceForMapTask(MapTask mt) The implementation would probably be a thread that processes asynchronously, and updates an atomic value that'll be either the estimated space requirement, or else the estimated ratio between input size and output size. Until sufficiently many maps have completed (10%, say) the size estimate would just be the size of each map's input. Afterwards, we'll take the 75th percentile of the measured blowup in task size.

          Ari, I haven't looked at the patch yet, but it'd help if you could please give an example for this one with some numbers.

          Ari Rabkin added a comment -

          Here's what we have currently. ResourceEstimator keeps an estimate of how big the average map's output is. As map tasks complete, we update this. If a node has less than twice the average output size in free disk space, we don't assign tasks to it. I haven't implemented the percentile aspect; the average is computationally much easier.

          So if a job has 10 GB of input, split across ten map tasks, tasks will only be started on nodes with at least two gigabytes free.

          It's been tested locally, and indeed, jobs only go to a task tracker with sufficient space. Next step is testing at scale, on a cluster.
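
          A minimal sketch of that running-average scheme (field and method names here are illustrative, not the actual ResourceEstimator API): track the mean map output size, and require a node to have at least twice that much free space before assigning it a task.

          ```java
          // Illustrative sketch of the average-based estimator described above.
          class ResourceEstimator {
              private long completedMaps = 0;
              private long totalOutputSize = 0;

              synchronized void reportCompletedMap(long outputSize) {
                  completedMaps++;
                  totalOutputSize += outputSize;
              }

              synchronized long averageOutputSize() {
                  return completedMaps == 0 ? 0 : totalOutputSize / completedMaps;
              }

              // E.g. 10 GB of input over ten maps, blowup ~1x: average output
              // ~1 GB, so a node needs at least 2 GB free to get a task.
              synchronized boolean hasRoom(long freeSpaceOnNode) {
                  return freeSpaceOnNode >= 2 * averageOutputSize();
              }
          }
          ```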

          Ari Rabkin added a comment -

          Current patch.

          Limitations:
          Not confident we're making a sound estimate for Reduce space consumption.
          Doesn't detect deadlock or starvation.

          Ari Rabkin made changes -
          Attachment diskspaceest_v2.patch [ 12383054 ]
          Ari Rabkin made changes -
          Link This issue is blocked by HADOOP-3441 [ HADOOP-3441 ]
          Ari Rabkin made changes -
          Link This issue incorporates HADOOP-3441 [ HADOOP-3441 ]
          Ari Rabkin made changes -
          Link This issue is blocked by HADOOP-3441 [ HADOOP-3441 ]
          Ari Rabkin added a comment -

          Temporarily canceling the patch; expect the final version within a few days.

          Ari Rabkin made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Ari Rabkin added a comment -

          Tested on a small cluster, seems to work.
          One limitation is that we don't automatically resolve deadlocks. This could be done, e.g., by failing tasks that can't be placed for a long period.

          Ari Rabkin made changes -
          Fix Version/s 0.19.0 [ 12313211 ]
          Status Open [ 1 ] Patch Available [ 10002 ]
          Ari Rabkin made changes -
          Attachment diskspaceest_v3.patch [ 12383592 ]
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12383592/diskspaceest_v3.patch
          against trunk revision 664208.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 3 new Findbugs warnings.

          -1 release audit. The applied patch generated 196 release audit warnings (more than the trunk's current 195 warnings).

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2613/testReport/
          Release audit warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2613/artifact/trunk/current/releaseAuditDiffWarnings.txt
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2613/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2613/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2613/console

          This message is automatically generated.

          Ari Rabkin added a comment -

          Fix pedantic findbugs warnings; add license at top of file.

          Notes:
          No tests supplied, since there's no simple or painless way to run unit tests across multiple filesystems.

          TestHarFileSystem error is believed unrelated to patch.

          Ari Rabkin made changes -
          Attachment diskspaceest_v4.patch [ 12383863 ]
          Ari Rabkin added a comment -

          cancelling to resubmit

          Ari Rabkin made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Ari Rabkin made changes -
          Attachment diskspaceest_v4.patch [ 12383863 ]
          Ari Rabkin added a comment -

          revised, should no longer trigger findbugs

          Ari Rabkin made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Ari Rabkin added a comment -

          here's the patch

          Ari Rabkin made changes -
          Attachment diskspaceest_v4.patch [ 12384165 ]
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12384165/diskspaceest_v4.patch
          against trunk revision 668867.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2677/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2677/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2677/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2677/console

          This message is automatically generated.

          Ari Rabkin made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Ari Rabkin added a comment -

          clean up format of code

          Ari Rabkin made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Ari Rabkin made changes -
          Attachment clean_spaceest.patch [ 12385993 ]
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12385993/clean_spaceest.patch
          against trunk revision 676772.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2861/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2861/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2861/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2861/console

          This message is automatically generated.

          Vinod Kumar Vavilapalli added a comment -

          HADOOP-3581 tries to manage memory used by tasks. I am trying to follow the approach of this JIRA, and have a couple of comments.

          • I see that you are doing the free-space computation inside the task. Instead, why can't we do it in the tasktracker itself? In this JIRA we care only about mapOutputFiles, and for watching them we just need the job ID and TIP ID. Memory tracking HAS to be done in the TT and not the task, to shield the tracking business itself from any rogue tasks. I think it would be good if we can manage both these resources in the TT itself, ultimately moving all of these into a single resource-management class in the TT. Unless I am missing something here. Thoughts?
          • I also see in this patch that availableSpace is sent to the JT via TaskTrackerStatus. What happened to Doug's idea of "using a general mechanism to route metrics to the jobtracker through heartbeats, rather than hack things in one-by-one"? A general mechanism like the one Arun proposed (MetricsContext) would also help HADOOP-3759 (which intends to use freeMemory information for scheduling decisions).
          Ari Rabkin added a comment -

          I don't have strong feelings about whether to do the space-consumed measurement in the TaskTracker or the Task. I figured it made more sense to fill out the whole TaskStatus in one place. Otherwise, it's unclear in the TaskTracker code whether or not the space-consumed field has been filled in yet. I'm open to doing this the other way around and having the TaskTracker responsible for it. Certainly, if other similar resource counters were being filled in in the TaskTracker, this one ought to be as well.

          I was tempted to use metrics for this, and looked at piggybacking of this sort of thing more generally on heartbeats. I was promptly shot down. There was a strong sentiment, notably from Owen and Arun, that Hadoop's core functionality shouldn't depend on Metrics, and that Metrics should just be for analytics.

          Ari Rabkin added a comment -

          No more need to modify Task.

          Ari Rabkin made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Ari Rabkin added a comment -

          As per Vinod's suggestion, move lookup into TaskTracker.

          Ari Rabkin made changes -
          Attachment spaceest_717.patch [ 12386347 ]
          Ari Rabkin made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12386347/spaceest_717.patch
          against trunk revision 677781.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2896/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2896/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2896/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2896/console

          This message is automatically generated.

          Owen O'Malley added a comment -

          I just committed this. Thanks, Ari!

          Owen O'Malley made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags [Reviewed]
          Resolution Fixed [ 1 ]
          Jothi Padmanabhan added a comment -

          The reduce tasks for sortvalidator on 500 nodes seem to get stuck with the following message (several instances of it), even though the sort itself succeeded. Could there be a bug in the estimation of the reduce input size?

          2008-08-14 07:56:16,507 WARN org.apache.hadoop.mapred.JobInProgress: No room for reduce task. Node tracker_xxx.com:xxx..com/<IPADDR>:58251 has 204889718784 bytes free; but we expect reduce input to take 1004644589190
          2008-08-14 07:56:16,508 INFO org.apache.hadoop.mapred.ResourceEstimator: estimate map will take 150463470 bytes. (blowup = 2*0.04748320346406185)
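          The shape of this over-estimate follows from the model in the issue description: the estimator learns a blowup factor (map output size relative to map input size) from completed maps, then predicts each reduce's input as totalInputSize * 2 * blowup / numReduces, so a large job with few reducers can produce a per-reduce estimate far beyond any node's free disk. A rough sketch of that arithmetic, with hypothetical numbers rather than the actual ResourceEstimator code:

```java
// Rough sketch of the reduce-input estimate from the issue description:
//   reduceInput ~ totalInputSize * 2 * blowup / numReduces
// Hypothetical numbers for illustration; not the actual ResourceEstimator.
public class ReduceInputEstimate {

  public static long estimateReduceInput(long totalInputSize, double blowup, int numReduces) {
    return (long) (totalInputSize * 2.0 * blowup / numReduces);
  }

  public static void main(String[] args) {
    long totalInput = 10L * 1024 * 1024 * 1024 * 1024; // 10 TB of map input (hypothetical)
    double blowup = 0.0475;                            // learned map-output/input ratio
    int numReduces = 1;                                // pathological: a single reducer
    // With few reducers, the per-reduce estimate can exceed any node's free disk,
    // producing "No room for reduce task" even when the job would actually fit.
    System.out.println(estimateReduceInput(totalInput, blowup, numReduces));
  }
}
```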

          Hudson added a comment -

          Integrated in Hadoop-trunk #581 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/581/ )
          Nigel Daley made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Owen O'Malley made changes -
          Component/s mapred [ 12310690 ]

            People

            • Assignee: Ari Rabkin
            • Reporter: Owen O'Malley
            • Votes: 0
            • Watchers: 1