Hadoop Map/Reduce: MAPREDUCE-1608

Allow users to do speculative execution of a task manually

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      Speculative execution improves job latency. Sometimes a job has a few very slow reducers, and spending a little more resource on speculative tasks can improve latency a lot. It would be nice if users could manually select a task and force speculative execution on it, just as they can manually kill/fail a task.

      The proposal is to add a "speculate" link on the taskdetails.jsp page, next to the existing "kill/fail" links.

      Thoughts?

        Activity

        Arun C Murthy added a comment -

        I meant 'job-specific', not 'user-specific'.

        Arun C Murthy added a comment -

        Scott, thanks for bringing that to my attention; I wasn't aware of that. That is a highly dangerous knob. I'll open a JIRA to make it cluster-specific and not user-specific.

        Scott Chen added a comment -

        Hi Arun,

        > How do you prevent devious/malicious users from speculating on all their tasks via a simple shell script?

        Currently, if a user wants to do that, they can achieve it by setting

         mapreduce.job.speculative.slowtaskthreshold = 0
        

        This way is even simpler.
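        For context, a minimal sketch of the semantics behind this knob, assuming the threshold is interpreted as the number of standard deviations below the mean peer progress rate (the class and method names here are invented for illustration; this is not Hadoop's actual implementation):

```java
// Illustrative only: a task becomes a speculation candidate when its
// progress rate falls more than `threshold` standard deviations below the
// mean rate of its peers. With threshold = 0, any below-average task
// qualifies, which is why the setting above lets a user speculate broadly.
public class SlowTaskCheck {
    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    static double stddev(double[] xs) {
        double mu = mean(xs), sq = 0;
        for (double x : xs) sq += (x - mu) * (x - mu);
        return Math.sqrt(sq / xs.length);
    }

    static boolean isSpeculationCandidate(double rate, double[] peerRates, double threshold) {
        return rate < mean(peerRates) - threshold * stddev(peerRates);
    }
}
```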

        Hong Tang added a comment -

        > How do you prevent devious/malicious users from speculating on all their tasks via a simple shell script?

        This is an orthogonal issue. An easy solution would be to set an upper limit on how many tasks one can speculate manually, e.g. 1% of total {map|reduce} tasks.

        Arun C Murthy added a comment -

        How do you prevent devious/malicious users from speculating on all their tasks via a simple shell script?

        Hong Tang added a comment -

        > But when there is a manual-speculate-now button, it can override the system-specified speculation heuristics for that task. To go a step further, the system can actually learn from the user-specified speculation how to automatically make the heuristics better, i.e. crowd-sourcing to the rescue!

        I agree that knowledgeable users have more information about their programs and probably can speculate more accurately. Such info may also be used for the verification (future tuning) of speculative execution algorithms (e.g. through simulation). So here is my +1 on the idea.

        Scott Chen added a comment -

        We can make this feature configurable.
        For example, we put

        "mapreduce.cluster.speculation.controlable"
        

        in MRConfig.java to allow the administrator to turn on/off this feature.

        What do you think?

        Scott Chen added a comment -

        We can also add a CLI option like

        hadoop job -speculate-task <tid>
        

        We have been using this feature for a while.
        It is very helpful for our users. Sometimes this little feature can save lots of time.

        dhruba borthakur added a comment -

        Amar, thanks for your comments. For interactive workloads, this feature has worked awesomely for us.

        > As Todd/Hong have mentioned we should consider making our heuristics better. I'm not sure this is a good idea.

        I think Arun is completely missing the point here. It is fine to make the heuristics better and better; I have nothing against that. But when there is a manual-speculate-now button, it can override the system-specified speculation heuristics for that task. To go a step further, the system can actually learn from the user-specified speculation how to automatically make the heuristics better, i.e. crowd-sourcing to the rescue!

        Adam Kramer added a comment -

        As a user, I have found the ability to manually speculate tasks via the website incredibly useful; so useful, in fact, that I'm starting to worry about RSI, given that each speculation takes a click to the task page, a click on the task, a click on speculate, and a click on the confirm dialog box. These are frequently lost-tasktracker failures, and Hadoop currently just sets a timeout on them.

        But how am I beating the current system? I'm comparing some tasks' performance to other tasks in the same job:

        1) If there is only one task (either map or reduce), always speculate. Maybe turn this off for clusters that have very few slots, but with >1000 slots or so this is trivial and would basically prevent jobs from taking literally twice as long.

        2) Collect data on other tasks in the same job. If 99% of mappers went from 0% complete to >0% complete in 5 seconds and it's been 5 minutes while the last 5% of mappers haven't changed, speculate them. Ditto reducers. Unbalanced data may cause these problems.

        3) Collect data on delays. If a task doesn't improve its % complete in some timeframe determined by the other tasks for the same job, speculate the "hung" task.

        ...in other words, I agree that there is probably an easy way to model the failed tasks, but only from a modeling perspective. Getting the heuristics and models right and implementing them is probably much, much more difficult than implementing "hadoop job -speculate-task task_identifier_here."

        But also, implementing the latter is necessary to discover how and when the heuristics themselves are failing: giving users the ability to do this also gives admins the ability to see when users are doing it.
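        Heuristic (3) above, flagging a task whose progress has stalled relative to its peers, could be sketched roughly as follows (all names are invented for illustration; this is not actual Hadoop code):

```java
// Illustrative sketch of heuristic (3): a task counts as "hung" when it has
// made no progress for longer than `slack` times the median progress
// interval observed across its sibling tasks in the same job.
public class HungTaskCheck {
    static boolean isHung(long msSinceLastProgress, long[] peerProgressIntervalsMs, double slack) {
        long[] sorted = peerProgressIntervalsMs.clone();
        java.util.Arrays.sort(sorted);
        long median = sorted[sorted.length / 2];
        return msSinceLastProgress > slack * median;
    }
}
```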

        Amar Kamat added a comment -

        > Interesting usage case. Are there other tasks running on the same node as the lone reduce task?

        Speculation makes sense when you compare similar tasks as we can easily rule out the code/logic differences.

        The way I understand this is that you are trying to label/weigh the tasktracker based on how the currently running tasks behave on that tracker. How about making a note of how many tasks (of a given type) on a given tracker got speculated and making scheduling decisions based on that? There might be cases where all the reducers on a given tracker get speculated. This should result in not scheduling reduces on such nodes and, in future, utilizing them (i.e. all the slots) completely for maps.

        For example:
        If a tracker T asks for new tasks of type X from job J, then schedule task X of job J on T only if
        (number of tasks speculated on T / number of tasks scheduled on T) < C (where C is the threshold, with default value 0.8)

        This condition says a simple thing: if the job cares about speculation, then there is no point in running a task of type X on tracker T, as there is a higher chance that it will get speculated. Thoughts?
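        The condition above can be sketched as a simple gate (illustrative names only; this is not actual scheduler code):

```java
// Illustrative sketch of the proposed gate: hand a task of type X to
// tracker T only while T's speculation ratio stays below the threshold C
// (0.8 by default in the proposal). A tracker with no history is allowed.
public class TrackerGate {
    static boolean shouldSchedule(int speculatedOnT, int scheduledOnT, double c) {
        if (scheduledOnT == 0) return true; // no history yet, give it a chance
        return (double) speculatedOnT / scheduledOnT < c;
    }
}
```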

        Todd Lipcon added a comment -

        I think Hong's idea is really clever, though. Do we have the right data structures in place to do it efficiently, already? Or would we have to add more per-node data similar to the faulty tracker stuff?

        Arun C Murthy added a comment -

        This is an odd one.

        Speculative execution is, in some sense, a pure overhead. Allowing users to trigger this without checks and balances has significant consequences...

        As Todd/Hong have mentioned we should consider making our heuristics better. I'm not sure this is a good idea.

        Hong Tang added a comment -

        Interesting usage case. Are there other tasks running on the same node as the lone reduce task? If yes, are they slower than their peers? Perhaps that would be an indication that the node is ill-behaving and that all tasks on it should be speculatively executed.

        Scott Chen added a comment -

        Thanks, Todd. That's a good point. We should also work on making the tuning knobs better.

        The reason I am proposing this is that one of our users asked me whether there is speculative execution in the single-reducer case.
        Our speculation policy is based on comparing a task to other tasks (or to some average behavior).
        So in this case speculative execution will not be triggered. But the human knows that this task is slow.

        The tuning knobs can be very good, but it is hard to make them perfect.
        It is good to have some control over a task when we need it (just like Kill/Fail task).

        Todd Lipcon added a comment -

        I'm not entirely against a manual trigger, but this suggests that our current tuning knobs are insufficient. What are the cases when users would want to do this manually, and why don't our heuristics do it for them? I'm skeptical that humans can do a better job than the scheduler at deciding when speculation is necessary.


          People

          • Assignee: Scott Chen
          • Reporter: Scott Chen
          • Votes: 1
          • Watchers: 12
