Hadoop Map/Reduce
  MAPREDUCE-302

Maintaining cluster information across multiple job submissions

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      Could we have a way to maintain cluster state across multiple job submissions?
      Consider a scenario where we run multiple jobs back to back on a cluster, in iterations. The nature of the job is the same, but the input/output might differ.

      Now, if a node is blacklisted in one iteration of the job run, it would be useful to retain this information and blacklist the node for the next iteration of the job as well.
      Another situation we saw: if there are fewer failures than mapred.map.max.attempts in each iteration, some nodes are never marked for blacklisting. But if we consider two or three iterations together, these nodes fail tasks in all jobs and should be taken out of the cluster. This hampers the overall performance of the job.

      Could we have config variables, something which matches a job type (provided by the user) and maintains the cluster status for that job type alone?
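
      As a rough illustration only (hypothetical class and method names, not existing Hadoop code), a per-job-type tracker that accumulates task failures across job submissions might look something like this:

        import java.util.HashMap;
        import java.util.Map;

        // Hypothetical sketch: cumulative, per-job-type failure counts that outlive a
        // single job, so nodes failing a few tasks in every iteration eventually
        // cross the blacklist threshold.
        public class JobTypeBlacklist {
          // jobType -> (hostname -> cumulative failed-task count)
          private final Map<String, Map<String, Integer>> failures =
              new HashMap<String, Map<String, Integer>>();
          private final int threshold; // assumed cumulative per-node failure limit

          public JobTypeBlacklist(int threshold) {
            this.threshold = threshold;
          }

          // Record a failed task attempt for this job type on this host.
          public synchronized void recordFailure(String jobType, String host) {
            Map<String, Integer> perHost = failures.get(jobType);
            if (perHost == null) {
              perHost = new HashMap<String, Integer>();
              failures.put(jobType, perHost);
            }
            Integer count = perHost.get(host);
            perHost.put(host, count == null ? 1 : count + 1);
          }

          // A host stays blacklisted for this job type across later submissions.
          public synchronized boolean isBlacklisted(String jobType, String host) {
            Map<String, Integer> perHost = failures.get(jobType);
            if (perHost == null) {
              return false;
            }
            Integer count = perHost.get(host);
            return count != null && count >= threshold;
          }
        }

      The job type here would be the user-provided label mentioned above; the threshold could come from a config variable, analogous to (but separate from) mapred.map.max.attempts.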


          Activity

          dhruba borthakur added a comment -

          We have seen a similar instance of this one. There were 5 bad nodes which filled up the userlog directory of tasks. This caused all tasks that were scheduled on these tasktrackers to fail. This is OK by itself, but the problem was that since the first tasks failed immediately, these tasktrackers sucked in more and more tasks, and all of them failed. There were a few small jobs (each had about 4 tasks); all these tasks got sucked into the same set of bad tasktrackers and the entire job failed. This continued to occur again and again till the bad nodes were fixed.

          One option would be to have some intelligence in the JT to blacklist tasktrackers across jobs. If a particular tasktracker consistently started failing tasks, maybe the JT should start allocating fewer and fewer tasks to it. Another option would be that once a fixed number of tasks from a threshold number of jobs failed on a tasktracker, it could be blacklisted for a fixed period of time.

          Runping Qi added a comment -

          Intelligence should be built into the task tracker.
          It should decide whether to ask for tasks to run based on its health state (memory availability, tmp disk space, network connectivity, load, and other diagnostic information such as the history of previously failed tasks).
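
          A minimal sketch of that idea (all names hypothetical, and a real health check would look at far more than this): before asking the JobTracker for work, the tracker consults its tmp disk space and its recent failure history.

            import java.io.File;

            public class TrackerHealth {
              private static final int MAX_RECENT_FAILURES = 4;              // assumed threshold
              private static final long MIN_TMP_SPACE = 1024L * 1024 * 1024; // assumed 1 GB floor

              private int recentFailedTasks; // decays as tasks succeed

              // Called whenever a task finishes on this tracker.
              public synchronized void taskFinished(boolean succeeded) {
                if (succeeded) {
                  recentFailedTasks = Math.max(0, recentFailedTasks - 1);
                } else {
                  recentFailedTasks++;
                }
              }

              // Heartbeat code would only request new tasks while this returns true.
              public synchronized boolean shouldAskForTasks(File tmpDir) {
                boolean enoughTmpSpace = tmpDir.getUsableSpace() > MIN_TMP_SPACE;
                boolean healthyHistory = recentFailedTasks < MAX_RECENT_FAILURES;
                return enoughTmpSpace && healthyHistory;
              }
            }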

          dhruba borthakur added a comment -

          +1 to Runping's comments.

          Runping Qi added a comment -

          The patch looks good.
          Make it available now to see if Hudson complains.

          Lohit Vijayarenu added a comment -

          Runping, I guess you missed attaching the patch, or commented on a different JIRA?

          Runping Qi added a comment -

          Withdrawing the previous operation.
          I worked on the wrong JIRA.

          Lohit Vijayarenu added a comment -

          +1 to Runping's comments.

          Should we also think about supporting this for DataNodes? We have been thinking about blacklisting faulty DataNodes. The NameNode could consider a blacklisted DataNode equivalent to a 'decommission in progress' node. And also, for un-blacklisting these nodes: does rebooting them make them clean and remove them from the blacklisted nodes?

          dhruba borthakur added a comment -

          I would like to implement Runping's proposal for tasktrackers. In this case, if a tasktracker has had a (configurable) number of consecutive tasks fail, it determines that it is a bad node and stops advertising its slots for some time. Once this quiet time period is over, it clears up its state and starts advertising its slots again. The tasktracker does not keep any persistent info regarding this, so restarting the tasktracker will clear up this state.

          Comments?
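
          A minimal in-memory sketch of this proposal (hypothetical names and thresholds; nothing is persisted, so a restart clears the state, as described above):

            // After a configurable number of consecutive task failures the tracker
            // stops advertising slots for a quiet period, then resets itself.
            public class SelfBlacklist {
              private final int maxConsecutiveFailures; // configurable
              private final long quietPeriodMs;         // configurable
              private int consecutiveFailures;
              private long quietUntil;                  // 0 when not self-blacklisted

              public SelfBlacklist(int maxConsecutiveFailures, long quietPeriodMs) {
                this.maxConsecutiveFailures = maxConsecutiveFailures;
                this.quietPeriodMs = quietPeriodMs;
              }

              public synchronized void taskFinished(boolean succeeded) {
                consecutiveFailures = succeeded ? 0 : consecutiveFailures + 1;
                if (consecutiveFailures >= maxConsecutiveFailures) {
                  quietUntil = System.currentTimeMillis() + quietPeriodMs;
                  consecutiveFailures = 0;
                }
              }

              // Heartbeat code would advertise zero free slots while this is false.
              public synchronized boolean shouldAdvertiseSlots() {
                return System.currentTimeMillis() >= quietUntil;
              }
            }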

          Arun C Murthy added a comment -

          I'm a little leery about putting this intelligence in the TaskTracker - particularly since we don't have a good story on how the blacklist will survive TaskTracker restarts. Also, we probably need to be careful about which tasks failed since, for example, some jobs' tasks might fail because they need excessive amounts of temporary disk space, while every other job might be just fine...

          I propose we rather keep this centrally, with the JobTracker. We can continue the per-job blacklist as today; in addition, the JobTracker can track the per-job lists and add wonky TaskTrackers to a persistent global list if a TaskTracker has been blacklisted by more than two or three jobs. This also has the advantage of maintaining state centrally, which is easier to manage, persist, etc. Also, we would need to add 'admin' commands to let a human operator list, add and remove TaskTrackers from the global blacklist (presumably after the faulty machine got corrected).

          Thoughts?
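
          A rough sketch of that JobTracker-side bookkeeping (hypothetical names, not actual JobTracker code): count how many distinct jobs have blacklisted each tracker, promote it to a global list once two or three jobs agree, and let the proposed 'admin' commands operate on that list.

            import java.util.HashMap;
            import java.util.HashSet;
            import java.util.Map;
            import java.util.Set;

            public class GlobalBlacklist {
              private static final int JOB_THRESHOLD = 3; // "two or three" jobs

              // tracker -> ids of jobs whose per-job blacklist included it
              private final Map<String, Set<String>> blamingJobs =
                  new HashMap<String, Set<String>>();
              private final Set<String> global = new HashSet<String>();

              // Called when a job's per-job blacklist picks up a tracker.
              public synchronized void jobBlacklistedTracker(String jobId, String tracker) {
                Set<String> jobs = blamingJobs.get(tracker);
                if (jobs == null) {
                  jobs = new HashSet<String>();
                  blamingJobs.put(tracker, jobs);
                }
                jobs.add(jobId);
                if (jobs.size() >= JOB_THRESHOLD) {
                  global.add(tracker);
                }
              }

              public synchronized boolean isGloballyBlacklisted(String tracker) {
                return global.contains(tracker);
              }

              // What an 'admin' remove command might do once the machine is fixed.
              public synchronized void adminRemove(String tracker) {
                global.remove(tracker);
                blamingJobs.remove(tracker);
              }
            }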

          Owen O'Malley added a comment -

          +1 to storing the information at the JobTracker.

          dhruba borthakur added a comment -

          The blacklist does not have to survive TT/JT restarts. Since my basic assumption is that this information need not be persisted, it makes more sense to do this on the TT rather than on the JT.

          I would like to avoid adding any administrator command to remove a TT from the blacklist. I think the system would be more robust and self-healing if the TT periodically exits its own blacklisting and starts behaving normally. I think the cost of having a few tasks fail again on the faulty TT is worth it to keep the system self-healing.

          dhruba borthakur added a comment -

          We could store the information in the JT, but is it OK to not persist it and to not have an administrator command to remove a TT from the blacklist? It could be removed from the blacklist automatically (by an aging process).

          Runping Qi added a comment -

          I don't think blacklist-related info should persist through a TT restart.

          I'd like the TT to be more adaptive than just being blacklisted or not.

          The TT should adaptively adjust the number of tasks it can run concurrently based on its current load and its history.
          Before any tasks fail, it may concurrently run tasks up to a configured limit (the number of slots).
          As more and more tasks fail, the limit is decremented.
          As more tasks report success, the limit is incremented.
          Once the limit reaches zero, it is equivalent to being in the blacklisted state.

          You may be able to implement a more sophisticated adaptation policy, but the basic idea will be similar.
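
          A small sketch of this adaptive policy (hypothetical names; a real implementation would weight load and history as well): the concurrent-task limit starts at the configured slot count, drops on failures, recovers on successes, and reaching zero is equivalent to being blacklisted.

            public class AdaptiveSlotLimit {
              private final int configuredSlots;
              private int currentLimit;

              public AdaptiveSlotLimit(int configuredSlots) {
                this.configuredSlots = configuredSlots;
                this.currentLimit = configuredSlots;
              }

              public synchronized void taskFailed() {
                currentLimit = Math.max(0, currentLimit - 1);
              }

              public synchronized void taskSucceeded() {
                currentLimit = Math.min(configuredSlots, currentLimit + 1);
              }

              // Zero here means "effectively blacklisted": the tracker accepts no new tasks.
              public synchronized int slotsToAdvertise() {
                return currentLimit;
              }
            }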

          Allen Wittenauer added a comment -

          I think YARN sort of makes this JIRA obsolete, but I'd like some verification before closing it.


            People

            • Assignee: dhruba borthakur
            • Reporter: Lohit Vijayarenu
            • Votes: 0
            • Watchers: 4
