Hadoop Common / HADOOP-3420

Recover the deprecated mapred.tasktracker.tasks.maximum

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.16.0, 0.16.1, 0.16.2, 0.16.3, 0.16.4
    • Fix Version/s: None
    • Component/s: conf
    • Labels: None

      Description

      https://issues.apache.org/jira/browse/HADOOP-1274 replaced the configuration attribute mapred.tasktracker.tasks.maximum with mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum because it sometimes makes sense to have more mappers than reducers assigned to each node.

      But deprecating mapred.tasktracker.tasks.maximum could be an issue in some situations. For example, when more than one job is running, the reduce tasks plus the map tasks eat too many resources. To avoid these cases, an upper limit on the number of tasks is needed. So I propose bringing back the configuration parameter mapred.tasktracker.tasks.maximum as a total limit on tasks per node, compatible with mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum.

      As an example:

      I have a 4-node cluster, each node with 8 cores and 4 GB of memory. I want to limit the number of tasks per node to 8; 8 tasks per node would use almost 100% of the CPU and the 4 GB of memory. I have set:
      mapred.tasktracker.map.tasks.maximum -> 8
      mapred.tasktracker.reduce.tasks.maximum -> 8

      1) When running only one job at a time, it works smoothly: 8 tasks on average per node, no swapping on the nodes, almost 4 GB of memory usage, and 100% CPU usage.

      2) When running more than one job at the same time, it works really badly: 16 tasks on average per node, 8 GB of memory usage (4 GB swapped), and a lot of system CPU usage.
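      For reference, here is a minimal sketch of the two limits above expressed through Hadoop's Configuration API (illustration only; in a real cluster these properties are set in the tasktracker's hadoop-site.xml, and the values are simply the ones from this example):

      import org.apache.hadoop.conf.Configuration;

      // Sketch only: the per-node limits from the example above.
      public class SlotConfigExample {
        public static void main(String[] args) {
          Configuration conf = new Configuration();
          conf.setInt("mapred.tasktracker.map.tasks.maximum", 8);
          conf.setInt("mapred.tasktracker.reduce.tasks.maximum", 8);
          // Nothing caps the combined total, so up to 8 + 8 = 16 tasks can run
          // on one node, which is the over-subscription seen with two jobs.
          System.out.println("map slots    = "
              + conf.getInt("mapred.tasktracker.map.tasks.maximum", 2));
          System.out.println("reduce slots = "
              + conf.getInt("mapred.tasktracker.reduce.tasks.maximum", 2));
        }
      }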

      So I think it makes sense to restore the old attribute mapred.tasktracker.tasks.maximum, making it compatible with the new ones.

      Task trackers would then not be allowed to (see the sketch after this list):

      • run more than mapred.tasktracker.tasks.maximum tasks per node,
      • run more than mapred.tasktracker.map.tasks.maximum mappers per node,
      • run more than mapred.tasktracker.reduce.tasks.maximum reducers per node.
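      A hypothetical sketch of that admission rule follows (this is not the actual TaskTracker scheduling code; the class and method names are invented for illustration):

      // Hypothetical sketch: accept a new task only if it stays within the map
      // limit, the reduce limit, and the proposed overall per-node limit.
      public class SlotLimits {
        private final int maxMapTasks;     // mapred.tasktracker.map.tasks.maximum
        private final int maxReduceTasks;  // mapred.tasktracker.reduce.tasks.maximum
        private final int maxTotalTasks;   // proposed mapred.tasktracker.tasks.maximum

        public SlotLimits(int maxMapTasks, int maxReduceTasks, int maxTotalTasks) {
          this.maxMapTasks = maxMapTasks;
          this.maxReduceTasks = maxReduceTasks;
          this.maxTotalTasks = maxTotalTasks;
        }

        /** Can this node start one more task of the given kind? */
        public boolean canAccept(boolean isMap, int runningMaps, int runningReduces) {
          if (runningMaps + runningReduces >= maxTotalTasks) {
            return false;                                 // overall per-node cap
          }
          if (isMap) {
            return runningMaps < maxMapTasks;             // map cap
          }
          return runningReduces < maxReduceTasks;         // reduce cap
        }
      }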

        Activity

        Doug Cutting added a comment -

        Note this differs from the former semantics of mapred.tasktracker.tasks.maximum. Before, it was both the total number of map tasks and the total number of reduce tasks; for example, if it was 4, then there could be up to 4 map tasks and up to 4 reduce tasks, for a total of up to 8 tasks per node.

        Also note that, under your proposal, a configuration where mapred.tasktracker.tasks.maximum is not greater than mapred.tasktracker.reduce.tasks.maximum can lead to deadlock. If every slot is filled performing a reduce, and a node fails, triggering re-execution of its maps, but no map slots are available, then, currently, the system will not kill a reduce task, but rather all the reduce tasks will patiently wait forever.

        Iván de Prado added a comment -

        I understand. So the solution is not so easy. The problem I see with the current configuration schema arises for clusters that usually execute jobs in sequence, but sometimes run jobs in parallel. Let's suppose you have nodes with N CPUs and you can execute at most N tasks per node with the available memory. You have to configure N/2 max maps and N/2 max reduces per node if you want to be able to execute some jobs in parallel. But the cluster will take advantage of only half of the resources when executing sequential jobs.

        Is it possible to have a configuration schema that allows all resources to be used for sequential jobs, but no more than the available resources during parallel job executions?

        Does it make sense to have a mapred.tasktracker.tasks.maximum that limits the maximum total number of tasks per node, while forcing mapred.tasktracker.reduce.tasks.maximum to be smaller than mapred.tasktracker.tasks.maximum to avoid the possible deadlock?

        Thanks for your amazing OS project.

        Stefan Groschupf added a comment -

        Wow - this is still open. To me it looks like if you have a mixed-usage cluster with sequential job flows and an occasional multi-tenant job load, the hardware is always under- or over-utilized. There is no way to cleanly configure that.

        Shouldn't it be easy to put in a check to make sure mapred.tasktracker.tasks.maximum is bigger than the others?
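        Such a check could be as small as the following hypothetical startup validation (not actual Hadoop code; the names are illustrative). It rejects the configuration Doug Cutting identified as deadlock-prone, where the overall limit leaves no room for a map slot beside the reduce limit:

        public class SlotLimitValidator {
          /** Hypothetical sanity check run when the tasktracker reads its configuration. */
          static void validate(int maxMaps, int maxReduces, int maxTotal) {
            if (maxTotal <= maxReduces) {
              throw new IllegalArgumentException(
                  "mapred.tasktracker.tasks.maximum (" + maxTotal
                  + ") must be greater than mapred.tasktracker.reduce.tasks.maximum ("
                  + maxReduces + ") or map tasks can be starved");
            }
            if (maxMaps <= 0 || maxReduces <= 0) {
              throw new IllegalArgumentException("per-type task limits must be positive");
            }
          }
        }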

        Iván de Prado added a comment -

        Seems too old and not very relevant now.


          People

          • Assignee: Unassigned
          • Reporter: Iván de Prado
          • Votes: 2
          • Watchers: 8
