Hadoop Map/Reduce
MAPREDUCE-1181

Enforce RSS memory limit in TaskMemoryManagerThread

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 0.20.1
    • Fix Version/s: 0.20.1
    • Component/s: tasktracker
    • Labels:
      None

      Description

      TaskMemoryManagerThread will periodically check the RSS memory usage of every task. If a task's memory usage exceeds the specified threshold, the task will be killed. Also, if the total RSS memory of all tasks exceeds (total amount of memory - specified reserved memory), the task with the least progress will be killed to recover the reserved RSS memory.

      This is similar to the virtual memory limit already enforced by TaskMemoryManagerThread, but here the limit applies to RSS memory. This new feature allows us to avoid page swapping, which is error-prone.
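As a rough sketch, the enforcement described above amounts to two checks per monitoring cycle: a per-task RSS limit, and a least-progress victim selection when the node-wide budget is exceeded. The class and field names below are hypothetical, not the actual Hadoop implementation:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the RSS enforcement described above; the real
// TaskMemoryManagerThread differs in structure and naming.
class RssEnforcer {
    static class TaskInfo {
        final String id;
        final long rssBytes;      // current RSS usage of the task
        final long rssLimitBytes; // per-task RSS limit
        final float progress;     // task progress in [0, 1]
        TaskInfo(String id, long rssBytes, long rssLimitBytes, float progress) {
            this.id = id; this.rssBytes = rssBytes;
            this.rssLimitBytes = rssLimitBytes; this.progress = progress;
        }
    }

    /** Tasks whose own RSS usage exceeds their per-task limit. */
    static List<TaskInfo> overLimit(List<TaskInfo> tasks) {
        List<TaskInfo> out = new ArrayList<>();
        for (TaskInfo t : tasks) {
            if (t.rssBytes > t.rssLimitBytes) out.add(t);
        }
        return out;
    }

    /**
     * If total RSS exceeds (total memory - reserved memory), pick the
     * task with the least progress to kill; otherwise return null.
     */
    static TaskInfo pickVictim(List<TaskInfo> tasks, long totalMemory, long reserved) {
        long totalRss = 0;
        for (TaskInfo t : tasks) totalRss += t.rssBytes;
        if (totalRss <= totalMemory - reserved) return null; // within budget
        return tasks.stream()
                .min(Comparator.comparingDouble(t -> t.progress))
                .orElse(null);
    }
}
```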

      The following are the related configurations
      mapreduce.reduce.memory.rss.mb // RSS memory allowed for a reduce task
      mapreduce.map.memory.rss.mb // RSS memory allowed for a map task
      mapreduce.tasktracker.reserved.memory.rss.mb // RSS memory reserved (not for tasks) on a tasktracker
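Like other Hadoop properties, these would presumably be set in mapred-site.xml; the values below are purely illustrative:

```xml
<!-- Illustrative values only; not defaults from the patch -->
<property>
  <name>mapreduce.map.memory.rss.mb</name>
  <value>1024</value>
</property>
<property>
  <name>mapreduce.reduce.memory.rss.mb</name>
  <value>2048</value>
</property>
<property>
  <name>mapreduce.tasktracker.reserved.memory.rss.mb</name>
  <value>512</value>
</property>
```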

          Activity

          Scott Chen added a comment -

          Uses the RSS memory gauged by ProcfsBasedProcessTree provided by MAPREDUCE-1167

          Vinod Kumar Vavilapalli added a comment -

          This new feature allows us to avoid page swapping, which is error-prone.

          Can you elaborate on this? RSS, unlike vmem, is a very dynamic quantity for a process, and depends not just on that process but on others too. So I am not sure that trying to shoot down tasks based on their RSS usage will work well.

          This new feature allows us to avoid page swapping, which is error-prone.

          Explain this too?

          The original intention behind adding task-killing to the TaskMemoryManager was to prevent nodes from going down. If tasks use too much virtual memory (RSS AND swap), the OS has no way of recovering itself, and we have seen instances where nodes go down completely because of this.

          On the other hand, I am not so sure that too much RSS usage has similar effects. Did you see such drastic instances? If not, and if you are concerned only about thrashing, then a better way of controlling this may be to not even schedule tasks when total RSS usage is near the brim. Thoughts?

          Scott Chen added a comment -

          Hi Vinod,

          Thanks for the comment.

          After investigating this and doing some experiments over the past few days, I agree with you. It is more reliable to monitor tasks by virtual memory than by physical memory, because virtual memory is not as dynamic as RSS, and we can stop a task before its physical memory grows too high.

          And I also agree with your second point. It is better to use the total RSS usage in scheduling rather than here.

          Now I think this feature is not necessary. But it is still good to keep the part that allows ProcfsBasedProcessTree to collect RSS; it can be used for job profiling later. I will continue working on that one. My plan is to make ProcfsBasedProcessTree collect RSS usage and the number of CPU jiffies for all tasks and report them through the heartbeat via TaskTrackerStatus.taskReports.
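As a rough sketch of what collecting RSS and CPU jiffies from procfs involves, the hypothetical helper below parses a single /proc/&lt;pid&gt;/stat line (format per proc(5)); the actual parsing in ProcfsBasedProcessTree from MAPREDUCE-1167 differs:

```java
// Hypothetical helper, not the real ProcfsBasedProcessTree code.
// Parses one /proc/<pid>/stat line for RSS and CPU jiffies.
class ProcStatLine {
    final long rssPages;    // field 24: resident set size, in pages
    final long cpuJiffies;  // fields 14 + 15: utime + stime, in jiffies

    ProcStatLine(String statLine) {
        // The command name (field 2) is parenthesized and may contain
        // spaces, so split on the last closing parenthesis first.
        String rest = statLine.substring(statLine.lastIndexOf(')') + 2);
        String[] f = rest.split("\\s+");
        // f[0] is field 3 (state), so field N maps to f[N - 3]:
        // utime (14) -> f[11], stime (15) -> f[12], rss (24) -> f[21].
        long utime = Long.parseLong(f[11]);
        long stime = Long.parseLong(f[12]);
        this.cpuJiffies = utime + stime;
        this.rssPages = Long.parseLong(f[21]);
    }
}
```

Multiplying rssPages by the system page size (typically 4096 bytes) would give the RSS in bytes.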


            People

            • Assignee: Unassigned
            • Reporter: Scott Chen
            • Votes: 0
            • Watchers: 3
