Hadoop Common / HADOOP-4931

Document TaskTracker's memory management functionality and CapacityScheduler's memory based scheduling.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: 0.20.1
    • Component/s: None
    • Labels:
      None
    1. 4931.1.patch (20 kB, Vivek Ratan)
    2. 4931.2.patch (14 kB, Sreekanth Ramakrishnan)

        Activity

        Vivek Ratan added a comment -

        Attaching patch (4931.1.patch) with relevant documentation. I also updated capacity_scheduler.xml to be consistent with the usage of 'Scheduler' (upper case 'S'), and 'Capacity Scheduler'.

        Hemanth Yamijala added a comment -

        Vivek, a few comments:

        • I think monitoring should be with cluster setup rather than with the map/reduce tutorial. This is because it is more of an admin feature. The user parameters for memory should be as you have defined in the map/reduce tutorial, and possibly we can link the monitoring section in the cluster setup guide for more details.
        • "Users can, optionally, indicate the VM task-limit per job." I think we can use 'specify' instead of indicate, as it seems to describe things better.
        • "To enable monitoring for a (TT)" should be "To enable monitoring for a TT"
        • Some of the documented variable names and types need to be checked. For example, it is not mapred.tasktracker.virtualmemory.reserved, but mapred.tasktracker.vmem.reserved. The type is long, not int. I think the value is in bytes, not KB.
        • We should mention that both monitoring and scheduling are supported only on the Linux platform right now.
        • Can we add a note that in monitoring, when a task is killed, a message is logged so users can see it in the daemon logs?
        • Some of the parameters are cluster-wide parameters, for example the default vmem. It would be best to call this out explicitly.
        • In the capacity scheduler documentation, I think the limit and the percentage of pmem in vmem are two separate configuration items, whereas you are treating them as one.
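
        The point about type and units can be illustrated with a small sketch. It assumes a hypothetical setting of 4 GiB for a byte-valued memory parameter; the helper names are made up for illustration and are not part of Hadoop. It only shows why a value expressed in bytes needs a 64-bit long rather than a 32-bit int.

```python
INT32_MAX = 2**31 - 1  # largest value a 32-bit signed int can hold
INT64_MAX = 2**63 - 1  # largest value a 64-bit signed long can hold


def gib_to_bytes(gib):
    """Convert a whole number of GiB to bytes."""
    return gib * 1024**3


def fits(value, max_value):
    """Return True if a non-negative value fits in a signed type with the given maximum."""
    return 0 <= value <= max_value


# Hypothetical setting: 4 GiB, expressed in bytes as the comment above suggests.
reserved_bytes = gib_to_bytes(4)

print(reserved_bytes)                   # the value as it would appear in the config
print(fits(reserved_bytes, INT32_MAX))  # False: too large for an int
print(fits(reserved_bytes, INT64_MAX))  # True: fits in a long
```

        Anything at or above 2 GiB overflows an int once it is written in bytes, which is presumably why the type matters as much as the unit.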
        Vivek Ratan added a comment -

        I think monitoring should be with cluster setup rather than with the map/reduce tutorial...

        I'm not sure about this, for the following reasons:

        • The cluster setup seems to be for setup required to make sure the cluster works correctly - basic, core settings. At least that's my take on it. The TT monitoring stuff is an optional feature, likely to be used only by power users.
        • Splitting the memory features across three guides (cluster setup, M/R tutorial, and Capacity Scheduler) seems excessive.
        • The MR tutorial already had a section on 'Memory management', so it seemed like a logical place to place our documentation.
        • You can argue that there are plenty of parameters described in the MR tutorial that are also 'admin features'.

        Are there plans to have another guide, something like 'Hadoop features', or 'MR features'? Maybe that's the right place for something like this: optional features in the system that affect how things work.

        Your other suggestions make sense. I'll incorporate them.

        Hemanth Yamijala added a comment -

        The cluster setup seems to be for setup required to make sure the cluster works correctly - basic, core settings. At least that's my take on it. The TT monitoring stuff is an optional feature, likely to be used only by power users.

        The cluster setup guide reads:

        This document describes how to install, configure and manage non-trivial Hadoop clusters ranging from a few nodes to extremely large clusters with thousands of nodes.

        By non-trivial, I am assuming it is more than the basic settings. Look at the 'Real world Cluster Configurations' section. So, I think that it is the right place for advanced configuration.

        Splitting the memory features across three guides (cluster setup, M/R tutorial, and Capacity Scheduler) seems excessive.

        But we are documenting different things. The cluster admin is going to dive deep into the details of configuring memory parameters, while skimming through how users can specify memory limits in their job conf. The user reading the map/reduce tutorial only needs to know how to set up the memory configuration for a job. Details of how memory management is configured on the TT, while useful, are not relevant to that user.

        The MR tutorial already had a section on 'Memory management', so it seemed like a logical place to place our documentation.

        That section talks about the ulimit option and the Java VM child options, which users can tweak when submitting a job. So, it still makes sense in the map/reduce tutorial. Note that the ulimit option is also covered in the cluster setup guide, where the admin can tweak it.

        You can argue that there are plenty of parameters described in the MR tutorial that are also 'admin features'.

        I wasn't able to find any. Can you please give me an example? All parameters mentioned in the MR tutorial seem to be ones which users can configure as part of job configuration.

        From the very nature of the guides, I think the M/R tutorial is meant to help people who want to write jobs, and the cluster setup guide is for people who want to configure clusters. The feature description should be split in that way. I am completely OK with describing how the feature works in one place rather than three and linking to it from the other places for completeness.

        Vinod Kumar Vavilapalli added a comment -

        Some comments:

        • The words 'absence' and 'set' are used frequently in relation to configuration parameters. We can define upfront that a value of -1 indicates disabled/missing/unset and that any positive value indicates set/enabled.
        • The purpose of the offset/reserved memory is never stated. We should state that such reserved memory/offset is for system usage, such as the OS and the system and Hadoop daemons themselves, so that it's clear for the admins.
        • All the memory values are in bytes. Sorry, Vivek, for misinforming you earlier.
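
        A minimal sketch of the two conventions above, under stated assumptions: the function names are hypothetical, rejecting 0 and other negative values is a choice made for this sketch, and subtracting the reserved amount from the total is an assumed reading of how the reservation is applied, not something confirmed in this thread.

```python
UNSET = -1  # convention from the comment above: -1 means disabled/missing/unset

GIB = 1024**3


def is_set(value):
    """Return True for a positive value, False for -1; other values are rejected in this sketch."""
    if value == UNSET:
        return False
    if value > 0:
        return True
    raise ValueError("expected -1 (unset) or a positive value, got %d" % value)


def memory_for_tasks(total_bytes, reserved_bytes):
    """Assumed model: memory left for tasks is the total minus what is reserved
    for the OS and the system and Hadoop daemons. All values are in bytes."""
    if not is_set(total_bytes):
        return UNSET          # nothing can be computed without a total
    if not is_set(reserved_bytes):
        return total_bytes    # no reservation configured
    return total_bytes - reserved_bytes


# Hypothetical node: 8 GiB total, 2 GiB reserved for system usage.
print(memory_for_tasks(8 * GIB, 2 * GIB))
```

        Stating the -1 rule once, as suggested, lets the rest of the documentation refer to parameters simply as set or unset.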

        mapred_tutorial.xml

        • VM is used to stand for virtual machine at one other place in mapred_tutorial. Maybe we should use VMEM and PMEM to be clear.
        • I think pmem-related parameters should not be discussed in the monitoring section. They can be in a separate section, say scheduling-related configuration.
        • Hemanth> Can we add a note that in monitoring, when a task is killed, a message is logged so users can see it in the daemon logs?
          We also give this information, as to why the task is killed, in the task's diagnostic messages. We can say that in the documentation.
        • As for the overall organization of the documentation, I too feel that we should separate cluster setup related information from user parameters.
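
        The monitoring behaviour discussed above (a task over its limit is killed, the reason is logged and also surfaced as a diagnostic message) could be sketched roughly as follows. This is an illustration only: the function name, logger name, and message wording are hypothetical and do not reflect the actual TaskTracker code.

```python
import logging

log = logging.getLogger("tasktracker.sketch")  # hypothetical logger name

UNSET = -1  # -1 means no limit is configured


def check_task(task_id, vmem_used_bytes, vmem_limit_bytes):
    """Return a diagnostic message if the task is over its vmem limit, else None.

    A real monitor would also kill the task; this sketch only builds the
    message that would go to the daemon log and the task's diagnostics.
    """
    if vmem_limit_bytes == UNSET:
        return None  # monitoring is disabled for this task
    if vmem_used_bytes <= vmem_limit_bytes:
        return None  # within the limit
    message = (
        "Task %s exceeded its virtual memory limit: used %d bytes, "
        "limit %d bytes. Killing task." % (task_id, vmem_used_bytes, vmem_limit_bytes)
    )
    log.warning(message)  # visible in the daemon logs
    return message        # would also be attached to the task's diagnostics


print(check_task("task_2", 300, 200))
```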

        capacity_scheduler.xml

        • Maybe we can give an example of how scheduling based on memory is done, citing real numbers and memory values. But I don't know for sure, as the expected audience of this document isn't very clear to me.
        • That brings me to another point. Maybe we should separate capacity_scheduler.xml into different guides, or at a minimum different sections - for administrators, for users and a general configuration glossary - in the same vein as HOD's guides. Another point is that the scheduling steps are described in good detail here. Only some part of it is actually needed/useful for the users. Once we have different guides/sections we can organize the documentation for memory-based scheduling properly. Thoughts?
        Hemanth Yamijala added a comment -

        Vinod, maybe it is not time yet to split the documentation - IMO. It's not that big. The case for HOD was different - there was so much to document that the guide would have been somewhat unmanageable if we had plugged it all together.

        And I also think Vivek's description of how it works will be really useful for understanding what's happening. Users would be happy with it, and they can skim it if required. So, I would stay with Vivek's details.

        Sreekanth Ramakrishnan added a comment -

        Attaching patch incorporating comments.

        Hemanth Yamijala added a comment -

        This was resolved in HADOOP-5736.


          People

          • Assignee: Sreekanth Ramakrishnan
          • Reporter: Vinod Kumar Vavilapalli