Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-4204

Refactor ProcfsBasedProcessTree to make the resource collection object pluggable

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Making it a pluggable interface will allow replacing the procfs based implementation with ones for other platforms.

      1. MAPREDUCE-4204-1.patch
        21 kB
        Bikas Saha
      2. MAPREDUCE-4204.patch
        21 kB
        Bikas Saha

        Issue Links

          Activity

          Hide
          Bikas Saha added a comment -

          Refactor patch.
          Replaces ProcfsBasedProcessTree with an abstract class ResourceCalculatorProcessTree.

          Show
          Bikas Saha added a comment - Refactor patch. Replaces ProcfsBasedProcessTree with an abstract class ResourceCalculatorProcessTree.
          Hide
          Radim Kolar added a comment -

          ResourceCalculatorProcessTree should be merged with existing ResourceCalculatorPlugin infractructure.

          It means:

          1. remove ResourceCalculatorProcessTree getResourceCalculatorProcessTree and move it to ResourceCalculatorPlugin as abstract member method.

          2. LinuxResourceCalculatorPlugin will return procfsBasedProcessTree

          3. method isAvailable() should not be needed, it is checked at lot of places. Removing it will simplify code. Unsupported operation systems get null from getResourceCalculatorPlugin and no plugin means no ResourceCalculatorProcessTree as well.

          4. org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMemoryMonitorImpl must be modified to work with ResourceCalculatorPlugin

          Show
          Radim Kolar added a comment - ResourceCalculatorProcessTree should be merged with existing ResourceCalculatorPlugin infractructure. It means: 1. remove ResourceCalculatorProcessTree getResourceCalculatorProcessTree and move it to ResourceCalculatorPlugin as abstract member method. 2. LinuxResourceCalculatorPlugin will return procfsBasedProcessTree 3. method isAvailable() should not be needed, it is checked at lot of places. Removing it will simplify code. Unsupported operation systems get null from getResourceCalculatorPlugin and no plugin means no ResourceCalculatorProcessTree as well. 4. org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMemoryMonitorImpl must be modified to work with ResourceCalculatorPlugin
          Hide
          Bikas Saha added a comment -

          I am not quite sure about replacing ResourceCalculatorProcessTree with ResourceCalculatorPlugin. Its not clear that they are one and the same. Each seems to have different use cases in the code. If you feel strongly about it then please open a separate jira to merge them and let the committers give guidance on that. Lets not merge 2 steps of refactoring in 1 jira.

          isAvailable() is needed because this functionality is not available for every OS but MapReduce could be used without this on them. What I could do is assert in the factory getter that isAvailable is true with a doc note telling developers to always check isAvailable before trying to create the object. There is an additional isAvailable on ProcfsBasedProcessTree that I dont like but I have to leave it there because its TestProcfsBasedProcessTree checks for that before running - ie it checks if ProcfsBasedProcessTree is supported before trying to test it.

          I cannot change anything around ContainerMemory* because this branch is derived from Hadoop 1.0 for windows port work.

          A key criteria behind the changes in this branch is to make merging back into mainline as smooth as possible. So we dont want to change the class hierarchy and names too much.

          Show
          Bikas Saha added a comment - I am not quite sure about replacing ResourceCalculatorProcessTree with ResourceCalculatorPlugin. Its not clear that they are one and the same. Each seems to have different use cases in the code. If you feel strongly about it then please open a separate jira to merge them and let the committers give guidance on that. Lets not merge 2 steps of refactoring in 1 jira. isAvailable() is needed because this functionality is not available for every OS but MapReduce could be used without this on them. What I could do is assert in the factory getter that isAvailable is true with a doc note telling developers to always check isAvailable before trying to create the object. There is an additional isAvailable on ProcfsBasedProcessTree that I dont like but I have to leave it there because its TestProcfsBasedProcessTree checks for that before running - ie it checks if ProcfsBasedProcessTree is supported before trying to test it. I cannot change anything around ContainerMemory* because this branch is derived from Hadoop 1.0 for windows port work. A key criteria behind the changes in this branch is to make merging back into mainline as smooth as possible. So we dont want to change the class hierarchy and names too much.
          Hide
          Bikas Saha added a comment -

          Tiny diff from previous one to remove unneeded methods in ProcfsBasedProcessTree because they are already defined in the parent class and were being needlessly overridden in the derived class.

          Show
          Bikas Saha added a comment - Tiny diff from previous one to remove unneeded methods in ProcfsBasedProcessTree because they are already defined in the parent class and were being needlessly overridden in the derived class.
          Hide
          Sanjay Radia added a comment -

          Thanks Bikas - Committed to branch-1-win

          Show
          Sanjay Radia added a comment - Thanks Bikas - Committed to branch-1-win
          Hide
          Radim Kolar added a comment -

          Are there any plans for trunk version?

          Show
          Radim Kolar added a comment - Are there any plans for trunk version?
          Hide
          Bikas Saha added a comment -

          The plan is to eventually merge this back to 1.0 and trunk.

          Show
          Bikas Saha added a comment - The plan is to eventually merge this back to 1.0 and trunk.
          Hide
          Vinod Kumar Vavilapalli added a comment -

          org.apache.hadoop.mapreduce.util.ProcessTree was put in exactly for this purpose. Why have a parallel ResourceCalculatorProcessTree abstraction?

          Show
          Vinod Kumar Vavilapalli added a comment - org.apache.hadoop.mapreduce.util.ProcessTree was put in exactly for this purpose. Why have a parallel ResourceCalculatorProcessTree abstraction?
          Hide
          Radim Kolar added a comment -

          org.apache.hadoop.mapreduce.util.ProcessTree is used for different purpose.

          Show
          Radim Kolar added a comment - org.apache.hadoop.mapreduce.util.ProcessTree is used for different purpose.
          Hide
          Radim Kolar added a comment -

          Are you working on trunk version?

          Show
          Radim Kolar added a comment - Are you working on trunk version?
          Hide
          Bikas Saha added a comment -

          Trunk versions will be prepared once this branch is stabilized. If you need this urgently, then feel free to port this jira to trunk.

          Show
          Bikas Saha added a comment - Trunk versions will be prepared once this branch is stabilized. If you need this urgently, then feel free to port this jira to trunk.
          Hide
          Radim Kolar added a comment -

          i am working on it. Porting it to YARN/trunk. There is also mapreduce/trunk which has some code Copy & Paste duplicated with YARN, but i am leaving mapreduce alone.

          Show
          Radim Kolar added a comment - i am working on it. Porting it to YARN/trunk. There is also mapreduce/trunk which has some code Copy & Paste duplicated with YARN, but i am leaving mapreduce alone.
          Hide
          Radim Kolar added a comment -

          Trunk version is in MAPREDUCE-4275

          Show
          Radim Kolar added a comment - Trunk version is in MAPREDUCE-4275

            People

            • Assignee:
              Bikas Saha
              Reporter:
              Bikas Saha
            • Votes:
              2 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development