Hadoop Map/Reduce
MAPREDUCE-3902

MR AM should reuse containers for map tasks, thereby allowing fine-grained control on num-maps for users without needing CombineFileInputFormat etc.

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: applicationmaster, mrv2
    • Labels: None

      Description

      The MR AM is now in a great position to reuse containers across (map) tasks. This is something similar to JVM re-use we had in 0.20.x, but in a significantly better manner:

      1. Consider data-locality when re-using containers
      2. Consider the new shuffle - ensure that reduces fetch output of the whole container at once (i.e. all maps) : MAPREDUCE-4525
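
The first point can be sketched as a toy locality-aware reuse scheduler (a minimal sketch with hypothetical names, not code from the attached patches): when a container frees up, prefer a pending map whose input split is local to that container's node, and fall back to any remaining map.

```java
import java.util.*;

// Minimal sketch, assuming a host -> pending-maps index; names are
// illustrative and not taken from the MAPREDUCE-3902 patches.
class ReuseScheduler {
    // host -> queue of pending map task ids whose split lives on that host
    private final Map<String, Deque<Integer>> mapsByHost = new HashMap<>();
    // all still-unassigned map task ids, in submission order
    private final Deque<Integer> pendingMaps = new ArrayDeque<>();

    void addPendingMap(int taskId, List<String> splitHosts) {
        pendingMaps.add(taskId);
        for (String host : splitHosts) {
            mapsByHost.computeIfAbsent(host, h -> new ArrayDeque<>()).add(taskId);
        }
    }

    // Called when a container on 'host' finishes a map and can be reused.
    OptionalInt assignNextMap(String host) {
        Deque<Integer> local = mapsByHost.getOrDefault(host, new ArrayDeque<>());
        while (!local.isEmpty()) {
            Integer candidate = local.poll();
            if (pendingMaps.remove(candidate)) {   // still pending: data-local reuse
                return OptionalInt.of(candidate);
            }
        }
        // No data-local work left; reuse the container for any remaining map.
        return pendingMaps.isEmpty() ? OptionalInt.empty()
                                     : OptionalInt.of(pendingMaps.poll());
    }
}
```

The second point, reducers fetching a whole container's output at once, is tracked separately in MAPREDUCE-4525.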
      Attachments

      1. MAPREDUCE-3902.patch
        74 kB
        Arun C Murthy
      2. MAPREDUCE-3902.2.patch
        55 kB
        Tsuyoshi Ozawa
      3. AM_ContainerRefactor.pdf
        151 kB
        Siddharth Seth
      4. AMContainerRefactorNotes.pdf
        58 kB
        Siddharth Seth

        Issue Links

        1. [MAPREDUCE-3902] TaskHeartbeatHandler should extends HeartbeatHandlerBase Sub-task Resolved Tsuyoshi Ozawa
        2. [MAPREDUCE-3902] ScheduledRequests#remove should remove the elements from mapsHostMapping and mapsRackMapping Sub-task Open Tsuyoshi Ozawa
        3. [MAPREDUCE-3902] Ensure not to launch container on blacklisted hosts Sub-task Resolved Tsuyoshi Ozawa
        4. [MAPREDUCE-3902] Re-create ask list correctly in case of a temporary error in the AM-RM allocate call Sub-task Resolved Siddharth Seth
        5. [MAPREDUCE-3902] RMContainerAllocator#scheduleInterval should be configurable Sub-task Resolved Tsuyoshi Ozawa
        6. [MAPREDUCE-3902] RMContainerAllocator#assign should be split into functions Sub-task Open Tsuyoshi Ozawa
        7. [MAPREDUCE-3902] Re-wire AM Recovery Sub-task Resolved Siddharth Seth
        8. [MAPREDUCE-3902] Re-wire LocalContainerAllocator / UberAM Sub-task Resolved Siddharth Seth
        9. [MAPREDUCE-3902] Change AMContainerMap to extend AbstractService Sub-task Resolved Tsuyoshi Ozawa
        10. [MAPREDUCE-3902] RMContainerAllocator should factor in nodes being blacklisted Sub-task Resolved Siddharth Seth
        11. [MAPREDUCE-3902] Disable AM blacklisting if #blacklistedNodes crosses the configured threshold Sub-task Resolved Siddharth Seth
        12. [MAPREDUCE-3902] Unit tests for AMContainer Sub-task Open Unassigned
        13. [MAPREDUCE-3902] Unit tests for AMNode Sub-task Open Unassigned
        14. [MAPREDUCE-3902] Reduce scheduling fixes, factor in MR-4437 Sub-task Resolved Siddharth Seth
        15. [MAPREDUCE-3902] Statistics logging in the AM scheduler Sub-task Resolved Siddharth Seth
        16. [MAPREDUCE-3902] Fix and re-enable RMContainerAllocator unit tests Sub-task Resolved Siddharth Seth
        17. [MAPREDUCE-3902] Handle the JobFinishedEvent correctly Sub-task Open Siddharth Seth
        18. [MAPREDUCE-3902] Container Launch should be independent of o.a.h.m.Task Sub-task Resolved Siddharth Seth
        19. [MAPREDUCE-3902] ContainerHeartbeatHandler should be pinged on a getTask call Sub-task Resolved Siddharth Seth
        20. [MAPREDUCE-3902] Use the configured shuffle port and application ACLs Sub-task Resolved Siddharth Seth
        21. [MAPREDUCE-3902] Handle a successful NM stop request Sub-task Resolved Siddharth Seth
        22. [MAPREDUCE-3902] re-enable disabled unit tests in mr-client-app2 module Sub-task Open Siddharth Seth

          Activity

          Rohith Sharma K S added a comment -

          I will update at the earliest.

          Kannan Rajah made changes -
          Assignee Siddharth Seth [ sseth ] Kannan Rajah [ rkannan82 ]
          Kannan Rajah added a comment -

          OK. I am going to start looking at the code changes in this patch to understand the workflow better. I don't see a design doc per se; if there is one, that would be ideal. Let's wait for Rohith to get back with his design spec as well.

          Tsuyoshi Ozawa added a comment -

          I think this JIRA has naturally gone stale. I can help if you're planning to work on this.

          Rohith Sharma K S, is your design doc different from Sid's? Maybe we need to deal with AM restart.

          cc: Siddharth Seth You stopped this work; did you find a design issue or something else? If so, could you share that information?

          Kannan Rajah added a comment -

          Rohith Sharma K S Can you share the design spec and patch?

          Rohith Sharma K S added a comment -

          I wonder why this jira has been stale for such a long time, and would like to know the reason. I personally think this feature would be helpful in terms of container allocation latency. We have done some analysis and implemented support for JVM reuse on branch-2 without breaking existing AM functionality. We are ready to share a prototype patch along with a design doc.

          Kannan Rajah added a comment -

          Thanks Tsuyoshi Ozawa. Do you know why this work stopped in Sep 2012? I want to understand whether there is any other design already in progress to address this problem. If so, I would like to contribute to it. For example, there was a 2013 post by [~seth.siddharth@gmail.com] on Tez and how it tries to solve the container reuse problem: http://hortonworks.com/blog/re-using-containers-in-apache-tez/.

          Tsuyoshi Ozawa added a comment -

          Kannan Rajah thanks for pinging me. I think the work in this ticket was being done on a branch (MAPREDUCE-3902). It is based on old trunk code and is now difficult to rebase. Siddharth Seth, what do you think?

          Kannan Rajah added a comment -

          Tsuyoshi Ozawa I would like to check with you as well, since you are the assignee for some child JIRAs.

          Kannan Rajah added a comment -

          Siddharth Seth I am going to look at JVM reuse in YARN. I came across this JIRA and see there has not been any update in a long time. Can you please provide an update?

          Siddharth Seth made changes -
          Attachment AM_ContainerRefactor.pdf [ 12546255 ]
          Attachment AMContainerRefactorNotes.pdf [ 12546256 ]
          Siddharth Seth added a comment -

          Modified state machines - with information on actions to be taken when an event occurs at a particular state. Also some additional notes.
          These are from a while ago, and the code has deviated to some extent from these tables (especially Node), and will deviate some more. However, even in the current state, this is a fair representation of event flow, and should make walking through the code easier.

          Tsuyoshi Ozawa made changes -
          Link This issue blocks MAPREDUCE-4502 [ MAPREDUCE-4502 ]
          Siddharth Seth added a comment -

          Thanks for the help with this JIRA.

          because MRAppMaster in container-reuse implementation has the feature to monitor whether the running tasks on the containers are "the last task at a machine or not", for the purpose of exiting JVMs on containers, as you know.

          That will definitely be simpler to achieve with the container-reuse AM, with nodes already tracking container information. The last task on a node can be figured out relatively easily by the scheduler. It is, however, also possible with the current AM, and several bits - like the decision on when to run the combiner - should be a straightforward port to the reuse-AM. In any case, it'll be good to get the re-use AM into trunk fast. Looking forward to the updates on 4502 and 4525.

          Tsuyoshi Ozawa added a comment -

          Thanks for enumerating the remaining tasks, Siddharth. I'll support you as far as possible.

          I haven't yet explained the relationship between the container-reuse work and MAPREDUCE-4502, so it may confuse you; sorry for the lack of explanation. Briefly: I'm planning to implement MAPREDUCE-4502 and MAPREDUCE-4525 on top of the container-reuse implementation, because MRAppMaster in the container-reuse implementation monitors whether a running task is "the last task at a machine or not", for the purpose of exiting JVMs on containers, as you know. This is very similar to monitoring task progress per container, for the purpose of starting the combiner for multi-level aggregation (MAPREDUCE-4502 and MAPREDUCE-4525).

          This isn't documented anywhere yet, so I'll write down my thoughts as a design note for MAPREDUCE-4502 and MAPREDUCE-4525 within the next week. I would appreciate it if you could review it.

          Thanks,
          Tsuyoshi

          Siddharth Seth made changes -
          Link This issue is blocked by YARN-75 [ YARN-75 ]
          Siddharth Seth added a comment -

          You're right. There's a lot of patches which will need to go in. Creating some of the sub-tasks that will be required before this can be considered for a merge back to trunk. I believe this will take several more weeks. MAPREDUCE-4502 doesn't necessarily need to be blocked on this - if that's something you're waiting to work on.

          Tsuyoshi Ozawa added a comment -

          Siddharth Seth,

          I think it will be necessary to create lots of patches to deal with this ticket. If you have any opinion about how to advance this ticket, please let me know. My concerns are your review cost and whether my prioritization is correct.

          Tsuyoshi Ozawa made changes -
          Link This issue relates to MAPREDUCE-4596 [ MAPREDUCE-4596 ]
          Tsuyoshi Ozawa added a comment -

          I think a pull request against github for now, and for bigger / more significant changes - separate subtasks under this jira for the changes.

          Okay.

          I'd like to create a separate branch for this jira, pull in the current set of changes with some cleanup, and then continue development. Will create a branch later this week if no one objects.

          All right, I agree with the idea of creating a new branch for this jira; it's much easier to trace the changes.

          And I sent a pull request on GitHub: https://github.com/sidseth/h2-container-reuse/pull/1 . Please check it out.

          Siddharth Seth added a comment -

          If I create some patches (e.g. fixing TODOs or something), should I send a pull request against your github or attach a patch here?

          I think a pull request against github for now, and for bigger / more significant changes - separate subtasks under this jira for the changes.

          Do you think it's necessary to separate hadoop-mapreduce-client-app from hadoop-mapreduce-client-app2? Your prototype is under hadoop-mapreduce-client-app2 currently. This makes it difficult to rebase your code on trunk.

          The intention was to be able to run the existing code as well as the modified code in the same install - with a simple config change to choose between the implementations. That makes side-by-side comparisons much easier. Once this implementation stabilizes, it can be moved back to mapreduce-client-app to replace the current implementation. Also, there are some pretty big changes to TaskAttempt, the AM scheduling classes, etc. - given this, I'm not sure how useful a merge from trunk would be. This will have some overhead though - of pulling in / factoring in jiras which have been fixed after the branch.

          With mapreduce-client-app2 being a separate module, development could continue in the main branches. However, given that this implementation is not stable, I'd like to create a separate branch for this jira, pull in the current set of changes with some cleanup, and then continue development. Will create a branch later this week if no one objects.

          Tsuyoshi Ozawa added a comment -

          @Siddharth,

          I have two questions, although my work is still in progress.

          1. If I create some patches (e.g. fixing TODOs or something), should I send a pull request against your github or attach a patch here?
          2. Do you think it's necessary to separate hadoop-mapreduce-client-app from hadoop-mapreduce-client-app2? Your prototype is under hadoop-mapreduce-client-app2 currently. This makes it difficult to rebase your code on trunk.
          Tsuyoshi Ozawa made changes -
          Description The MR AM is now in a great position to reuse containers across (map) tasks. This is something similar to JVM re-use we had in 0.20.x, but in a significantly better manner:
          # Consider data-locality when re-using containers
          # Consider the new shuffle - ensure that reduces fetch output of the whole container at once (i.e. all maps)
          The MR AM is now in a great position to reuse containers across (map) tasks. This is something similar to JVM re-use we had in 0.20.x, but in a significantly better manner:
          # Consider data-locality when re-using containers
          # Consider the new shuffle - ensure that reduces fetch output of the whole container at once (i.e. all maps) : MAPREDUCE-4525
          Tsuyoshi Ozawa added a comment -

          @Siddharth,

          Thank you for sharing the progress and your design thinking. I'm going to fix the TODOs in your code on github. If you have any ideas about the design, please write them down here.

          Tsuyoshi Ozawa added a comment -

          The topic about the per-container combiner has been moved.

          Tsuyoshi Ozawa made changes -
          Link This issue is related to MAPREDUCE-4525 [ MAPREDUCE-4525 ]
          Siddharth Seth added a comment -

          @Tsuyoshi; I'd spoken with Vinod and others about this a while ago. Should have posted this earlier. Adding the functionality to the AM in its current state is possible - but it will further complicate some components which are already quite complicated - and tough to change.

          The TaskAttempt state machine is currently really a mix of TaskAttempt transitions as well as Container transitions. The RMContainerAllocator is also dealing with more than it should - Nodes, Containers, as well as scheduling.

          The idea was to split the functionality into separate TaskAttempt, Container and Node state machines, along with reduced functionality in the scheduler (also decoupling the RM request and AM scheduling). This would make the code cleaner and make re-use (as well as other improvements like handling retired nodes) easier to implement.

          I had worked with Vinod on the state transitions, and have been working on the implementation in bits and pieces to see how feasible it is. The code is at https://github.com/sidseth/h2-container-reuse . It's a little bit of a mess at the moment, with lots of TODOs etc. splattered all over, but is just about functional. There's no explicit re-use scheduling yet - but re-use can be tested by running a job which requires more containers than are available on the cluster (and some config changes).

          the 2nd topic (combining per container) should be moved, because the change seems to be too big.

          I believe this was, at least initially, meant to ensure that output from all taskAttempts in one container would be fetched only once by a reducer (without a common combiner). Either way, that could be a separate jira.
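
The proposed split can be illustrated with a toy state machine (state names are illustrative only, not the actual transition tables attached in AM_ContainerRefactor.pdf): each entity gets its own small machine with an explicit table of legal transitions, instead of mixing container and attempt transitions in one class.

```java
import java.util.*;

// Toy sketch of a per-entity state machine; states and transitions are
// hypothetical, not taken from the attached design documents.
enum ContainerState { ALLOCATED, LAUNCHING, RUNNING, STOPPING, COMPLETED }

class AMContainerSketch {
    private ContainerState state = ContainerState.ALLOCATED;

    // Legal transitions kept in one table so illegal events fail fast.
    // RUNNING -> RUNNING models reuse: a new task assigned to a live container.
    private static final Map<ContainerState, EnumSet<ContainerState>> LEGAL =
        Map.of(
            ContainerState.ALLOCATED, EnumSet.of(ContainerState.LAUNCHING, ContainerState.COMPLETED),
            ContainerState.LAUNCHING, EnumSet.of(ContainerState.RUNNING, ContainerState.STOPPING),
            ContainerState.RUNNING,   EnumSet.of(ContainerState.RUNNING, ContainerState.STOPPING),
            ContainerState.STOPPING,  EnumSet.of(ContainerState.COMPLETED),
            ContainerState.COMPLETED, EnumSet.noneOf(ContainerState.class));

    ContainerState getState() { return state; }

    void transition(ContainerState next) {
        if (!LEGAL.get(state).contains(next)) {
            throw new IllegalStateException(state + " -> " + next);
        }
        state = next;
    }
}
```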

          Tsuyoshi Ozawa added a comment -

          s/should be moved/should be moved to the new ticket/

          Tsuyoshi Ozawa added a comment -

          IMHO, the 2nd topic (combining per container) should be moved, because the change seems to be too big.
          If there is no counter-opinion, I'm going to create a new ticket to deal with the 2nd topic as a sub-task of MAPREDUCE-3902.

          Tsuyoshi Ozawa made changes -
          Link This issue blocks MAPREDUCE-4502 [ MAPREDUCE-4502 ]
          Tsuyoshi Ozawa made changes -
          Attachment MAPREDUCE-3902.2.patch [ 12539060 ]
          Tsuyoshi Ozawa added a comment -

          As a first step, I fixed Arun's patch so that it compiles against the current source code.

          Arun C Murthy made changes -
          Assignee Arun C Murthy [ acmurthy ] Siddharth Seth [ sseth ]
          Ted Yu added a comment -
          +  private void makeContainerReuseDecision() {
          +    targetMapContainers = 
          +        conf.getInt(MRJobConfig.MR_AM_CONTAINER_REUSE_MAX_CONTAINERS, 
          +            numMapTasks);
          +  }
          

          Maybe more logic is going to be added to the above method?

          +  //        Key->Resource Capability
          +  //        Value->ResourceRequest
          +  protected final Map<Priority, Map<String, ResourceRequest>>
             remoteRequestsTable =
          -      new TreeMap<Priority, Map<String, Map<Resource, ResourceRequest>>>();
          +      new HashMap<Priority, Map<String, ResourceRequest>>();
          

          The comment above doesn't seem to match the Map structure.
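
As an aside, a comment that does match the flattened structure in the patch might read as below (stand-in types for illustration; the real Priority and ResourceRequest live in org.apache.hadoop.yarn.api.records): the inner key is a resource name (host, rack, or "*"), not a resource capability.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in types so the sketch is self-contained; not the real YARN records.
class Priority {}
class ResourceRequest {}

class RequestTableSketch {
    // Key   -> request Priority
    // Value -> map from resource name (host, rack, or "*") to the
    //          ResourceRequest outstanding at that name
    final Map<Priority, Map<String, ResourceRequest>> remoteRequestsTable =
        new HashMap<Priority, Map<String, ResourceRequest>>();
}
```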

          Kang Xiao added a comment -

          Container reuse will also be useful for scaling the RM, since it reduces the RM's scheduling load.

          Arun C Murthy made changes -
          Summary MR AM should reuse containers for map tasks MR AM should reuse containers for map tasks, there-by allowing fine-grained control on num-maps for users without need for CombineFileInputFormat etc.
          Arun C Murthy added a comment -

          Is there a cap on the amount of re-use? For example, if the container has been in use for more than 1 minute then do not re-use it.

          Not currently, but we could add something like this - except it won't make too much difference since you need to run the remaining maps in other containers anyway!

          Or to rephrase, what prevents a cluster with a few large jobs from having hogged containers?

          The central scheduler (e.g. CapacityScheduler) already uses queue-capacities and user-limits (and, in future, preemption) to prevent this.

          Jay Finger added a comment -

          I haven't read the patch, forgive me if the answer is already there.

          Is there a cap on the amount of re-use? For example, if the container has been in use for more than 1 minute then do not re-use it.

          Or to rephrase, what prevents a cluster with a few large jobs from having hogged containers?

          Arun C Murthy made changes -
          Field Original Value New Value
          Attachment MAPREDUCE-3902.patch [ 12515757 ]
          Arun C Murthy added a comment -

          Ok, I spent a long (isolated) flight on this - it clearly needs more work, but it's a start.

          This patch improves the classic JVM re-use on both dimensions described in the jira.

          We need to pay more attention to the user interface, some options:

          1. Allow user to specify actual number of map slots to be used (supported now, in the patch)
          2. Allow user to specify a target block-size for maps (which is greater than real HDFS block size) i.e. get around the small-files problem.

          Thoughts?
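
For comparison with option 2: plain FileInputFormat can already produce larger-than-block splits within a single large file via its split-size knobs (a sketch with an illustrative value below), but it still emits at least one split per file, so many small files still mean many maps - which is exactly the gap that container reuse or CombineFileInputFormat addresses.

```xml
<!-- Per-job configuration sketch (illustrative value). With a 128 MB block
     size this asks for ~512 MB splits within each large file; small files
     are unaffected, since FileInputFormat emits at least one split per file. -->
<property>
  <name>mapreduce.input.fileinputformat.split.minsize</name>
  <value>536870912</value>
</property>
```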

          Arun C Murthy created issue -

            People

            • Assignee: Kannan Rajah
            • Reporter: Arun C Murthy
            • Votes: 1
            • Watchers: 36
