Hadoop YARN / YARN-8849

DynoYARN: A simulation and testing infrastructure for YARN clusters



    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:


      Traditionally, YARN workload simulation is performed using SLS (Scheduler Load Simulator), which is packaged with YARN. Essentially, it starts a full-fledged ResourceManager but runs simulators for the NodeManager and the ApplicationMaster containers. These simulators are lightweight and run in a thread pool. The NM simulators do not open any external ports and send in-process heartbeats to the ResourceManager.

      There are a couple of drawbacks with using the SLS:

      • It might be difficult to simulate really large clusters without access to a very powerful machine, since the NMs are launched as tasks in a thread pool and each NM has to send periodic heartbeats to the RM.
      • Certain features (like YARN-1011) require changes to the NodeManager. Aspects such as queuing and selectively killing containers would have to be incorporated into the existing NM simulator, which might make the simulator heavyweight because it then needs locking and synchronization.
      • Since the NM and AM are simulations, only the Scheduler is faithfully tested. SLS does not really perform an end-to-end test of a cluster.
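
      The first drawback can be illustrated with a minimal sketch of the SLS-style approach, in which every simulated NM is just a task in a shared thread pool that calls the RM in-process. This is not SLS code: FakeResourceManager, simulate, and heartbeatsPerSecond are hypothetical names, and the node counts and interval are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

class InProcessHeartbeatSketch {

    /** Hypothetical stand-in for the ResourceManager: it only counts heartbeats. */
    static final class FakeResourceManager {
        private final AtomicLong heartbeats = new AtomicLong();

        void nodeHeartbeat(int nodeId) {
            heartbeats.incrementAndGet();
        }

        long heartbeatCount() {
            return heartbeats.get();
        }
    }

    /**
     * Runs `rounds` heartbeat rounds for `nodes` simulated NMs on a pool of
     * `threads` workers and returns how many heartbeats the RM received.
     * All heartbeats are plain method calls; no ports are opened.
     */
    static long simulate(int nodes, int rounds, int threads) {
        FakeResourceManager rm = new FakeResourceManager();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int round = 0; round < rounds; round++) {
                for (int node = 0; node < nodes; node++) {
                    final int nodeId = node;
                    pending.add(pool.submit(() -> rm.nodeHeartbeat(nodeId)));
                }
            }
            for (Future<?> f : pending) {
                f.get(); // wait until every simulated heartbeat was delivered
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
        return rm.heartbeatCount();
    }

    /** Heartbeat load a single host must generate for a given cluster size. */
    static double heartbeatsPerSecond(int nodes, long intervalMs) {
        return nodes * 1000.0 / intervalMs;
    }

    public static void main(String[] args) {
        System.out.println(simulate(1000, 3, 8));
        System.out.println(heartbeatsPerSecond(10000, 1000));
    }
}
```

      Under these assumptions, all heartbeat work for every simulated node is funneled through one process, so the size of the simulated cluster is bounded by what a single machine can sustain.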

      Therefore, drawing inspiration from Dynamometer, we propose DynoYARN, a framework for deploying a test YARN cluster on top of YARN, with the following features:

      • The NM already has hooks to plug in a custom ContainerExecutor and NodeResourceMonitor. If we can also plug in a custom monitoring thread for ContainersMonitorImpl (and other modules like the LocalizationService), we can probably inject an Executor that does not actually launch containers, and a node and container resource monitor that reports synthetic, pre-specified utilization metrics back to the RM.
      • Since we are launching fake containers, we cannot run normal AM containers. We can therefore use Unmanaged AMs to launch synthetic jobs.
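
      The idea of an executor that launches nothing and a monitor that reports pre-specified utilization can be sketched as below. This is a self-contained model, not the Hadoop ContainerExecutor API: the class names, the DYNO_SYNTHETIC_UTILIZATION environment key, and the "memMB=...,vcores=..." format are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class FakeExecutorSketch {

    /** Hypothetical environment key carrying the synthetic utilization. */
    static final String UTILIZATION_KEY = "DYNO_SYNTHETIC_UTILIZATION";

    /** Utilization one fake container claims to consume. */
    static final class Utilization {
        final long memMB;
        final double vcores;

        Utilization(long memMB, double vcores) {
            this.memMB = memMB;
            this.vcores = vcores;
        }
    }

    /** Parses an assumed spec format such as "memMB=512,vcores=0.5". */
    static Utilization parse(String spec) {
        Map<String, String> fields = new HashMap<>();
        for (String pair : spec.split(",")) {
            String[] kv = pair.split("=", 2);
            if (kv.length != 2) {
                throw new IllegalArgumentException("bad field: " + pair);
            }
            fields.put(kv[0].trim(), kv[1].trim());
        }
        long mem = Long.parseLong(fields.getOrDefault("memMB", "0"));
        double cpu = Double.parseDouble(fields.getOrDefault("vcores", "0"));
        return new Utilization(mem, cpu);
    }

    /** Executor model that records containers instead of starting processes. */
    static final class FakeContainerExecutor {
        private final Map<String, Utilization> running = new ConcurrentHashMap<>();

        void launch(String containerId, Map<String, String> env) {
            // No process is started; only the declared utilization is remembered.
            String spec = env.get(UTILIZATION_KEY);
            running.put(containerId,
                spec == null ? new Utilization(0, 0.0) : parse(spec));
        }

        void kill(String containerId) {
            running.remove(containerId);
        }

        /** What a synthetic resource monitor would report back to the RM. */
        Utilization nodeUtilization() {
            long mem = 0;
            double cpu = 0.0;
            for (Utilization u : running.values()) {
                mem += u.memMB;
                cpu += u.vcores;
            }
            return new Utilization(mem, cpu);
        }
    }

    public static void main(String[] args) {
        FakeContainerExecutor executor = new FakeContainerExecutor();
        Map<String, String> env = new HashMap<>();
        env.put(UTILIZATION_KEY, "memMB=512,vcores=0.5");
        executor.launch("container_1", env);
        Utilization total = executor.nodeUtilization();
        System.out.println(total.memMB + " MB, " + total.vcores + " vcores");
    }
}
```

      In a real NodeManager the equivalent logic would have to sit behind the existing plug-in points; the sketch only shows the data flow from launch request to reported utilization.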

      Essentially, a test workflow would look like this:

      • Launch a DynoYARN cluster.
      • Use the Unmanaged AM feature to directly negotiate with the DynoYARN Resource Manager for container tokens.
      • Use the container tokens from the RM to directly ask the DynoYARN Node Managers to start fake containers.
      • The DynoYARN NodeManagers will start the fake containers and report to the DynoYARN Resource Manager synthetically generated resource utilization for the containers (which will be injected via the ContainerLaunchContext and parsed by the plugged-in Container Executor).
      • The Scheduler will use the utilization report to schedule containers - we will be able to test allocation of Opportunistic containers based on resource utilization.
      • Since the DynoYARN Node Managers run the actual code paths, all preemption and queuing logic will be faithfully executed.
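
      The workflow above can be modeled end to end in a few lines. The sketch below is a simplified, self-contained model and does not use the YARN client APIs: ResourceManagerModel and NodeManagerModel are hypothetical, token validation is reduced to a lookup at the RM (a real NM verifies tokens itself), and the "reported utilization plus request must fit within capacity" rule is an assumed placeholder for whatever policy the Scheduler actually applies.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class DynoWorkflowSketch {

    /** Hypothetical RM model: issues tokens and stores utilization reports. */
    static final class ResourceManagerModel {
        private final Set<String> issuedTokens = new HashSet<>();
        private final Map<String, Long> reportedMemMB = new HashMap<>();
        private int nextId = 0;

        /** The unmanaged AM negotiates directly with the RM for a token. */
        String allocate(String nodeId) {
            String token = nodeId + "/container_" + (nextId++);
            issuedTokens.add(token);
            return token;
        }

        boolean isValid(String token) {
            return issuedTokens.contains(token);
        }

        /** NMs report synthetic utilization with their heartbeat. */
        void nodeHeartbeat(String nodeId, long usedMemMB) {
            reportedMemMB.put(nodeId, usedMemMB);
        }

        /** Assumed rule: admit if reported usage plus the request fits. */
        boolean fitsOpportunistic(String nodeId, long capacityMB, long requestMB) {
            long used = reportedMemMB.getOrDefault(nodeId, 0L);
            return used + requestMB <= capacityMB;
        }
    }

    /** Hypothetical NM model: starts fake containers and heartbeats to the RM. */
    static final class NodeManagerModel {
        private final String nodeId;
        private final ResourceManagerModel rm;
        private final Map<String, Long> fakeContainers = new HashMap<>();

        NodeManagerModel(String nodeId, ResourceManagerModel rm) {
            this.nodeId = nodeId;
            this.rm = rm;
        }

        /** The AM presents its token; the synthetic utilization rides along. */
        void startContainer(String token, long syntheticMemMB) {
            if (!rm.isValid(token)) {
                throw new IllegalArgumentException("unknown token: " + token);
            }
            fakeContainers.put(token, syntheticMemMB);
        }

        void heartbeat() {
            long used = 0;
            for (long mem : fakeContainers.values()) {
                used += mem;
            }
            rm.nodeHeartbeat(nodeId, used);
        }
    }

    public static void main(String[] args) {
        ResourceManagerModel rm = new ResourceManagerModel();
        NodeManagerModel nm = new NodeManagerModel("node-1", rm);
        String token = rm.allocate("node-1");
        nm.startContainer(token, 3072);
        nm.heartbeat();
        System.out.println(rm.fitsOpportunistic("node-1", 4096, 512));
        System.out.println(rm.fitsOpportunistic("node-1", 4096, 2048));
    }
}
```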





            • Assignee:
              jhung Jonathan Hung
              asuresh Arun Suresh

