  1. Hadoop YARN
  2. YARN-8283

[Umbrella] MaWo - A Master Worker framework on top of YARN Services



    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved


      There is a need for an application or framework that handles Master-Worker scenarios. Existing frameworks on YARN, such as MapReduce, Tez, and Spark, can run a job in a distributed manner, but they were designed primarily around data parallelism rather than generic Master-Worker computations, so Master-Worker use cases usually end up being forced into one of them.

      In this JIRA, we’d like to contribute MaWo, a YARN Service based framework that achieves this goal. The overall goal is to create an app that takes an input job specification listing tasks and their durations, and has a Master dispatch the tasks to a predetermined set of Workers. The components will be responsible for making sure that each task and the overall job finish within their specified time durations.
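To make the Master-Worker flow above concrete, here is a minimal single-machine sketch in Python. It is not MaWo's code or API: the names `Task`, `worker`, and `run_job`, the status strings, and the use of threads and subprocesses in place of YARN containers are all assumptions made for illustration. It shows only the core idea, a Master handing tasks to a fixed set of Workers, with each task bounded by its own time limit. A job-level deadline is left out for brevity.

```python
import queue
import subprocess
import sys
import threading
from dataclasses import dataclass


@dataclass
class Task:
    """One entry of a hypothetical job specification."""
    task_id: str
    cmd: list          # argv of the command a worker should run
    timeout_s: float   # maximum wall-clock time allowed for this task


def worker(tasks: queue.Queue, results: dict, lock: threading.Lock) -> None:
    """Pull tasks until the queue is empty and record one status per task."""
    while True:
        try:
            # All tasks are enqueued before any worker starts,
            # so an empty queue means there is no work left.
            task = tasks.get_nowait()
        except queue.Empty:
            return
        try:
            proc = subprocess.run(task.cmd, timeout=task.timeout_s,
                                  capture_output=True)
            status = "SUCCEEDED" if proc.returncode == 0 else "FAILED"
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child when the timeout expires.
            status = "TIMED_OUT"
        with lock:
            results[task.task_id] = status


def run_job(job: list, num_workers: int) -> dict:
    """Master: enqueue every task, start a fixed worker pool, wait for it."""
    tasks = queue.Queue()
    for task in job:
        tasks.put(task)
    results, lock = {}, threading.Lock()
    threads = [threading.Thread(target=worker, args=(tasks, results, lock))
               for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    py = sys.executable
    job = [
        Task("quick", [py, "-c", "print('ok')"], timeout_s=30.0),
        Task("failing", [py, "-c", "raise SystemExit(1)"], timeout_s=30.0),
        Task("slow", [py, "-c", "import time; time.sleep(30)"], timeout_s=1.0),
    ]
    print(run_job(job, num_workers=2))
```

Workers pull from a shared queue rather than being assigned tasks up front, so a Worker that finishes early simply takes the next task. That is one common way to balance load when task durations vary; whether MaWo schedules tasks this way is not stated in this issue.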

      We have been using a version of the MaWo framework to run Hadoop's unit tests in parallel on an existing Hadoop YARN cluster. Running all of the Hadoop project’s unit tests typically takes 10 hours; on a MaWo app of about 50 containers, it can finish in under 20 minutes!

      YARN-3307 was an earlier attempt at this, but implemented as a first-class YARN app. In this JIRA, we instead use YARN Service for orchestration so that our code can focus on the core Master-Worker paradigm.





              yeshavora Yesha Vora