Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: federation
    • Labels: None

      Description

      YARN Federation today requires manual configuration of queues within each sub-cluster, and each RM operates "in isolation". This has a few issues:

      1. Preemption is computed locally (and might far exceed the global need)
      2. Jobs within a queue are forced to consume their resources "evenly" based on queue mapping
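
Issue 1 can be illustrated with a minimal sketch. It assumes a single queue whose guarantee is split evenly across two sub-clusters, and that each RM preempts whatever the queue uses above its local guarantee. The class and method names are hypothetical and are not YARN APIs; real preemption also depends on pending demand from other queues, which this sketch ignores.

```java
import java.util.Map;

public class PreemptionSketch {

    // Hypothetical model: each RM preempts whatever the queue uses above its
    // local guarantee, without looking at the other sub-clusters.
    static int localPreemption(Map<String, Integer> used, Map<String, Integer> guaranteed) {
        int total = 0;
        for (Map.Entry<String, Integer> e : used.entrySet()) {
            int over = e.getValue() - guaranteed.getOrDefault(e.getKey(), 0);
            total += Math.max(0, over);
        }
        return total;
    }

    // With a global view, only usage above the federation-wide guarantee counts.
    static int globalPreemption(Map<String, Integer> used, Map<String, Integer> guaranteed) {
        int usedSum = used.values().stream().mapToInt(Integer::intValue).sum();
        int guaranteedSum = guaranteed.values().stream().mapToInt(Integer::intValue).sum();
        return Math.max(0, usedSum - guaranteedSum);
    }

    public static void main(String[] args) {
        // Illustrative numbers (assumed): the queue is guaranteed 50 units per sub-cluster.
        Map<String, Integer> guaranteed = Map.of("sc1", 50, "sc2", 50);
        Map<String, Integer> used = Map.of("sc1", 80, "sc2", 20);
        System.out.println("local=" + localPreemption(used, guaranteed));   // local=30
        System.out.println("global=" + globalPreemption(used, guaranteed)); // global=0
    }
}
```

In this example the queue is exactly at its federation-wide guarantee, yet the RM of sc1, seeing only its own share, would preempt 30 units.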

      This umbrella JIRA tracks a new feature that leverages the FederationStateStore as a synchronization mechanism among RMs, and allows for allocation and preemption decisions to be based on a (close to up-to-date) global view of the cluster allocation and demand. The JIRA also tracks algorithms to automatically generate policies for Router and AMRMProxy to shape the traffic to each sub-cluster, and general "maintenance" of the FederationStateStore.
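
As a rough illustration of what an automatically generated routing policy could look like, the sketch below derives per-sub-cluster weights from free capacity, so that sub-clusters with more headroom receive proportionally more traffic. This is an assumption made for illustration only: the JIRA does not specify the algorithm, and `RoutingWeightSketch` and its inputs are hypothetical names, not part of the Router, AMRMProxy, or FederationStateStore APIs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RoutingWeightSketch {

    // Hypothetical policy generator: weight each sub-cluster by its share of
    // the total free capacity. Falls back to uniform weights when nothing is free.
    static Map<String, Double> weights(Map<String, Integer> capacity, Map<String, Integer> used) {
        Map<String, Double> headroom = new LinkedHashMap<>();
        double total = 0.0;
        for (Map.Entry<String, Integer> e : capacity.entrySet()) {
            double free = Math.max(0, e.getValue() - used.getOrDefault(e.getKey(), 0));
            headroom.put(e.getKey(), free);
            total += free;
        }
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : headroom.entrySet()) {
            double w = total > 0 ? e.getValue() / total : 1.0 / headroom.size();
            result.put(e.getKey(), w);
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative numbers (assumed): sc1 has twice the headroom of sc2.
        Map<String, Integer> capacity = Map.of("sc1", 100, "sc2", 100);
        Map<String, Integer> used = Map.of("sc1", 50, "sc2", 75);
        System.out.println(weights(capacity, used));
    }
}
```

With the numbers in `main`, sc1 has 50 free units and sc2 has 25, so sc1 would receive two thirds of the weight. A real generator would presumably also account for pending demand and locality, both of which the description mentions.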

          Issue Links

          1. [GQ] Compute global and local "IdealAllocation" (Sub-task, Patch Available, Carlo Curino)
          2. [GQ] Rebalance queue configuration for load-balancing and locality affinities (Sub-task, Patch Available, Carlo Curino)
          3. [GQ] propagate to GPG queue-level utilization/pending information (Sub-task, Open, Abhishek Modi)
          4. [GQ] Bias container allocations based on global view (Sub-task, Open, Abhishek Modi)
          5. [GQ] Generator for queue hierarchies over federated clusters (Sub-task, Open, Konstantinos Karanasos)
          6. [GQ] Compare resource allocation achieved by rebalancing algorithms with single-cluster capacity scheduler allocation (Sub-task, Open, Konstantinos Karanasos)
          7. [RESERVATION] Support Reservation APIs in Federation Router (Sub-task, Open, Unassigned)
          8. [RESERVATION] Federation StateStore: support storage/retrieval of reservations (Sub-task, Open, Unassigned)
          9. [RESERVATION] Add support for reservation-based routing. (Sub-task, Patch Available, Carlo Curino)
          10. [PERF/TEST] Extend SLS to support simulation of a Federated Environment (Sub-task, Patch Available, Tanuj Nayak)
          11. [PERF/TEST] Performance testing of CapacityScheduler at many-thousands of queues (Sub-task, Open, Abhishek Modi)
          12. [PERF/TEST] Performance testing of ReservationSystem at high job submission rates (Sub-task, Open, Xiaohua (Victor) Liang)
          13. [PERF/TEST] Add JMX metrics for Plan Follower and Agent Placement and Plan Operations (Sub-task, Patch Available, Xiaohua (Victor) Liang)
          14. [FederationStateStore] Create a proxy chain for FederationStateStore API in the Router (Sub-task, Patch Available, Giovanni Matteo Fumarola)
          15. [Router] Federation: routing some missing REST invocations transparently to multiple RMs (Sub-task, Patch Available, Yiran Wu)
          16. [Router] Implement missing FederationClientInterceptor#getApplications() (Sub-task, Patch Available, Yiran Wu)
          17. [FederationStateStore - MySql] Deadlock In addApplicationHome (Sub-task, Patch Available, Unassigned)
          18. [Router] Add cache service for fast answers to getApps (Sub-task, Open, Giovanni Matteo Fumarola)

              People

              • Assignee: Carlo Curino (curino)
              • Reporter: Carlo Curino (curino)
              • Votes: 0
              • Watchers: 16