Hadoop YARN / YARN-7402

Federation V2: Global Optimizations

Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: federation
    • Labels: None

    Description

      YARN Federation today requires manual configuration of queues within each sub-cluster, and each RM operates "in isolation". This causes a few issues:

      1. Preemption is computed locally (and might far exceed the global need)
      2. Jobs within a queue are forced to consume their resources "evenly" based on queue mapping

      This umbrella JIRA tracks a new feature that leverages the FederationStateStore as a synchronization mechanism among RMs, and allows for allocation and preemption decisions to be based on a (close to up-to-date) global view of the cluster allocation and demand. The JIRA also tracks algorithms to automatically generate policies for Router and AMRMProxy to shape the traffic to each sub-cluster, and general "maintenance" of the FederationStateStore.
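
The first issue above (locally computed preemption exceeding the global need) can be illustrated with a small sketch. This is not YARN code: the `QueueState` type, the `shortfall` rule, and the numbers are hypothetical, and the sketch assumes that pending demand can be served in any sub-cluster (no locality constraints).

```python
from dataclasses import dataclass


@dataclass
class QueueState:
    """Hypothetical per-sub-cluster view of one queue, in containers."""
    guaranteed: int  # capacity configured for the queue in this sub-cluster
    used: int        # containers the queue currently holds here
    pending: int     # outstanding requests seen by this sub-cluster's RM


def shortfall(guaranteed: int, used: int, pending: int) -> int:
    """Capacity the queue is entitled to but does not currently hold."""
    entitled = min(guaranteed, used + pending)
    return max(0, entitled - used)


def local_preemption(states: list[QueueState]) -> int:
    """Each RM decides in isolation; the total is the sum of local shortfalls."""
    return sum(shortfall(s.guaranteed, s.used, s.pending) for s in states)


def global_preemption(states: list[QueueState]) -> int:
    """One decision over the aggregated (global) view of the same queue."""
    return shortfall(
        sum(s.guaranteed for s in states),
        sum(s.used for s in states),
        sum(s.pending for s in states),
    )


# The queue is under-served in sub-cluster 1 and over-served in sub-cluster 2.
states = [
    QueueState(guaranteed=50, used=20, pending=40),
    QueueState(guaranteed=50, used=80, pending=0),
]

print(local_preemption(states))   # sub-cluster 1 alone asks for 30 containers
print(global_preemption(states))  # globally the queue already holds its 100
```

In this made-up scenario the isolated RMs would preempt 30 containers, while the aggregated view shows that the queue already holds its full guaranteed share and no preemption is needed.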

      Sub-Tasks

          1. [GQ] Refactor preemption calculators to allow overriding for Federation Global Algos | Sub-task | Resolved | Carlo Curino
          2. [GQ] Data structures for federation global queues calculations | Sub-task | Resolved | Abhishek Modi
          3. [GQ] Compute global and local "IdealAllocation" | Sub-task | Patch Available | Carlo Curino
          4. [GQ] Compute global "ideal allocation" including locality biases | Sub-task | Resolved | Carlo Curino
          5. [GQ] Rebalance queue configuration for load-balancing and locality affinities | Sub-task | Patch Available | Carlo Curino
          6. [GQ] propagate to GPG queue-level utilization/pending information | Sub-task | Open | Abhishek Modi
          7. [GQ] Bias container allocations based on global view | Sub-task | Open | Abhishek Modi
          8. [GQ] Generator for queue hierarchies over federated clusters | Sub-task | Open | Konstantinos Karanasos
          9. [GQ] Compare resource allocation achieved by rebalancing algorithms with single-cluster capacity scheduler allocation | Sub-task | Open | Konstantinos Karanasos
          10. [RESERVATION] Support ListReservation APIs in Federation Router | Sub-task | Resolved | Shilun Fan
          11. [GPG] Federation Global Policy Generator (service hook only) | Sub-task | Resolved | Botong Huang
          12. [GPG] Add SubClusterCleaner in Global Policy Generator | Sub-task | Resolved | Botong Huang
          13. [GPG] ApplicationCleaner in Global Policy Generator | Sub-task | Resolved | Botong Huang
          14. [GPG] Policy generator framework | Sub-task | Resolved | Young Chen
          15. [GPG] Load based policy generator | Sub-task | Resolved | Young Chen
          16. [PERF/TEST] Extend SLS to support simulation of a Federated Environment | Sub-task | Patch Available | Tanuj Nayak
          17. [PERF/TEST] Performance testing of CapacityScheduler at many-thousands of queues | Sub-task | Open | Abhishek Modi
          18. [PERF/TEST] Performance testing of ReservationSystem at high job submission rates | Sub-task | Open | Xiaohua (Victor) Liang
          19. [PERF/TEST] Add JMX metrics for Plan Follower and Agent Placement and Plan Operations | Sub-task | Patch Available | Xiaohua (Victor) Liang
          20. [FederationStateStore] Create a proxy chain for FederationStateStore API in the Router | Sub-task | Patch Available | Minni Mittal
          21. [AMRMProxy] Stateful FederationInterceptor for pending requests | Sub-task | Resolved | Botong Huang
          22. [AMRMProxy] AMRMClientRelayer for stateful FederationInterceptor | Sub-task | Resolved | Botong Huang
          23. [Router] Federation: routing getAppState REST invocations transparently to multiple RMs | Sub-task | Resolved | Giovanni Matteo Fumarola
          24. [Router] Federation: Improve Router REST API Metrics | Sub-task | Resolved | Shilun Fan
          25. [GPG] Fix potential connection leak in GPGUtils | Sub-task | Resolved | Giovanni Matteo Fumarola
          26. [Router] Implement missing FederationClientInterceptor#getApplications() | Sub-task | Resolved | D M Murali Krishna Reddy
          27. [FederationStateStore - MySql] Deadlock In addApplicationHome | Sub-task | Patch Available | Shilun Fan
          28. [Router] Add cache for fast answers to getApps | Sub-task | Resolved | Shilun Fan
          29. [GPG] Add max heap config option for Federation GPG | Sub-task | Resolved | Botong Huang
          30. [GPG] Add FederationStateStore getAppInfo API for GlobalPolicyGenerator | Sub-task | Resolved | Botong Huang
          31. [GPG] Add Yarn Registry cleanup in ApplicationCleaner | Sub-task | Resolved | Botong Huang
          32. [GPG] Add JvmMetricsInfo and pause monitor | Sub-task | Resolved | Bilwa S T
          33. [GPG] fix order of steps cleaning Registry entries in ApplicationCleaner | Sub-task | Resolved | Botong Huang
          34. [GPG] Support HTTPS in GPG | Sub-task | In Progress | Shilun Fan
          35. [GPG] support secure mode | Sub-task | Patch Available | Shilun Fan

            People

              Assignee: Carlo Curino (curino)
              Reporter: Carlo Curino (curino)
              Votes: 0
              Watchers: 21

              Dates

                Created:
                Updated:

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m