  1. Hadoop YARN
  2. YARN-7402

Federation V2: Global Optimizations


Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • None
    • None
    • federation

    Description

      YARN Federation today requires manual configuration of queues within each sub-cluster, and each RM operates "in isolation". This causes a few issues:

      1. Preemption is computed locally (and might far exceed the global need)
      2. Jobs within a queue are forced to consume their resources "evenly" based on queue mapping

      This umbrella JIRA tracks a new feature that leverages the FederationStateStore as a synchronization mechanism among RMs, and allows for allocation and preemption decisions to be based on a (close to up-to-date) global view of the cluster allocation and demand. The JIRA also tracks algorithms to automatically generate policies for Router and AMRMProxy to shape the traffic to each sub-cluster, and general "maintenance" of the FederationStateStore.
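
To illustrate the first issue above (locally computed preemption can far exceed the global need), here is a minimal sketch. It is not the algorithm implemented under this JIRA: the `QueueState` type, the `preemption_need` formula, and the numbers are all hypothetical, chosen only to show how summing per-sub-cluster decisions can differ from a decision made on an aggregated view.

```python
from dataclasses import dataclass


@dataclass
class QueueState:
    """Hypothetical per-sub-cluster view of one queue (units are arbitrary)."""
    guaranteed: int  # capacity guaranteed to the queue in this sub-cluster
    used: int        # resources the queue currently holds
    pending: int     # outstanding demand not yet satisfied


def preemption_need(guaranteed: int, used: int, pending: int) -> int:
    """Resources to reclaim for the queue: its unmet demand, capped at the
    amount by which it sits below its guarantee (simplified assumption)."""
    return min(pending, max(0, guaranteed - used))


def local_need(states: dict) -> int:
    """Each RM decides in isolation; the total is the sum of local decisions."""
    return sum(
        preemption_need(s.guaranteed, s.used, s.pending) for s in states.values()
    )


def global_need(states: dict) -> int:
    """A single decision taken on the aggregated (global) view of the queue."""
    return preemption_need(
        sum(s.guaranteed for s in states.values()),
        sum(s.used for s in states.values()),
        sum(s.pending for s in states.values()),
    )


# Hypothetical example: the queue is under its guarantee in sc1 but well
# over it in sc2, so globally it already holds more than its total share.
example = {
    "sc1": QueueState(guaranteed=50, used=20, pending=40),
    "sc2": QueueState(guaranteed=50, used=90, pending=0),
}

print("local :", local_need(example))
print("global:", global_need(example))
```

In this made-up scenario the isolated RMs would together reclaim resources for the queue in sc1, even though the queue already exceeds its combined guarantee across both sub-clusters, which is the kind of mismatch a global view is meant to avoid.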

      Attachments

        Issue Links

        1. [GQ] Refactor preemption calculators to allow overriding for Federation Global Algos (Sub-task, Resolved, Carlo Curino)
        2. [GQ] Data structures for federation global queues calculations (Sub-task, Resolved, Abhishek Modi)
        3. [GQ] Compute global and local "IdealAllocation" (Sub-task, Patch Available, Carlo Curino)
        4. [GQ] Compute global "ideal allocation" including locality biases (Sub-task, Resolved, Carlo Curino)
        5. [GQ] Rebalance queue configuration for load-balancing and locality affinities (Sub-task, Patch Available, Carlo Curino)
        6. [GQ] Propagate to GPG queue-level utilization/pending information (Sub-task, Open, Abhishek Modi)
        7. [GQ] Bias container allocations based on global view (Sub-task, Open, Abhishek Modi)
        8. [GQ] Generator for queue hierarchies over federated clusters (Sub-task, Open, Konstantinos Karanasos)
        9. [GQ] Compare resource allocation achieved by rebalancing algorithms with single-cluster capacity scheduler allocation (Sub-task, Open, Konstantinos Karanasos)
        10. [RESERVATION] Support ListReservation APIs in Federation Router (Sub-task, Resolved, Shilun Fan)
        11. [GPG] Federation Global Policy Generator (service hook only) (Sub-task, Resolved, Botong Huang)
        12. [GPG] Add SubClusterCleaner in Global Policy Generator (Sub-task, Resolved, Botong Huang)
        13. [GPG] ApplicationCleaner in Global Policy Generator (Sub-task, Resolved, Botong Huang)
        14. [GPG] Policy generator framework (Sub-task, Resolved, Young Chen)
        15. [GPG] Load based policy generator (Sub-task, Resolved, Young Chen)
        16. [PERF/TEST] Extend SLS to support simulation of a Federated Environment (Sub-task, Patch Available, Shilun Fan)
        17. [PERF/TEST] Performance testing of CapacityScheduler at many-thousands of queues (Sub-task, Open, Abhishek Modi)
        18. [PERF/TEST] Performance testing of ReservationSystem at high job submission rates (Sub-task, Open, Xiaohua (Victor) Liang)
        19. [PERF/TEST] Add JMX metrics for Plan Follower and Agent Placement and Plan Operations (Sub-task, Patch Available, Xiaohua (Victor) Liang)
        20. [FederationStateStore] Create a proxy chain for FederationStateStore API in the Router (Sub-task, Patch Available, Shilun Fan; progress 100%; original estimate not specified; time spent 20m)
        21. [AMRMProxy] Stateful FederationInterceptor for pending requests (Sub-task, Resolved, Botong Huang)
        22. [AMRMProxy] AMRMClientRelayer for stateful FederationInterceptor (Sub-task, Resolved, Botong Huang)
        23. [Router] Federation: routing getAppState REST invocations transparently to multiple RMs (Sub-task, Resolved, Giovanni Matteo Fumarola)
        24. [Router] Federation: Improve Router REST API Metrics (Sub-task, Resolved, Shilun Fan)
        25. [GPG] Fix potential connection leak in GPGUtils (Sub-task, Resolved, Giovanni Matteo Fumarola)
        26. [Router] Implement missing FederationClientInterceptor#getApplications() (Sub-task, Resolved, D M Murali Krishna Reddy)
        27. [FederationStateStore - MySql] Improve ApplicationHome Mysql Script. (Sub-task, Patch Available, Shilun Fan)
        28. [Router] Add cache for fast answers to getApps (Sub-task, Resolved, Shilun Fan)
        29. [GPG] Add max heap config option for Federation GPG (Sub-task, Resolved, Botong Huang)
        30. [GPG] Add FederationStateStore getAppInfo API for GlobalPolicyGenerator (Sub-task, Resolved, Botong Huang)
        31. [GPG] Add Yarn Registry cleanup in ApplicationCleaner (Sub-task, Resolved, Botong Huang)
        32. [GPG] Add JvmMetricsInfo and pause monitor (Sub-task, Resolved, Bilwa S T)
        33. [GPG] Fix order of steps cleaning Registry entries in ApplicationCleaner (Sub-task, Resolved, Botong Huang)
        34. [GPG] Support HTTPS in GPG (Sub-task, Resolved, Shilun Fan)
        35. [GPG] Support Secure Mode (Sub-task, Resolved, Shilun Fan)
        36. [GPG] YARN GPG mistakenly deleted applicationid (Sub-task, In Progress, Shilun Fan; progress 0%; original estimate 168h; remaining estimate 168h)
        37. Federation "Capacity Allocation" across sub-cluster (Sub-task, In Progress, Jiandan Yang)
        38. New ResourceCalculator implementation that operates on vector of resources, but respects sum of ratios (Sub-task, In Progress, Jiandan Yang)
        39. [Federation] GPG Support Query Policies In Web. (Sub-task, Resolved, Shilun Fan)
        40. Add YARN_GLOBALPOLICYGENERATOR_HEAPSIZE to yarn-env for GPG (Sub-task, Resolved, Shilun Fan)
        41. [GPG] Add Information About YARN GPG in Federation.md (Sub-task, Resolved, Shilun Fan)
        42. Add colored policies to enable manual load balancing across sub clusters (Sub-task, Open, Chenyu Zheng)
        43. [GPG] Improve GPGPolicyFacade#getPolicyManager (Sub-task, Resolved, Shilun Fan)
        44. [GPG] Improve GPGOverviewBlock Information (Sub-task, Resolved, Shilun Fan)
        45. [GPG] Add GPGWebServices (Sub-task, Resolved, Shilun Fan)
        46. [Doc] Add allow-partial-result description to Yarn Federation documentation (Sub-task, Resolved, Shilun Fan)
        47. In Federation, kill application from client does not kill Unmanaged AM's and containers launched by Unmanaged AM (Sub-task, Resolved, Shilun Fan)
        48. [GPG] GPG Support CLI. (Sub-task, Resolved, Shilun Fan)


          People

            Assignee: Carlo Curino (curino)
            Reporter: Carlo Curino (curino)

            Dates

              Created:
              Updated:

              Time Tracking

                Estimated: 168h
                Remaining: 168h
                Logged: 20m
