Hadoop YARN / YARN-7402

Federation V2: Global Optimizations

Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: federation
    • Labels: None

    Description

      YARN Federation today requires manual configuration of queues within each sub-cluster, and each RM operates "in isolation". This has a few issues:

      1. Preemption is computed locally (and might far exceed the global need)
      2. Jobs within a queue are forced to consume their resources "evenly" based on queue mapping
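
      The first issue can be illustrated with a small sketch. This is not YARN code: the `QueueState` type, the two function names, and the numbers are hypothetical, and the sketch assumes other queues have enough pending demand to claim everything a queue holds above its guarantee. It compares what each sub-cluster would reclaim when deciding alone with what a federation-wide comparison would reclaim.

      ```python
      from dataclasses import dataclass


      @dataclass
      class QueueState:
          guaranteed: int  # resources guaranteed to the queue in one sub-cluster
          used: int        # resources the queue currently holds in that sub-cluster


      def local_preemption(states):
          """Total reclaimed when every sub-cluster decides in isolation."""
          return sum(max(0, s.used - s.guaranteed) for s in states.values())


      def global_preemption(states):
          """Total reclaimed when usage is compared to the federation-wide guarantee."""
          total_used = sum(s.used for s in states.values())
          total_guaranteed = sum(s.guaranteed for s in states.values())
          return max(0, total_used - total_guaranteed)


      # One queue spread over two sub-clusters: over its share in sc1, under in sc2.
      queue_a = {
          "sc1": QueueState(guaranteed=50, used=80),
          "sc2": QueueState(guaranteed=50, used=20),
      }

      print(local_preemption(queue_a))   # 30
      print(global_preemption(queue_a))  # 0
      ```

      In this example the queue sits exactly at its global guarantee, yet sc1 alone would still preempt 30 units from it.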

      This umbrella JIRA tracks a new feature that leverages the FederationStateStore as a synchronization mechanism among RMs, and allows for allocation and preemption decisions to be based on a (close to up-to-date) global view of the cluster allocation and demand. The JIRA also tracks algorithms to automatically generate policies for Router and AMRMProxy to shape the traffic to each sub-cluster, and general "maintenance" of the FederationStateStore.
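
      As a rough sketch of the "close to up-to-date" global view, the snippet below has each sub-cluster publish a timestamped snapshot of per-queue usage to a shared store, and a reader aggregate only the snapshots that are fresh enough. The `InMemoryStateStore` class, its methods, and the staleness threshold are assumptions made for illustration; they are not the FederationStateStore API.

      ```python
      import time


      class InMemoryStateStore:
          """Hypothetical stand-in for a shared store used to synchronize RMs."""

          def __init__(self):
              self._snapshots = {}  # sub-cluster id -> (timestamp, {queue: used})

          def publish(self, sub_cluster, queue_usage, now=None):
              """Record the latest per-queue usage reported by one sub-cluster."""
              ts = time.time() if now is None else now
              self._snapshots[sub_cluster] = (ts, dict(queue_usage))

          def global_view(self, max_age_s, now=None):
              """Sum per-queue usage over snapshots no older than max_age_s seconds."""
              now = time.time() if now is None else now
              totals = {}
              for ts, usage in self._snapshots.values():
                  if now - ts > max_age_s:
                      continue  # stale snapshot: leave it out of the global view
                  for queue, used in usage.items():
                      totals[queue] = totals.get(queue, 0) + used
              return totals


      store = InMemoryStateStore()
      store.publish("sc1", {"a": 80, "b": 20}, now=100.0)
      store.publish("sc2", {"a": 20, "b": 60}, now=105.0)
      store.publish("sc3", {"a": 10}, now=10.0)  # too old by the time of the read

      print(store.global_view(max_age_s=30, now=110.0))  # {'a': 100, 'b': 80}
      ```

      Dropping stale snapshots is only one possible design choice; a real implementation might instead keep the last known value or mark the sub-cluster as unhealthy.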

      Sub-Tasks

        1. [GQ] Refactor preemption calculators to allow overriding for Federation Global Algos (Sub-task, Resolved, Carlo Curino)
        2. [GQ] Data structures for federation global queues calculations (Sub-task, Resolved, Abhishek Modi)
        3. [GQ] Compute global and local "IdealAllocation" (Sub-task, Patch Available, Carlo Curino)
        4. [GQ] Compute global "ideal allocation" including locality biases (Sub-task, Resolved, Carlo Curino)
        5. [GQ] Rebalance queue configuration for load-balancing and locality affinities (Sub-task, Patch Available, Carlo Curino)
        6. [GQ] propagate to GPG queue-level utilization/pending information (Sub-task, Open, Abhishek Modi)
        7. [GQ] Bias container allocations based on global view (Sub-task, Open, Abhishek Modi)
        8. [GQ] Generator for queue hierarchies over federated clusters (Sub-task, Open, Konstantinos Karanasos)
        9. [GQ] Compare resource allocation achieved by rebalancing algorithms with single-cluster capacity scheduler allocation (Sub-task, Open, Konstantinos Karanasos)
        10. [RESERVATION] Support Reservation APIs in Federation Router (Sub-task, Open, Unassigned)
        11. [RESERVATION] Federation StateStore: support storage/retrieval of reservations (Sub-task, Open, Unassigned)
        12. [RESERVATION] Add support for reservation-based routing. (Sub-task, Patch Available, Carlo Curino)
        13. [GPG] Federation Global Policy Generator (service hook only) (Sub-task, Resolved, Botong Huang)
        14. [GPG] Add SubClusterCleaner in Global Policy Generator (Sub-task, Resolved, Botong Huang)
        15. [GPG] ApplicationCleaner in Global Policy Generator (Sub-task, Resolved, Botong Huang)
        16. [GPG] Policy generator framework (Sub-task, Resolved, Young Chen)
        17. [GPG] Load based policy generator (Sub-task, Resolved, Young Chen)
        18. [PERF/TEST] Extend SLS to support simulation of a Federated Environment (Sub-task, Patch Available, Tanuj Nayak)
        19. [PERF/TEST] Performance testing of CapacityScheduler at many-thousands of queues (Sub-task, Open, Abhishek Modi)
        20. [PERF/TEST] Performance testing of ReservationSystem at high job submission rates (Sub-task, Open, Xiaohua (Victor) Liang)
        21. [PERF/TEST] Add JMX metrics for Plan Follower and Agent Placement and Plan Operations (Sub-task, Patch Available, Xiaohua (Victor) Liang)
        22. [FederationStateStore] Create a proxy chain for FederationStateStore API in the Router (Sub-task, Patch Available, Minni Mittal)
        23. [AMRMProxy] Stateful FederationInterceptor for pending requests (Sub-task, Resolved, Botong Huang)
        24. [AMRMProxy] AMRMClientRelayer for stateful FederationInterceptor (Sub-task, Resolved, Botong Huang)
        25. [Router] Federation: routing getAppState REST invocations transparently to multiple RMs (Sub-task, Resolved, Giovanni Matteo Fumarola)
        26. [Router] Federation: routing some missing REST invocations transparently to multiple RMs (Sub-task, Patch Available, Yiran Wu)
        27. [GPG] Fix potential connection leak in GPGUtils (Sub-task, Resolved, Giovanni Matteo Fumarola)
        28. [Router] Implement missing FederationClientInterceptor#getApplications() (Sub-task, Patch Available, D M Murali Krishna Reddy)
        29. [FederationStateStore - MySql] Deadlock In addApplicationHome (Sub-task, Patch Available, Unassigned)
        30. [Router] Add cache service for fast answers to getApps (Sub-task, Open, Young Chen)
        31. [GPG] Add max heap config option for Federation GPG (Sub-task, Resolved, Botong Huang)
        32. [GPG] Add FederationStateStore getAppInfo API for GlobalPolicyGenerator (Sub-task, Resolved, Botong Huang)
        33. [GPG] Add Yarn Registry cleanup in ApplicationCleaner (Sub-task, Resolved, Botong Huang)
        34. [GPG] Add JvmMetricsInfo and pause monitor (Sub-task, Resolved, Bilwa S T)
        35. [GPG] fix order of steps cleaning Registry entries in ApplicationCleaner (Sub-task, Resolved, Botong Huang)
        36. [GPG] Support HTTPS in GPG (Sub-task, Open, Bilwa S T)
        37. [GPG] support secure mode (Sub-task, Patch Available, Unassigned)

          People

            Assignee: Carlo Curino
            Reporter: Carlo Curino

            Dates

              Created:
              Updated:

              Time Tracking

              Estimated: Not Specified
              Remaining: 0h
              Logged: 20m
