Hadoop YARN / YARN-5597

YARN Federation improvements

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      This umbrella JIRA tracks a set of improvements over the YARN Federation MVP (YARN-2915).

        Issue Links

          Summary | Type | Status | Assignee | Time Spent (where logged)
          1. Federation maintenance mechanisms (simple CLI and command propagation) | Sub-task | Resolved | Shilun Fan
          2. Federation "Capacity Allocation" across sub-cluster | Sub-task | Open | Carlo Curino
          3. Add versioning for FederationStateStore | Sub-task | Resolved | Shilun Fan
          4. New ResourceCalculator implementation that operates on vector of resources, but respect sum of ratios | Sub-task | In Progress | Shilun Fan
          5. Add support for AMRMProxy HA | Sub-task | Resolved | Botong Huang
          6. Advanced Federation UI based on YARN UI v2 | Sub-task | Patch Available | Tanuj Nayak
          7. Refactoring SQLFederationStateStore by avoiding to recreate a connection at every call | Sub-task | Resolved | Bilwa S T
          8. Consider running RM tests against the Router | Sub-task | In Progress | Shilun Fan
          9. Refactoring Router services to use common util classes for pipeline creations | Sub-task | Resolved | Shilun Fan | 4h 50m
          10. Create SecureLogin inside Router | Sub-task | Resolved | Xie YiFan | 5h 30m
          11. Replace MockResourceManagerFacade with MockRM for AMRMProxy/Router tests | Sub-task | Resolved | Bilwa S T
          12. Handle containerId duplicate without failing the heartbeat in Federation Interceptor | Sub-task | Resolved | Shilun Fan
          13. Federation Router (hiding multiple RMs for ApplicationClientProtocol) phase 2 | Sub-task | In Progress | Shilun Fan
          14. Metrics for Federation AMRMProxy | Sub-task | Resolved | Young Chen
          15. Add support for work preserving NM restart when FederationInterceptor is enabled in AMRMProxyService | Sub-task | Resolved | Botong Huang
          16. Adding RM ClusterId in AppInfo | Sub-task | Resolved | Tanuj Nayak | 2h
          17. Adding RM Cluster Id in ApplicationReport | Sub-task | Resolved | Bilwa S T
          18. Federation Router Web Service fixes | Sub-task | Resolved | Íñigo Goiri
          19. Race condition between second app attempt and UAM timeout when first attempt node is down | Sub-task | Patch Available | Shilun Fan
          20. Handle AM register requests asynchronously in FederationInterceptor | Sub-task | Resolved | Botong Huang
          21. Add config in FederationRMFailoverProxy to not bypass facade cache when failing over | Sub-task | Resolved | Botong Huang
          22. AMRMProxy recover should catch for all throwable to avoid premature exit | Sub-task | Resolved | Botong Huang
          23. Yarn RM Epoch should wrap around | Sub-task | Resolved | Young Chen
          24. [AMRMProxy] Add sub-cluster timeout in LocalityMulticastAMRMProxyPolicy | Sub-task | Resolved | Botong Huang
          25. [AMRMProxy] Metrics for AMRMClientRelayer inside FederationInterceptor | Sub-task | Resolved | Young Chen
          26. [AMRMProxy] More robust responseId resync after an YarnRM master slave switch | Sub-task | Resolved | Botong Huang
          27. [AMRMProxy] FederationInterceptor upgrade: home sub-cluster heartbeat async | Sub-task | Resolved | Botong Huang
          28. LocalityMulticastAMRMProxyPolicy should fallback to random sub-cluster when cannot resolve resource | Sub-task | Resolved | Botong Huang
          29. Add Yarnclient#yarnclusterMetrics API implementation in router | Sub-task | Resolved | Bibin Chundatt
          30. Refactor the UAM heartbeat thread in preparation for YARN-8696 | Sub-task | Resolved | Botong Huang
          31. Add clean up for FederationStore apps | Sub-task | Resolved | Unassigned
          32. [AMRMProxy] Fix concurrent re-register due to YarnRM failover in AMRMClientRelayer | Sub-task | Resolved | Botong Huang
          33. Add audit logs for router service | Sub-task | Resolved | Minni Mittal | 3h 10m
          34. Create HomePolicyManager that sends all the requests to the home subcluster | Sub-task | Resolved | Íñigo Goiri
          35. AMRMProxyPolicies should accept heartbeat response from new/unknown subclusters | Sub-task | Resolved | Botong Huang
          36. [AMRMProxy] Fix thread leak in AMRMClientRelayer and UAM client | Sub-task | Resolved | Botong Huang
          37. Fix FederationInterceptor#allocate to set application priority in allocateResponse | Sub-task | In Progress | Shilun Fan
          38. [Router] Federation: routing getContainers REST invocations transparently to multiple RMs | Sub-task | Resolved | Shilun Fan | 5h 50m
          39. [Router] Add JvmMetricsInfo and pause monitor | Sub-task | Resolved | Bilwa S T
          40. [AMRMProxy] Fix potential empty fields in allocation response, move SubClusterTimeout to FederationInterceptor | Sub-task | Resolved | Botong Huang
          41. [Router] Add support to prevent DoS attack over ApplicationSubmissionContext size | Sub-task | Resolved | Shilun Fan
          42. [Router] Add missing methods in RMWebProtocol | Sub-task | Resolved | Shilun Fan | 1h 20m
          43. Mapreduce application container start fail after AM restart. | Sub-task | In Progress | Shilun Fan
          44. [Router] Add locality policy | Sub-task | Resolved | Young Chen
          45. Add znode hierarchy in Federation ZK State Store | Sub-task | In Progress | Shilun Fan
          46. Add application submit data to state store | Sub-task | In Progress | Shilun Fan
          47. Fix FederationStateStoreFacade#buildGetSubClustersCacheRequest | Sub-task | Resolved | Bibin Chundatt
          48. Add colored policies to enable manual load balancing across sub clusters | Sub-task | Open | Minni Mittal
          49. Make AMRMProxyPolicy aware of SC load | Sub-task | Patch Available | Shilun Fan
          50. In Federation Secure cluster Application submission fails when authorization is enabled | Sub-task | Resolved | Bilwa S T
          51. In Federation Router Nodes/Applications/About pages throws 500 exception when https is enabled | Sub-task | Resolved | Bilwa S T
          52. [Router] Router Audit Log Add Client IP Address. | Sub-task | Resolved | Shilun Fan | 4h 10m
          53. Metrics for Federation getClusterMetrics | Sub-task | Resolved | Shilun Fan | 20m
          54. Router's main() should support generic options | Sub-task | Open | Aparajita Choudhary
          55. [Router] UGI conf doesn't read user overridden configurations on Router startup | Sub-task | Open | Aparajita Choudhary
          56. [Router] FederationStateStoreFacade is not reinitialized with Router conf | Sub-task | Open | Aparajita Choudhary

            People

              Assignee: Subramaniam Krishnan (subru)
              Reporter: Subramaniam Krishnan (subru)
              Votes: 0
              Watchers: 29

              Dates

                Created:
                Updated:

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 27h 10m