Hadoop YARN / YARN-5597

YARN Federation improvements


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.4.0
    • Component/s: federation
    • Hadoop Flags: Reviewed
    • Release Note:
      We have enhanced YARN Federation to improve its usability. The enhancements are as follows:
      1. YARN Router now fully implements the ApplicationClientProtocol, ResourceManagerAdministrationProtocol, and RMWebServiceProtocol interfaces.
      2. YARN Router now supports application cleanup and automatically taking SubClusters offline.
      3. The Router and AMRMProxy code has been improved, and previously pending functionality has been completed.
      4. Router audit logs and metrics have been improved.
      5. Cluster security has been strengthened with Kerberos support.
      6. The Router web pages have been enhanced.
      7. A set of Router commands has been added for managing SubClusters and policies.
    • Flags: Important

    Description

      This umbrella JIRA tracks a set of improvements over the YARN Federation MVP (YARN-2915).

        Sub-Tasks

          1. Federation maintenance mechanisms (simple CLI and command propagation) | Sub-task | Resolved | Shilun Fan
          2. Add versioning for FederationStateStore | Sub-task | Resolved | Shilun Fan
          3. Add support for AMRMProxy HA | Sub-task | Resolved | Botong Huang
          4. Advanced Federation UI based on YARN UI v2 | Sub-task | Resolved | Shilun Fan
          5. Refactoring SQLFederationStateStore by avoiding to recreate a connection at every call | Sub-task | Resolved | Bilwa S T
          6. Running RM tests against the Router | Sub-task | Resolved | Shilun Fan
          7. Refactoring Router services to use common util classes for pipeline creations | Sub-task | Resolved | Shilun Fan | Time Spent: 4h 50m
          8. Create SecureLogin inside Router | Sub-task | Resolved | Xie YiFan | Time Spent: 5.5h
          9. Replace MockResourceManagerFacade with MockRM for AMRMProxy/Router tests | Sub-task | Resolved | Bilwa S T
          10. Handle containerId duplicate without failing the heartbeat in Federation Interceptor | Sub-task | Resolved | Shilun Fan
          11. Federation Router (hiding multiple RMs for ApplicationClientProtocol) phase 2 | Sub-task | Resolved | Shilun Fan
          12. Metrics for Federation AMRMProxy | Sub-task | Resolved | Young Chen
          13. Add support for work preserving NM restart when FederationInterceptor is enabled in AMRMProxyService | Sub-task | Resolved | Botong Huang
          14. Adding RM ClusterId in AppInfo | Sub-task | Resolved | Tanuj Nayak | Time Spent: 2h
          15. Adding RM Cluster Id in ApplicationReport | Sub-task | Resolved | Bilwa S T
          16. Federation Router Web Service fixes | Sub-task | Resolved | Íñigo Goiri
          17. Race condition between second app attempt and UAM timeout when first attempt node is down | Sub-task | Resolved | Shilun Fan
          18. Handle AM register requests asynchronously in FederationInterceptor | Sub-task | Resolved | Botong Huang
          19. Add config in FederationRMFailoverProxy to not bypass facade cache when failing over | Sub-task | Resolved | Botong Huang
          20. AMRMProxy recover should catch for all throwable to avoid premature exit | Sub-task | Resolved | Botong Huang
          21. Yarn RM Epoch should wrap around | Sub-task | Resolved | Young Chen
          22. [AMRMProxy] Add sub-cluster timeout in LocalityMulticastAMRMProxyPolicy | Sub-task | Resolved | Botong Huang
          23. [AMRMProxy] Metrics for AMRMClientRelayer inside FederationInterceptor | Sub-task | Resolved | Young Chen
          24. [AMRMProxy] More robust responseId resync after an YarnRM master slave switch | Sub-task | Resolved | Botong Huang
          25. [AMRMProxy] FederationInterceptor upgrade: home sub-cluster heartbeat async | Sub-task | Resolved | Botong Huang
          26. LocalityMulticastAMRMProxyPolicy should fallback to random sub-cluster when cannot resolve resource | Sub-task | Resolved | Botong Huang
          27. Add Yarnclient#yarnclusterMetrics API implementation in router | Sub-task | Resolved | Bibin Chundatt
          28. Refactor the UAM heartbeat thread in preparation for YARN-8696 | Sub-task | Resolved | Botong Huang
          29. Add clean up for FederationStore apps | Sub-task | Resolved | Bibin Chundatt
          30. [AMRMProxy] Fix concurrent re-register due to YarnRM failover in AMRMClientRelayer | Sub-task | Resolved | Botong Huang
          31. Add audit logs for router service | Sub-task | Resolved | Minni Mittal | Time Spent: 3h 10m
          32. Create HomePolicyManager that sends all the requests to the home subcluster | Sub-task | Resolved | Íñigo Goiri
          33. AMRMProxyPolicies should accept heartbeat response from new/unknown subclusters | Sub-task | Resolved | Botong Huang
          34. [AMRMProxy] Fix thread leak in AMRMClientRelayer and UAM client | Sub-task | Resolved | Botong Huang
          35. Fix FederationInterceptor#allocate to set application priority in allocateResponse | Sub-task | Resolved | Shilun Fan
          36. [Router] Federation: routing getContainers REST invocations transparently to multiple RMs | Sub-task | Resolved | Shilun Fan | Time Spent: 5h 50m
          37. [Router] Add JvmMetricsInfo and pause monitor | Sub-task | Resolved | Bilwa S T
          38. [AMRMProxy] Fix potential empty fields in allocation response, move SubClusterTimeout to FederationInterceptor | Sub-task | Resolved | Botong Huang
          39. [Router] Add support to prevent DoS attack over ApplicationSubmissionContext size | Sub-task | Resolved | Shilun Fan
          40. [Router] Add missing methods in RMWebProtocol | Sub-task | Resolved | Shilun Fan | Time Spent: 1h 20m
          41. Mapreduce application container start fail after AM restart. | Sub-task | Resolved | Chenyu Zheng
          42. [Router] Add locality policy | Sub-task | Resolved | Young Chen
          43. Add znode hierarchy in Federation ZK State Store | Sub-task | Resolved | Shilun Fan
          44. Add application submit data to state store | Sub-task | Resolved | Shilun Fan
          45. Fix FederationStateStoreFacade#buildGetSubClustersCacheRequest | Sub-task | Resolved | Bibin Chundatt
          46. Make AMRMProxyPolicy aware of SC load | Sub-task | Resolved | Shilun Fan
          47. In Federation Secure cluster Application submission fails when authorization is enabled | Sub-task | Resolved | Bilwa S T
          48. In Federation Router Nodes/Applications/About pages throws 500 exception when https is enabled | Sub-task | Resolved | Bilwa S T
          49. [Router] Router Audit Log Add Client IP Address. | Sub-task | Resolved | Shilun Fan | Time Spent: 4h 10m
          50. Metrics for Federation getClusterMetrics | Sub-task | Resolved | Shilun Fan | Time Spent: 20m
          51. Router's main() should support generic options | Sub-task | Resolved | Aparajita Choudhary
          52. [Router] UGI conf doesn't read user overridden configurations on Router startup | Sub-task | Resolved | Shilun Fan
          53. [Router] FederationStateStoreFacade is not reinitialized with Router conf | Sub-task | Resolved | Shilun Fan
          54. Make proxy server support YARN federation. | Sub-task | Resolved | Chenyu Zheng | Time Spent: 8h 40m
          55. Make router support proxy server. | Sub-task | Resolved | Chenyu Zheng
          56. [Federation] Client should be able to submit application to RM directly using normal client conf | Sub-task | Resolved | Bilwa S T
          57. [Federation] Add FederationInterceptor#allow-partial-result config. | Sub-task | Resolved | Shilun Fan
          58. Improve FederationInterceptorREST#createInterceptorForSubCluster Use WebAppUtils | Sub-task | Resolved | Shilun Fan
          59. [Federation] Fix some PBImpl classes to avoid NPE. | Sub-task | Resolved | Shilun Fan
          60. [Federation] Improve SubClusterState#fromString parameter and LogMessage | Sub-task | Resolved | Shilun Fan
          61. Improve equals, hashCode(), toString() methods of the Federation Base Object | Sub-task | Resolved | Shilun Fan
          62. In federation and security mode, nm recover may fail. | Sub-task | Resolved | Chenyu Zheng | Time Spent: 1h 50m
          63. [Federation] Add SQLServer Script and Supported DB Version in Federation.md | Sub-task | Resolved | Shilun Fan
          64. [Federation] Improve FederationClientInterceptor#ThreadPool thread pool configuration. | Sub-task | Resolved | Shilun Fan
          65. PriorityBasedRouterPolicy throws exception if all sub-cluster weights have negative value | Sub-task | Resolved | Bilwa S T
          66. Fix token reset synchronization for UAM response token | Sub-task | Resolved | Minni Mittal | Time Spent: 20m
          67. Router Page display the db username and password in mask mode | Sub-task | Resolved | Shilun Fan
          68. Add enhanced headroom in AllocateResponse | Sub-task | Resolved | Minni Mittal | Time Spent: 3h 20m
          69. Add YARN_ROUTER_HEAPSIZE to yarn-env for routers | Sub-task | Resolved | Minni Mittal | Time Spent: 1h 40m
          70. [Federation] Fix Yarn federation.md table format | Sub-task | Resolved | Shilun Fan
          71. Fix Yarn Router Broken Link | Sub-task | Resolved | Shilun Fan
          72. [Federation] Fix Typo of NodeManager AMRMProxy. | Sub-task | Resolved | Shilun Fan
          73. Improve Yarn Router Junit Test Close MockRM | Sub-task | Resolved | Shilun Fan
          74. [Federation] Improve NM FederationInterceptor removeAppFromRegistry | Sub-task | Resolved | Shilun Fan
          75. [Federation] Improve UnmanagedAMPoolManager WithoutBlock ServiceStop | Sub-task | Resolved | Shilun Fan
          76. [Federation] Add batchFinishApplicationMaster in UAMPoolManager | Sub-task | Resolved | Shilun Fan
          77. Capture the Performance Metrics of ZookeeperFederationStateStore | Sub-task | Resolved | Shilun Fan
          78. [Federation] ConfiguredRMFailoverProxyProvider Supports Randomly Select an Router. | Sub-task | Resolved | Shilun Fan
          79. Improve FederationInterceptorREST AuditLog | Sub-task | Resolved | Shilun Fan
          80. Improve FederationInterceptorREST Method Result | Sub-task | Resolved | Shilun Fan
          81. [Federation] Add WeightedHomePolicyManager | Sub-task | Resolved | Shilun Fan
          82. Make YARN Router throw Exception to client clearly | Sub-task | Resolved | Shilun Fan | Time Spent: 20m
          83. [Federation] Improve Yarn Federation documentation | Sub-task | Resolved | Shilun Fan
          84. Fix uncleaned threads in YARN Federation interceptor threadpool | Sub-task | Resolved | Jeffrey Chang
          85. Fix typos in hadoop-yarn-server-common#federation | Sub-task | Resolved | Shilun Fan
          86. Fix 'Physical Mem Used' and 'Physical VCores Used' are not displaying data | Sub-task | Resolved | Shilun Fan
          87. YARN Router Web supports displaying information for Non-Federation. | Sub-task | Resolved | Shilun Fan
          88. [Minor] Improve UnmanagedAMPoolManager/UnmanagedApplicationManager Code | Sub-task | Resolved | Shilun Fan
          89. [Federation] Code cleanup for NodeManager#amrmproxy | Sub-task | Resolved | Shilun Fan
          90. [Federation] Improve DefaultRequestInterceptor#init Code | Sub-task | Resolved | Shilun Fan
          91. The FederationInterceptor#launchUAM Added retry logic. | Sub-task | Resolved | Shilun Fan
          92. Improve the Policy Description in Federation.md | Sub-task | Resolved | Shilun Fan
          93. [Federation] Add RouterAuditLog to log4j.properties | Sub-task | Resolved | Shilun Fan
          94. [Federation] Fix NodeManager#TestFederationInterceptor Flaky Unit Test | Sub-task | Resolved | Shilun Fan
          95. Improve existsApplicationHomeSubCluster/existsReservationHomeSubCluster Log Level | Sub-task | Resolved | Shilun Fan
          96. [Federation] Add Steps To Set up a Test Cluster. | Sub-task | Resolved | Shilun Fan
          97. Refactor AMRMProxy#FederationInterceptor#registerApplicationMaster | Sub-task | Resolved | Shilun Fan
          98. Improve createJerseyClient#setConnectTimeout Code | Sub-task | Resolved | Shilun Fan
          99. [Federation] SQLFederationStateStore Support Store ApplicationSubmitData | Sub-task | Resolved | Shilun Fan
          100. [Federation] Improve FederationClientInterceptor To Return Partial Results of subClusters. | Sub-task | Resolved | Shilun Fan
          101. Let Federation.md more standardized | Sub-task | Resolved | WangYuanben
          102. Change the time unit of scCleanerIntervalMs in Router | Sub-task | Resolved | WangYuanben
          103. Improve the time unit for FederationRMAdminInterceptor#heartbeatExpirationMillis | Sub-task | Resolved | WangYuanben


            People

              Assignee: Subramaniam Krishnan (subru)
              Reporter: Subramaniam Krishnan (subru)
              Votes: 0
              Watchers: 29

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 43h 20m