Hadoop YARN / YARN-8489

Need to support "dominant" component concept inside YARN service

    Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.3.0
    • Component/s: yarn-native-services
    • Labels:
      None
    • Target Version/s:
    • Release Note:
      - Improved YARN service status report based on dominant component status.

      Description

      The existing YARN service framework supports a termination policy that depends on the restart policy. For example, ALWAYS means the service will never be terminated, and NEVER means the service will be terminated once all of its components have terminated.

      The name "dominant" might not be the most appropriate; we can figure out a better name. Put simply, a dominant component is one whose final state determines the job's final state, regardless of the other components.

      Use cases: 

      1) A TensorFlow job has master/worker/services/tensorboard components. Once the master reaches a final state, whether it succeeded or failed, we should terminate the ps/tensorboard/workers and then mark the job as succeeded/failed.
      2) Not sure if this is a real-world use case: a service has multiple components, and some of them are not restartable. For such services, if a component fails, we should mark the whole service as failed.
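
      The first use case can be sketched roughly as follows. This is only an illustration of the idea, not the code from the attached patches: the class, enum, and method names are all hypothetical, and the three-value state model is an assumed simplification.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch; none of these names come from the YARN code base. */
public class DominantComponentSketch {

    /** Simplified component states, assumed for illustration only. */
    enum State {
        RUNNING, SUCCEEDED, FAILED;

        boolean isFinal() {
            return this != RUNNING;
        }
    }

    /**
     * Returns the service's final state once the dominant component has
     * finished, or empty while the dominant component is still running.
     * The states of all other components are ignored.
     */
    static Optional<State> serviceFinalState(Map<String, State> components,
                                             String dominant) {
        State state = components.get(dominant);
        if (state == null) {
            throw new IllegalArgumentException("Unknown component: " + dominant);
        }
        return state.isFinal() ? Optional.of(state) : Optional.empty();
    }

    /** Lists the non-dominant components that are still running and should be stopped. */
    static List<String> componentsToTerminate(Map<String, State> components,
                                              String dominant) {
        List<String> toStop = new ArrayList<>();
        for (Map.Entry<String, State> e : components.entrySet()) {
            if (!e.getKey().equals(dominant) && !e.getValue().isFinal()) {
                toStop.add(e.getKey());
            }
        }
        return toStop;
    }

    public static void main(String[] args) {
        Map<String, State> job = new LinkedHashMap<>();
        job.put("master", State.RUNNING);
        job.put("worker", State.RUNNING);
        job.put("ps", State.FAILED);
        job.put("tensorboard", State.RUNNING);

        // The master is still running, so the job has no final state yet,
        // even though "ps" has already failed.
        System.out.println(serviceFinalState(job, "master"));

        // Once the master finishes, its state becomes the job's state and
        // the remaining running components are candidates for termination.
        job.put("master", State.SUCCEEDED);
        System.out.println(serviceFinalState(job, "master"));
        System.out.println(componentsToTerminate(job, "master"));
    }
}
```

      In this sketch the failed "ps" component does not affect the outcome; only the master's final state is reported as the job's state, which is the behavior the description asks for.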

        Attachments

        1. YARN-8489.001.patch
          16 kB
          Zac Zhou
        2. YARN-8489.002.patch
          17 kB
          Zac Zhou
        3. YARN-8489.003.patch
          20 kB
          Zac Zhou
        4. YARN-8489.004.patch
          19 kB
          Zac Zhou
        5. YARN-8489.005.patch
          20 kB
          Zac Zhou

            People

            • Assignee: Zac Zhou (yuan_zac)
            • Reporter: Wangda Tan (leftnoteasy)
            • Votes: 0
            • Watchers: 8
