Apache Apex Core / APEXCORE-426

Support work preserving AM recovery


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • None
    • 3.6.0
    • None

    Description

      On app master failure, the streaming containers should continue running.

      As of Hadoop 2.2, YARN automatically terminates all containers when the app master fails, and the replacement app master relaunches them. Once we move to a newer minimum Hadoop version, we should leverage work-preserving restart.

      The mechanism in Apex containers to locate the new master process is already in place.

      Test Cases:
      1. Kill the app master: only the app master's container ID should change; the IDs of all other containers should remain the same.
      2. Kill the app master and a few other containers, and verify that the killed containers are recovered.
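
The two test cases above can be expressed as checks over container ID snapshots taken before and after the app master is killed. The following is only a minimal sketch: `AmRecoveryCheck`, `Snapshot`, and both method names are hypothetical and not part of Apex or YARN, and it assumes that a recovered container is launched under a new container ID rather than reusing the killed one.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper for the two test cases; not part of Apex or YARN. */
final class AmRecoveryCheck {

    /** Container IDs of one application at a point in time. */
    static final class Snapshot {
        final String amContainerId;            // container running the app master
        final Set<String> workerContainerIds;  // streaming containers

        Snapshot(String amContainerId, Set<String> workerContainerIds) {
            this.amContainerId = amContainerId;
            this.workerContainerIds = new HashSet<>(workerContainerIds);
        }
    }

    /** Test case 1: the app master ID changed and every other ID is unchanged. */
    static boolean onlyAppMasterChanged(Snapshot before, Snapshot after) {
        return !before.amContainerId.equals(after.amContainerId)
            && before.workerContainerIds.equals(after.workerContainerIds);
    }

    /**
     * Test case 2: the app master ID changed, surviving containers kept their IDs,
     * and each killed container was replaced by one with a new ID (assumption).
     */
    static boolean killedContainersRecovered(Snapshot before, Snapshot after, Set<String> killed) {
        Set<String> survivors = new HashSet<>(before.workerContainerIds);
        survivors.removeAll(killed);

        // Containers present afterwards that are not survivors must be replacements.
        Set<String> replacements = new HashSet<>(after.workerContainerIds);
        replacements.removeAll(survivors);

        return !before.amContainerId.equals(after.amContainerId)
            && after.workerContainerIds.containsAll(survivors)
            && after.workerContainerIds.size() == before.workerContainerIds.size()
            && Collections.disjoint(replacements, killed);
    }
}
```

In a real test the snapshots would be collected from the running application (for example from the YARN application report and the Apex container list) before the kill and again after the replacement app master has registered.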


            People

              sandesh Sandesh Hegde
              thw Thomas Weise
