Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-2425

Standalone Master is too aggressive in removing Applications

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Critical
    • Resolution: Fixed
    • 1.0.0
    • 1.1.1, 1.2.0
    • Spark Core
    • None

    Description

      When standalone Executors trying to run a particular Application fail a cummulative ApplicationState.MAX_NUM_RETRY times, Master will remove the Application. This will be true even if there actually are a number of Executors that are successfully running the Application. This makes long-running standalone-mode Applications in particular unnecessarily vulnerable to limited failures in the cluster – e.g., a single bad node on which Executors repeatedly fail for any reason can prevent an Application from starting or can result in a running Application being removed even though it could continue to run successfully (just not making use of all potential Workers and Executors.)

      Attachments

        Issue Links

          Activity

            People

              markhamstra Mark Hamstra
              markhamstra Mark Hamstra
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: