  1. Spark
  2. SPARK-2425

Standalone Master is too aggressive in removing Applications

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.1.1, 1.2.0
    • Component/s: Spark Core
    • Labels:
      None
    • Target Version/s:

      Description

      When standalone Executors attempting to run a particular Application fail a cumulative ApplicationState.MAX_NUM_RETRY times, the Master removes the Application. This happens even if a number of Executors are successfully running the Application. As a result, long-running standalone-mode Applications in particular are unnecessarily vulnerable to limited failures in the cluster. For example, a single bad node on which Executors repeatedly fail for any reason can prevent an Application from starting, or can cause a running Application to be removed even though it could continue to run successfully (just without using all potential Workers and Executors).
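
      The failure mode described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the Master's actual code: the class AppModel, the function simulate, the flag require_no_running, and the value chosen for MAX_NUM_RETRY are all hypothetical names and numbers introduced for illustration. With require_no_running=False the model follows the behavior reported in this issue; with require_no_running=True it shows one possible mitigation, which is to remove the Application only when no Executors are still running.

```python
from dataclasses import dataclass, field

# Illustrative value only; the real constant lives in ApplicationState
# and its value is not stated in this issue.
MAX_NUM_RETRY = 3


@dataclass
class AppModel:
    """Toy stand-in for the Master's per-Application bookkeeping (hypothetical)."""
    running_executors: set = field(default_factory=set)
    retry_count: int = 0      # cumulative across all executors; never reset here
    removed: bool = False

    def executor_started(self, executor_id):
        self.running_executors.add(executor_id)

    def executor_failed(self, executor_id, require_no_running=False):
        """Record an abnormal executor exit and decide whether to remove the app.

        require_no_running=False models the behavior described in the issue.
        require_no_running=True models one possible mitigation (an assumption).
        """
        self.running_executors.discard(executor_id)
        self.retry_count += 1
        if self.retry_count >= MAX_NUM_RETRY:
            if not require_no_running or not self.running_executors:
                self.removed = True
        return self.removed


def simulate(require_no_running):
    """Two healthy executors plus one bad node that fails MAX_NUM_RETRY times."""
    app = AppModel()
    app.executor_started("good-1")
    app.executor_started("good-2")
    for attempt in range(MAX_NUM_RETRY):
        bad_id = f"bad-{attempt}"
        app.executor_started(bad_id)
        app.executor_failed(bad_id, require_no_running)
    return app.removed


if __name__ == "__main__":
    print("reported behavior removes app:", simulate(False))
    print("mitigated behavior removes app:", simulate(True))
```

      In this model the Application is removed under the reported behavior even though two Executors never failed, while the mitigated variant keeps it alive until no running Executors remain.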

              People

              • Assignee:
                Mark Hamstra (markhamstra)
              • Reporter:
                Mark Hamstra (markhamstra)
              • Votes:
                0
              • Watchers:
                2
