SPARK-8425: Add blacklist mechanism for task scheduling

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.2.0
    • Component/s: Scheduler, YARN
    • Labels:
      None

        Activity

        apachespark Apache Spark added a comment -

        User 'jerryshao' has created a pull request for this issue:
        https://github.com/apache/spark/pull/6870

        jerryshao Saisai Shao added a comment -

        Repost the design doc here: https://docs.google.com/document/d/1oibmfcyewy_kBLjGwVbGAOXZGlWyWADJu1TbBc06sdk/edit?usp=sharing

        mwws Mao, Wei added a comment -

        Here is the new design doc, based on Saisai's previous work with some enhancements.
        https://docs.google.com/document/d/1EqdocdbOH0eZ0Vp1RAHsE-8gKv9yPez1Xt8W5xgXn3I/edit?usp=sharing

        irashid Imran Rashid added a comment -

        Adding the design doc from SPARK-8426 here: https://docs.google.com/document/d/1R2CVKctUZG9xwD67jkRdhBR4sCgccPR2dhTYSRXFEmg/edit?usp=sharing

        Also, I want to point out a change in behavior I am proposing in the design doc: I think it's best if there is no timeout for the blacklist within one stage. Once a task gets blacklisted for a particular stage, it stays blacklisted for the remainder of that stage. The timeout will only apply when executors and nodes get blacklisted across all stages. This greatly simplifies the implementation, and I don't really think there is any significant downside.

        OTOH, it is a behavior change from the old blacklisting.
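
        To make the two scopes concrete, here is a minimal Python sketch of the proposed behavior. It is a toy model, not Spark's scheduler code: the class name `BlacklistSketch`, its methods, and the `timeout_s` parameter are all hypothetical and chosen only for illustration. Stage-scoped entries never expire, while application-scoped entries expire after the timeout.

```python
from dataclasses import dataclass, field


@dataclass
class BlacklistSketch:
    """Toy model of the two blacklist scopes (hypothetical, not Spark's code)."""

    # Hypothetical timeout, in seconds, for application-wide entries only.
    timeout_s: float
    # (stage_id, executor_id) pairs; these never expire.
    stage_entries: set = field(default_factory=set)
    # executor_id -> time at which the application-wide entry expires.
    app_entries: dict = field(default_factory=dict)

    def blacklist_for_stage(self, stage_id, executor_id):
        # No expiry is recorded: the entry lasts as long as the stage does.
        self.stage_entries.add((stage_id, executor_id))

    def blacklist_for_app(self, executor_id, now):
        # Application-wide entries carry an expiry time.
        self.app_entries[executor_id] = now + self.timeout_s

    def is_blacklisted(self, stage_id, executor_id, now):
        if (stage_id, executor_id) in self.stage_entries:
            return True
        expiry = self.app_entries.get(executor_id)
        return expiry is not None and now < expiry
```

        With a 60-second timeout, an executor blacklisted for stage 1 stays excluded from stage 1 no matter how much time passes but remains usable for other stages, whereas an executor blacklisted across the application becomes usable again once 60 seconds have elapsed.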

        apachespark Apache Spark added a comment -

        User 'squito' has created a pull request for this issue:
        https://github.com/apache/spark/pull/13951

        tgraves Thomas Graves added a comment -

        Added some questions to the design doc

        apachespark Apache Spark added a comment -

        User 'squito' has created a pull request for this issue:
        https://github.com/apache/spark/pull/14079

        tgraves Thomas Graves added a comment -

        Slightly different scenario, since it's not about scheduling, but do we have any JIRA for blacklisting a node on which containers/executors won't launch?

        irashid Imran Rashid added a comment -

        Thomas Graves I don't know of any jira for that. Sounds like you already hit this case, so I'll let you open a jira as you might be able to provide more info, but please add me to it.

        irashid Imran Rashid added a comment - - edited

        Breaking off a smaller chunk of this that can be added independently in SPARK-17675

        irashid Imran Rashid added a comment -

        Seems like there is agreement on the design, so I'm attaching a snapshot of the design doc. (Original google doc here: https://docs.google.com/document/d/1R2CVKctUZG9xwD67jkRdhBR4sCgccPR2dhTYSRXFEmg/edit)

        irashid Imran Rashid added a comment -

        Issue resolved by pull request 14079
        https://github.com/apache/spark/pull/14079

        apachespark Apache Spark added a comment -

        User 'squito' has created a pull request for this issue:
        https://github.com/apache/spark/pull/16298


          People

          • Assignee:
            mwws Mao, Wei
            Reporter:
            jerryshao Saisai Shao
          • Votes:
            0
            Watchers:
            11
