Spark / SPARK-34245

Master may not remove the finished executor when Worker fails to send ExecutorStateChanged


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.7, 3.0.1, 3.1.1, 3.2.0
    • Fix Version/s: 3.2.0
    • Component/s: Deploy, Spark Core
    • Labels:
      None

      Description

      If the Worker fails to send ExecutorStateChanged to the Master because of an error (e.g., a temporary network error), the Master cannot remove the finished executor normally and continues to treat it as alive. In the worst case, if that executor is the application's only executor, the application can hang.
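      To make the failure mode concrete, here is a toy simulation in Python. It assumes a heavily simplified Master that tracks live executors per application and frees an executor only when it receives ExecutorStateChanged with a terminal state. The `ToyMaster` class, its `reconcile` method, and the `send` helper are hypothetical illustrations, not Spark's API. `reconcile` sketches one possible mitigation (comparing the Master's bookkeeping against executor state the Worker reports later); it is not necessarily the fix that was merged for this issue.

```python
from dataclasses import dataclass, field
from enum import Enum


class ExecutorState(Enum):
    RUNNING = "RUNNING"
    EXITED = "EXITED"


@dataclass
class ToyMaster:
    """Hypothetical, heavily simplified stand-in for the standalone Master's
    executor bookkeeping; this is NOT Spark's actual Master class."""

    # app_id -> set of executor ids the Master believes are alive
    alive: dict = field(default_factory=dict)

    def launch(self, app_id: str, exec_id: int) -> None:
        self.alive.setdefault(app_id, set()).add(exec_id)

    def on_executor_state_changed(self, app_id: str, exec_id: int,
                                  state: ExecutorState) -> None:
        # Only a terminal state reported by the Worker frees the executor.
        if state is not ExecutorState.RUNNING:
            self.alive.setdefault(app_id, set()).discard(exec_id)

    def reconcile(self, app_id: str, reported_running: set) -> set:
        """Hypothetical mitigation: drop executors that the Worker no longer
        reports as running. Returns the ids that were removed."""
        tracked = self.alive.setdefault(app_id, set())
        stale = tracked - reported_running
        tracked.difference_update(stale)
        return stale


def send(master: ToyMaster, msg: tuple, network_ok: bool) -> None:
    """Deliver an ExecutorStateChanged-like message only if the simulated
    network is healthy; otherwise the message is silently lost."""
    if network_ok:
        master.on_executor_state_changed(*msg)


if __name__ == "__main__":
    master = ToyMaster()
    master.launch("app-1", 0)
    # Executor 0 finishes, but its state-change message is lost.
    send(master, ("app-1", 0, ExecutorState.EXITED), network_ok=False)
    print(master.alive["app-1"])  # {0}: the Master still thinks it is alive
    # A later report from the Worker lists no running executors for app-1.
    print(master.reconcile("app-1", reported_running=set()))  # {0}
    print(master.alive["app-1"])  # set()
```

      Without some form of reconciliation, a single lost message leaves executor 0 in the Master's live set indefinitely, which matches the hang described above when it is the application's only executor.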



            People

            • Assignee: wuyi (Ngone51)
            • Reporter: wuyi (Ngone51)
