Mesos / MESOS-5763

Task stuck in fetching is not cleaned up after --executor_registration_timeout.

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.28.0, 1.0.0
    • Fix Version/s: 0.28.3, 1.0.0
    • Component/s: containerization
    • Labels: None

    Description

      When the fetching process hangs forever (for example, because of HDFS issues), the Mesos containerizer attempts to destroy the container and kill the executor after --executor_registration_timeout. However, this reliably fails for us: the launcher destroy kills the executor and the container is destroyed, but the agent never finds out that the executor has terminated, so the task is left in the STAGING state forever.
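      The failure mode described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Mesos code: the names `ToyContainer`, `ToyAgent`, `simulate`, and the `notify` flag are hypothetical, and the state labels `STAGING` and `TERMINAL` are simplified stand-ins. It shows a registration timer destroying a container that is stuck fetching, and how the task stays in staging if the termination is never reported back to the agent.

```python
import threading


class ToyContainer:
    """Hypothetical stand-in for a container stuck in fetching (not Mesos code)."""

    def __init__(self):
        self.state = "FETCHING"
        # Set only when the termination is reported back to the agent.
        self.terminated = threading.Event()

    def destroy(self, notify):
        # The container itself is always destroyed.
        self.state = "DESTROYED"
        # In the buggy case the termination is never signalled.
        if notify:
            self.terminated.set()


class ToyAgent:
    """Hypothetical agent that tracks one task's state."""

    def __init__(self, container, registration_timeout):
        self.container = container
        self.registration_timeout = registration_timeout
        self.task_state = "STAGING"

    def run(self, notify_on_destroy):
        # The executor never registers because the fetch hangs,
        # so the registration timer fires and destroys the container.
        timer = threading.Timer(
            self.registration_timeout,
            self.container.destroy,
            kwargs={"notify": notify_on_destroy},
        )
        timer.start()
        timer.join()  # Wait until the timer callback has run.

        # The task only leaves STAGING if the agent learns of the termination.
        if self.container.terminated.wait(timeout=0.1):
            self.task_state = "TERMINAL"
        return self.task_state


def simulate(notify_on_destroy):
    container = ToyContainer()
    agent = ToyAgent(container, registration_timeout=0.05)
    return agent.run(notify_on_destroy), container.state
```

      In this model, `simulate(False)` reproduces the reported symptom (container destroyed, task still staging), while `simulate(True)` shows the expected cleanup once the termination reaches the agent.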

          People

            Assignee: Yan Xu (xujyan)
            Reporter: Yan Xu (xujyan)
            Shepherd: Jie Yu
            Votes: 0
            Watchers: 5
