  Mesos / MESOS-5763

Task stuck in fetching is not cleaned up after --executor_registration_timeout.


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.28.0, 1.0.0
    • Fix Version/s: 0.28.3, 1.0.0
    • Component/s: containerization
    • Labels: None

      Description

      When the fetcher hangs indefinitely (for example, because of HDFS issues), the Mesos containerizer attempts to destroy the container and kill the executor once --executor_registration_timeout elapses. However, this reliably fails for us: the launcher's destroy kills the executor and the container is destroyed, but the agent never learns that the executor has terminated, so the task stays in the STAGING state forever.
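      The expected behavior can be illustrated with a toy model. This is only a sketch in Python with asyncio, not Mesos code: `hung_fetch`, `launch`, the `task` dictionary, and the `FAILED` terminal state are hypothetical names chosen for illustration. It shows a fetch that never completes, a registration timeout that fires, and cleanup that both destroys the container and records the termination so the task leaves STAGING.

```python
import asyncio


async def hung_fetch():
    # Stand-in for a fetch that never completes (e.g. a stalled HDFS read).
    await asyncio.Event().wait()


async def launch(task, registration_timeout):
    """Toy model of launching a task whose fetch step may hang."""
    task["state"] = "STAGING"
    try:
        await asyncio.wait_for(hung_fetch(), timeout=registration_timeout)
        task["state"] = "RUNNING"
    except asyncio.TimeoutError:
        # Destroying the container alone is not enough: the termination
        # must also be recorded, otherwise the task stays in STAGING.
        task["container_destroyed"] = True
        task["state"] = "FAILED"  # hypothetical terminal state
    return task


# A short timeout keeps the example fast; real timeouts are much longer.
task = asyncio.run(launch({"id": "t1"}, registration_timeout=0.05))
print(task["state"])
```

      In the bug described here, the equivalent of the `task["state"] = "FAILED"` step never happens on the agent side, which is why the task appears stuck.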

            People

            • Assignee: Yan Xu (xujyan)
            • Reporter: Yan Xu (xujyan)
            • Shepherd: Jie Yu
