Mesos / MESOS-644

Slave doesn't correctly handle checkpointed terminal update whose ack doesn't reach the executor

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.14.0
    • Component/s: None
    • Labels: None

    Description

      This is the scenario.

      The slave dies after checkpointing a terminal status update, but before the ACK reaches the executor.

      On recovery, the slave's status update manager (SUM) retries the update and cleans it up once it receives an ACK from the scheduler.

      When the executor re-registers after this point, it still has the update pending. The slave cannot find the executor for this update because the task has already completed. Currently the slave forwards the update to the SUM anyway but never ACKs the executor.
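
      The sequence above can be reproduced with a small toy model. This is only a sketch in Python, not Mesos code: the classes Executor and Slave, the run() helper, and the ack_unknown flag are all hypothetical names made up for illustration. The flag shows one possible way to handle the case (ACK the executor even when the task is no longer known); it is an assumption, not necessarily the fix that was committed.

```python
from dataclasses import dataclass, field


@dataclass
class Executor:
    """Toy executor: keeps each status update until the slave ACKs it."""
    pending: dict = field(default_factory=dict)  # task_id -> state

    def send_update(self, task_id, state):
        self.pending[task_id] = state

    def on_ack(self, task_id):
        self.pending.pop(task_id, None)


@dataclass
class Slave:
    """Toy slave. `ack_unknown` is a hypothetical switch, not a Mesos flag."""
    ack_unknown: bool = False
    checkpointed: dict = field(default_factory=dict)  # survives a restart
    live_tasks: set = field(default_factory=set)

    def status_update(self, executor, task_id, state, crash_before_ack=False):
        known = task_id in self.live_tasks
        # The update is handed to the SUM (checkpointed) in every case.
        self.checkpointed[task_id] = state
        if crash_before_ack:
            return  # slave dies here: checkpointed, executor never ACKed
        if known or self.ack_unknown:
            executor.on_ack(task_id)

    def scheduler_ack(self, task_id):
        # The SUM cleans up the update; the task now counts as completed.
        self.checkpointed.pop(task_id, None)
        self.live_tasks.discard(task_id)

    def reregister(self, executor):
        # The executor resends whatever it still considers unacknowledged.
        for task_id, state in list(executor.pending.items()):
            self.status_update(executor, task_id, state)


def run(ack_unknown):
    """Replay the scenario and return the executor's pending updates."""
    executor = Executor()
    slave = Slave(ack_unknown=ack_unknown, live_tasks={"t1"})

    # 1. Terminal update is checkpointed, then the slave dies before the ACK.
    executor.send_update("t1", "TASK_FINISHED")
    slave.status_update(executor, "t1", "TASK_FINISHED", crash_before_ack=True)

    # 2. The recovered slave restores checkpointed state; the SUM retries
    #    and cleans up once the scheduler ACKs.
    recovered = Slave(
        ack_unknown=ack_unknown,
        checkpointed=dict(slave.checkpointed),
        live_tasks=set(slave.live_tasks),
    )
    recovered.scheduler_ack("t1")

    # 3. The executor re-registers with its update still pending.
    recovered.reregister(executor)
    return executor.pending
```

      With ack_unknown=False the model mirrors the behavior described here: run(False) leaves the update for "t1" pending on the executor. With ack_unknown=True the executor is ACKed even though the task is no longer known to the slave, so nothing stays pending.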

          People

            Assignee: Vinod Kone (vinodkone)
            Reporter: Vinod Kone (vinodkone)