  1. Mesos
  2. MESOS-7566

Master crash due to failed check in DRFSorter::remove

    Details

    • Type: Bug
    • Status: Accepted
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 1.1.1, 1.1.2
    • Fix Version/s: None
    • Component/s: allocation
    • Labels: None

      Description

      A check at sorter.cpp#L355 in 1.1.2 occasionally fails in our cluster and crashes the leading master.

      I manually modified that check to print the related variables. The resulting master log is here:

      https://gist.github.com/zhitaoli/0662d9fe1f6d57de344951c05b536bad#file-gistfile1-txt

      From the log, it seems the check was using a stale value of 26 revocable CPUs while the current value had already been updated to 25, so the check failed and the master crashed.

      So far, the two verified occurrences of this bug were both observed near an UNRESERVE operation (see the preceding lines in the log).
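
      The suspected failure mode can be illustrated with a minimal sketch. This is not the Mesos DRFSorter code: ToySorter, its methods, and the resource name are hypothetical, and the assumption that some update lands between the caller's snapshot and the remove call is only one reading of the log. The sketch returns false where a real implementation would fail a CHECK and abort.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, heavily simplified stand-in for a sorter's bookkeeping.
// NOT the Mesos DRFSorter; all names here are illustrative only.
class ToySorter {
public:
  // Record `amount` of resource `name` as allocated.
  void add(const std::string& name, double amount) {
    allocated_[name] += amount;
  }

  // Replace an allocation of `oldAmount` with `newAmount`.
  void update(const std::string& name, double oldAmount, double newAmount) {
    allocated_[name] += newAmount - oldAmount;
  }

  // True if at least `amount` of `name` is currently tracked.
  bool contains(const std::string& name, double amount) const {
    auto it = allocated_.find(name);
    return it != allocated_.end() && it->second >= amount;
  }

  // Returns false where a real implementation would CHECK-fail and abort.
  bool remove(const std::string& name, double amount) {
    if (!contains(name, amount)) {
      return false;
    }
    allocated_[name] -= amount;
    return true;
  }

private:
  std::map<std::string, double> allocated_;
};

// Replays the assumed sequence: the caller keeps the old quantity (26)
// after the sorter's tracked value has been updated to 25.
bool removeWithStaleValue() {
  ToySorter sorter;
  sorter.add("cpus(revocable)", 26);
  double cached = 26;                               // caller's snapshot
  sorter.update("cpus(revocable)", 26, 25);         // sorter now tracks 25
  return sorter.remove("cpus(revocable)", cached);  // 25 < 26: check fails
}

// Same sequence, but the caller passes the current quantity instead.
bool removeWithFreshValue() {
  ToySorter sorter;
  sorter.add("cpus(revocable)", 26);
  sorter.update("cpus(revocable)", 26, 25);
  return sorter.remove("cpus(revocable)", 25);
}
```

      In this toy model, removeWithStaleValue() fails the containment test while removeWithFreshValue() passes it, which matches the 26 versus 25 mismatch printed in the log.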

              People

              • Assignee: Unassigned
              • Reporter: Zhitao Li (zhitao)
              • Votes: 0
              • Watchers: 12