Mesos / MESOS-7566

Master crash due to failed check in DRFSorter::remove

Details

    • Type: Bug
    • Status: Accepted
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 1.1.1, 1.1.2
    • Fix Version/s: None
    • Component/s: allocation
    • Labels: None

    Description

      A check at sorter.cpp#L355 in 1.1.2 occasionally fails in our cluster and crashes the leading master.

      I manually modified that check to print the related variables; the resulting master log is here:

      https://gist.github.com/zhitaoli/0662d9fe1f6d57de344951c05b536bad#file-gistfile1-txt

      From the log, it seems the check was using a stale value of 26 revocable CPUs after the actual value had already been updated to 25, so the check failed.

      So far, both verified occurrences of this bug were observed near an UNRESERVE operation (see the preceding lines in the log).
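
To make the suspected failure mode concrete, here is a minimal sketch, not the Mesos code. It assumes the failed check is a containment precondition (the report does not quote it). `ResourceMap`, `Pool`, and the `cpus_revocable` key used in the note below are hypothetical stand-ins, and `remove` returns `false` where a real check would abort the process.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a resource bundle: resource name -> scalar amount.
using ResourceMap = std::map<std::string, double>;

// Returns true if `total` holds at least `wanted` of every named resource.
bool contains(const ResourceMap& total, const ResourceMap& wanted) {
  for (const auto& entry : wanted) {
    auto it = total.find(entry.first);
    if (it == total.end() || it->second < entry.second) {
      return false;
    }
  }
  return true;
}

// Hypothetical sorter-like bookkeeping of the total resources on one agent.
struct Pool {
  ResourceMap total;

  // Subtracts `resources` from the pool. Returns false when the containment
  // precondition fails (assumption: a real check would crash here instead).
  bool remove(const ResourceMap& resources) {
    if (!contains(total, resources)) {
      return false;
    }
    for (const auto& entry : resources) {
      total[entry.first] -= entry.second;
      // Follows from the containment check above.
      assert(total[entry.first] >= 0);
    }
    return true;
  }
};
```

With `total["cpus_revocable"]` at 25, a `remove` call carrying the stale amount 26 fails the precondition and leaves the pool untouched, while a call carrying 25 succeeds. That matches the 26 versus 25 mismatch seen in the log, under the assumptions above.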

            People

              Assignee: Unassigned
              Reporter: Zhitao Li (zhitao)
              Votes: 0
              Watchers: 11