Mesos / MESOS-9786

Race between two REMOVE_QUOTA calls crashes the master.


Details

    • Sprint: Resource Mgmt: RI14 Sp 46
    • Story Points: 2

    Description

      The existence of the quota in the master is validated here:
      https://github.com/apache/mesos/blob/a9a2acabd03181865055b77cf81e7bb310b236d6/src/master/quota_handler.cpp#L700

      The quota is then removed from the master in a deferred method call:
      https://github.com/apache/mesos/blob/a9a2acabd03181865055b77cf81e7bb310b236d6/src/master/quota_handler.cpp#L744

      It is then removed from the allocator in another deferred call:
      https://github.com/apache/mesos/blob/a9a2acabd03181865055b77cf81e7bb310b236d6/src/master/quota_handler.cpp#L753

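      Because the validation and the two removals run at different times, two simultaneous REMOVE_QUOTA calls for the same role can race: both can pass the existence check before either removal runs, and the second removal then finds the quota already gone, which crashes the master.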

      We observe this race on a heavily loaded cluster. We currently suspect that the client retries the call because the original call is not processed for a long time, and that the retry triggers the race.


      People

        Meng Zhu (mzhu)
        Andrei Sekretenko (asekretenko)