Mesos / MESOS-9954

Flapping tasks with large sandboxes can fill agent disk

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved

    Description

      If a task on an agent fails and is re-launched repeatedly, and each launch pulls a large artifact into its sandbox, the accumulated sandboxes can quickly fill the agent disk. This may happen on a time scale shorter than the disk watch interval, so the disk can fill up before the agent's next disk usage check has a chance to react.
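To make the time scale concrete, here is a rough back-of-the-envelope sketch. The function name and all numbers are hypothetical and chosen only for illustration; it assumes every relaunch leaves one artifact-sized sandbox behind and that one launch happens per relaunch period.

```python
def seconds_to_fill(free_bytes: int, artifact_bytes: int, relaunch_period_s: float) -> float:
    """Approximate time until free disk space is exhausted.

    Assumes each relaunch leaves one artifact-sized sandbox on disk and
    that one launch happens every `relaunch_period_s` seconds.
    """
    # Ceiling division: number of launches needed to consume the free space.
    launches_needed = -(-free_bytes // artifact_bytes)
    return launches_needed * relaunch_period_s


GIB = 2 ** 30

# Hypothetical agent: 100 GiB free, 5 GiB artifact, task relaunched every 2 seconds.
fill_time_s = seconds_to_fill(100 * GIB, 5 * GIB, 2.0)

# Hypothetical watch interval used only for comparison.
watch_interval_s = 60.0
fills_between_checks = fill_time_s < watch_interval_s
```

With these assumed numbers the disk fills in well under one watch interval, which is the scenario this issue describes.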

      We should evaluate solutions to this issue. A few options:

      • Perhaps an aggressive (short) disk watch interval is sufficient? We should investigate the performance impact of this approach.
      • If that doesn't work, then maybe polling free disk space whenever a task is launched makes sense? (This might need to be rate-limited.)
      • Perhaps we can come up with some fundamentally different approach to detecting free disk space that would solve this issue?
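As a rough illustration of the second option, the following sketch polls free disk space at launch time but caps how often the filesystem is actually queried. It is not Mesos code: the class name, parameters, and threshold are hypothetical, and the only real API used is Python's `shutil.disk_usage`.

```python
import shutil
import time


class RateLimitedDiskCheck:
    """Checks free disk space on demand, at most once per `min_interval_s`.

    Hypothetical helper for illustration; not part of Mesos.
    """

    def __init__(self, path, min_interval_s, min_free_fraction,
                 clock=time.monotonic, usage=shutil.disk_usage):
        self._path = path
        self._min_interval_s = min_interval_s
        self._min_free_fraction = min_free_fraction
        self._clock = clock      # injectable for testing
        self._usage = usage      # injectable for testing
        self._last_check = None  # time of the last real filesystem query
        self._last_ok = True     # cached result of the last query

    def ok_to_launch(self) -> bool:
        """Return True if enough disk was free at the most recent check."""
        now = self._clock()
        due = (self._last_check is None
               or now - self._last_check >= self._min_interval_s)
        if due:
            u = self._usage(self._path)
            self._last_ok = (u.free / u.total) >= self._min_free_fraction
            self._last_check = now
        # Between real checks, reuse the cached answer (the rate limit).
        return self._last_ok


# Usage: call before each task launch; query the disk at most every 5 seconds.
check = RateLimitedDiskCheck(".", min_interval_s=5.0, min_free_fraction=0.10)
can_launch = check.ok_to_launch()
```

The trade-off is that a cached answer can be stale for up to `min_interval_s`, so the interval would need to be short relative to how fast a flapping task can consume disk.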

          People

            Assignee: Unassigned
            Reporter: Greg Mann (greggomann)
            Votes: 0
            Watchers: 1

            Dates

              Created:
              Updated: