[MESOS-9954] Flapping tasks with large sandboxes can fill agent disk - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels:
- foundations
- mesosphere

Story Points: 5

Description

If a task on an agent is repeatedly relaunched after failing, and each launch pulls a large artifact into a new sandbox, the accumulated sandboxes can quickly fill the agent disk. This can happen on a time scale shorter than the disk watch interval, so the disk fills up before the agent next checks disk usage.
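
To make the timing concrete, here is a rough back-of-the-envelope sketch in Python. All numbers, including the 60-second interval, are hypothetical and chosen only for illustration. The function name `seconds_until_full` is likewise made up for this example, and the model assumes nothing is cleaned up between launches.

```python
def seconds_until_full(free_bytes, artifact_bytes, relaunch_period_secs):
    """Estimate how long a flapping task takes to exhaust free disk space.

    Assumes every failed launch leaves one full copy of the artifact behind
    in its own sandbox and that nothing is garbage-collected in between.
    """
    launches_to_fill = -(-free_bytes // artifact_bytes)  # ceiling division
    return launches_to_fill * relaunch_period_secs


GiB = 1024 ** 3

# Hypothetical values: 20 GiB free, a 2 GiB artifact, one relaunch every 5 seconds.
time_to_fill_secs = seconds_until_full(20 * GiB, 2 * GiB, 5)

# Assumed disk watch interval, for illustration only.
disk_watch_interval_secs = 60

# If this prints True, the disk fills before the next periodic check runs.
print(time_to_fill_secs < disk_watch_interval_secs)
```

Under these assumed numbers the disk is exhausted after ten launches, which is sooner than a single 60-second check period.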

We should evaluate solutions to this issue. A few options:

- Perhaps an aggressive (short) disk watch interval is sufficient? We should investigate the performance impact of this approach.
- If a shorter interval doesn't work, then maybe polling free disk space whenever a task is launched makes sense? (Rate-limiting this might be necessary.)
- Perhaps we can come up with a fundamentally different approach to detecting free disk space that would solve this issue?
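
As a starting point for discussion, the launch-time polling idea with rate limiting could look roughly like the Python sketch below. This is not Mesos code: `RateLimitedDiskCheck`, `on_task_launch`, and the 10% free-space threshold are hypothetical names and values, and the only real library call used is `shutil.disk_usage` from the Python standard library. The clock and disk-usage function are injectable so the rate limiting can be exercised without touching a real disk.

```python
import shutil
import time


class RateLimitedDiskCheck:
    """Polls free disk space on demand, at most once per `min_interval_secs`.

    Hypothetical helper for illustration; not part of Mesos.
    """

    def __init__(self, path, min_interval_secs,
                 clock=time.monotonic, disk_usage=shutil.disk_usage):
        self._path = path
        self._min_interval = min_interval_secs
        self._clock = clock
        self._disk_usage = disk_usage
        self._last_check = None
        self._last_free_fraction = None

    def free_fraction(self):
        """Return the free fraction of the disk, re-polling only when the cache is stale."""
        now = self._clock()
        stale = (self._last_check is None
                 or now - self._last_check >= self._min_interval)
        if stale:
            usage = self._disk_usage(self._path)
            self._last_free_fraction = usage.free / usage.total
            self._last_check = now
        return self._last_free_fraction


def on_task_launch(checker, min_free_fraction=0.1):
    """Return True if sandbox cleanup should be triggered before this launch.

    The 10% default threshold is an assumption chosen for the example.
    """
    return checker.free_fraction() < min_free_fraction
```

A caller would create one checker per agent work directory and invoke `on_task_launch(checker)` on every launch. With a minimum interval of a few seconds, a flapping task triggers at most one real disk poll per interval no matter how often it restarts.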

People

Assignee: Unassigned
Reporter: Greg Mann
Votes: 0
Watchers: 1

Dates

Created: 26/Aug/19 22:17
Updated: 09/Sep/19 21:29