Details
- Bug
- Status: Open
- Major
- Resolution: Unresolved
- None
- None
- None
- 5
Description
If a task on an agent pulls a large artifact into its sandbox and is repeatedly relaunched after failing, it can quickly fill the agent's disk. This can happen on a time scale shorter than the disk watch interval, so the disk may fill up completely before the next disk usage check runs.
We should evaluate solutions to this issue. A couple of options:
- Perhaps an aggressive (short) disk watch interval is sufficient? We should investigate the performance impact of this approach.
- If the former doesn't work, then maybe polling free disk space whenever a task is launched makes sense? Rate-limiting these polls might be necessary.
- Perhaps we can come up with some fundamentally different approach for detecting free disk space which would solve this issue?
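To make the second option more concrete, here is a minimal sketch of rate-limited polling at task launch. It is written in Python for illustration only and is not agent code: the class `RateLimitedDiskChecker`, the method `needs_gc_on_launch`, and the `min_interval_secs` and `headroom` parameters are all hypothetical names and values. The only real API it relies on is `shutil.disk_usage` from the Python standard library.

```python
import shutil
import time


class RateLimitedDiskChecker:
    """Hypothetical helper: checks free disk space when a task launches,
    but polls the filesystem at most once per `min_interval_secs`."""

    def __init__(self, path, min_interval_secs=5.0,
                 clock=time.monotonic, usage_fn=shutil.disk_usage):
        self.path = path
        self.min_interval_secs = min_interval_secs
        self._clock = clock          # injectable so the rate limit can be tested
        self._usage_fn = usage_fn    # injectable so disk state can be faked
        self._last_poll = None
        self._free_fraction = None
        self.polls = 0               # number of real filesystem polls so far

    def free_fraction(self):
        """Return the fraction of free space, re-polling only if the
        cached value is older than the rate limit allows."""
        now = self._clock()
        stale = (self._last_poll is None
                 or now - self._last_poll >= self.min_interval_secs)
        if stale:
            usage = self._usage_fn(self.path)
            self._free_fraction = usage.free / usage.total
            self._last_poll = now
            self.polls += 1
        return self._free_fraction

    def needs_gc_on_launch(self, headroom=0.1):
        """Call on every task launch. Returns True when free space is
        below `headroom` (an assumed threshold), meaning sandbox cleanup
        should be triggered before the launch proceeds."""
        return self.free_fraction() < headroom
```

With this shape, a task that crash-loops faster than `min_interval_secs` costs at most one filesystem poll per interval, while the worst-case detection delay is bounded by that same interval rather than by the disk watch interval. The trade-off is that a launch arriving inside the rate-limit window still sees a cached, possibly stale, value.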