Details
Type: Improvement
Status: Open
Priority: P3
Resolution: Unresolved
Description
A few customers have run into a hard-to-diagnose bug: the staging bucket has a TTL, and staged files are TTL-deleted while they are still needed.
This can happen in at least two ways:
1. Long-lived batch or streaming jobs can reference staged files arbitrarily long after staging, and they fail badly if those files have been deleted in the meantime.
2. Some customers have even hit cases where the "check file already exists" step succeeds when a job is started, but the file is TTL-deleted before the job actually starts. (This sounds unlikely, but it is a race condition that can occur if, for example, the TTL is 7 days and jobs run every 7 days.)
Hopefully it is not hard to check whether staged files would be subject to a TTL, and to warn if so.
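
A rough sketch of what such a check might look like, in Python. This is an illustration only, not an existing implementation. It assumes the bucket's lifecycle rules have already been fetched and are shaped like GCS lifecycle configuration entries (an `action` with a `type`, and a `condition` with an `age` in days). The function names `find_ttl_delete_rules` and `warn_if_staging_has_ttl` are hypothetical.

```python
import warnings


def find_ttl_delete_rules(lifecycle_rules):
    """Return the rules that delete objects based on their age (i.e. act as a TTL).

    Assumption: each rule is a dict such as
    {"action": {"type": "Delete"}, "condition": {"age": 7}}.
    """
    ttl_rules = []
    for rule in lifecycle_rules:
        action = rule.get("action", {})
        condition = rule.get("condition", {})
        if action.get("type") == "Delete" and "age" in condition:
            ttl_rules.append(rule)
    return ttl_rules


def warn_if_staging_has_ttl(bucket_name, lifecycle_rules):
    """Warn if the staging bucket has a TTL; return the shortest TTL in days, or None."""
    ttl_rules = find_ttl_delete_rules(lifecycle_rules)
    if not ttl_rules:
        return None
    min_age_days = min(rule["condition"]["age"] for rule in ttl_rules)
    warnings.warn(
        f"Staging bucket {bucket_name!r} deletes objects after "
        f"{min_age_days} day(s). Staged files may be removed while "
        "long-lived or periodically scheduled jobs still need them."
    )
    return min_age_days
```

A check like this would run once at job submission, next to the existing "file already exists" check. It only warns rather than fails, since a TTL on the staging bucket is not always a problem for short-lived jobs.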