Details
Bug
Status: Closed
Major
Resolution: Invalid
1.3.0, 1.4.0
None
None
Description
The TaskManager uses the BlobClient to upload its stdout/log files to the BlobServer. If HA mode is enabled, these files are also uploaded to the BlobStore. The TaskManagerLogHandler only cleans up a TM's file once it has received another file from the same TM, and it does so in a non-thread-safe manner, so it can easily happen that files are never cleaned up from the BlobStore.
I think we should not upload these kinds of files to the persistent/HA BlobStore. We could do this by introducing a storage mode when uploading files to the BlobServer (e.g. HA_STORAGE vs. LOCAL_STORAGE). Additionally, we should either register a timeout for files that are only stored locally, or at least store them under their JobID so that they are also cleaned up once the job is cleaned up.
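A rough sketch of what the proposal could look like, using an in-memory stand-in rather than the real BlobServer. Only the names HA_STORAGE and LOCAL_STORAGE come from the description above; `StorageMode`, `BlobStoreSketch`, and all of its methods are hypothetical and are not Flink's actual API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical storage mode chosen by the caller at upload time.
enum StorageMode { HA_STORAGE, LOCAL_STORAGE }

// Hypothetical in-memory stand-in for a blob server with an optional HA store.
class BlobStoreSketch {
    // A locally stored blob, remembering its mode and upload time.
    private static final class Entry {
        final byte[] data;
        final StorageMode mode;
        final long uploadedAtMillis;

        Entry(byte[] data, StorageMode mode, long uploadedAtMillis) {
            this.data = data;
            this.mode = mode;
            this.uploadedAtMillis = uploadedAtMillis;
        }
    }

    // Both stores are keyed by job ID so that job cleanup removes everything.
    private final Map<String, Map<String, Entry>> localByJob = new HashMap<>();
    private final Map<String, Map<String, byte[]>> haByJob = new HashMap<>();
    private final long localTtlMillis;

    BlobStoreSketch(long localTtlMillis) {
        this.localTtlMillis = localTtlMillis;
    }

    // Always store locally; copy to the HA store only when requested.
    synchronized void put(String jobId, String key, byte[] data,
                          StorageMode mode, long nowMillis) {
        localByJob.computeIfAbsent(jobId, j -> new HashMap<>())
                  .put(key, new Entry(data, mode, nowMillis));
        if (mode == StorageMode.HA_STORAGE) {
            haByJob.computeIfAbsent(jobId, j -> new HashMap<>()).put(key, data);
        }
    }

    // Timeout-based cleanup: drop local-only blobs older than the TTL.
    synchronized void expireLocalOnly(long nowMillis) {
        for (Map<String, Entry> blobs : localByJob.values()) {
            blobs.values().removeIf(e ->
                e.mode == StorageMode.LOCAL_STORAGE
                    && nowMillis - e.uploadedAtMillis >= localTtlMillis);
        }
    }

    // Job-scoped cleanup: remove every blob of the job from both stores.
    synchronized void cleanupJob(String jobId) {
        localByJob.remove(jobId);
        haByJob.remove(jobId);
    }

    synchronized boolean hasLocal(String jobId, String key) {
        Map<String, Entry> blobs = localByJob.get(jobId);
        return blobs != null && blobs.containsKey(key);
    }

    synchronized boolean hasHa(String jobId, String key) {
        Map<String, byte[]> blobs = haByJob.get(jobId);
        return blobs != null && blobs.containsKey(key);
    }
}
```

With something like this, the TaskManager log upload would pass LOCAL_STORAGE and never reach the HA store, while job artifacts would keep using HA_STORAGE. The synchronized methods are only there to keep the sketch thread safe; a real implementation would likely want finer-grained locking.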