Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.16.0, 1.17.1
Description
TaskManager WorkingDirectory is not removed during shutdown.
Repro
1. Execute a Flink batch job within a Flink on YARN Session
flink-yarn-session -d
flink run -d /usr/lib/flink/examples/batch/WordCount.jar --input s3://prabhuflinks3/INPUT --output s3://prabhuflinks3/OUT
The batch job completes successfully, but the TaskManager working directory is not removed:
[root@ip-1-2-3-4 container_1705470896818_0017_01_000002]# ls -R -lrt /mnt2/yarn/usercache/hadoop/appcache/application_1705470896818_0017/tm_container_1705470896818_0017_01_000002
/mnt2/yarn/usercache/hadoop/appcache/application_1705470896818_0017/tm_container_1705470896818_0017_01_000002:
total 0
drwxr-xr-x 2 yarn yarn  6 Jan 18 08:34 tmp
drwxr-xr-x 4 yarn yarn 66 Jan 18 08:34 blobStorage
drwxr-xr-x 2 yarn yarn  6 Jan 18 08:34 slotAllocationSnapshots
drwxr-xr-x 2 yarn yarn  6 Jan 18 08:34 localState

/mnt2/yarn/usercache/hadoop/appcache/application_1705470896818_0017/tm_container_1705470896818_0017_01_000002/tmp:
total 0

/mnt2/yarn/usercache/hadoop/appcache/application_1705470896818_0017/tm_container_1705470896818_0017_01_000002/blobStorage:
total 0
drwxr-xr-x 2 yarn yarn 94 Jan 18 08:34 job_d11f7085314ef1fb04c4e12fe292185a
drwxr-xr-x 2 yarn yarn  6 Jan 18 08:34 incoming

/mnt2/yarn/usercache/hadoop/appcache/application_1705470896818_0017/tm_container_1705470896818_0017_01_000002/blobStorage/job_d11f7085314ef1fb04c4e12fe292185a:
total 12
-rw-r--r-- 1 yarn yarn 10323 Jan 18 08:34 blob_p-cdd441a64b3ea6eed0058df02c6c10fd208c94a8-86d84864273dad1e8084d8ef0f5aad52

/mnt2/yarn/usercache/hadoop/appcache/application_1705470896818_0017/tm_container_1705470896818_0017_01_000002/blobStorage/incoming:
total 0

/mnt2/yarn/usercache/hadoop/appcache/application_1705470896818_0017/tm_container_1705470896818_0017_01_000002/slotAllocationSnapshots:
total 0

/mnt2/yarn/usercache/hadoop/appcache/application_1705470896818_0017/tm_container_1705470896818_0017_01_000002/localState:
total 0
Analysis
1. The TaskManagerRunner removes the working directory only when its 'close' method is called, which never happens.
    public void close() throws Exception {
        try {
            closeAsync().get();
        } catch (ExecutionException e) {
            ExceptionUtils.rethrowException(ExceptionUtils.stripExecutionException(e));
        }
    }

    public CompletableFuture<Result> closeAsync() {
        return closeAsync(Result.SUCCESS);
    }
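To illustrate the failure mode in isolation, here is a minimal, self-contained sketch. It is not Flink code: WorkingDirOwner, directorySurvives and the "tm_demo_" prefix are hypothetical names invented for this example. It only shows that when directory cleanup lives solely in close(), the directory is left on disk whenever no caller invokes close().

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical stand-in for a component that owns a working directory.
// This is NOT Flink's TaskManagerRunner; all names are illustrative only.
public class WorkingDirOwner implements AutoCloseable {
    private final Path workingDir;

    public WorkingDirOwner(String prefix) throws IOException {
        // Create a working directory with a couple of subdirectories.
        this.workingDir = Files.createTempDirectory(prefix);
        Files.createDirectory(workingDir.resolve("tmp"));
        Files.createDirectory(workingDir.resolve("localState"));
    }

    public Path workingDir() {
        return workingDir;
    }

    // Cleanup lives only here, so it runs only if some caller invokes close().
    @Override
    public void close() throws IOException {
        deleteRecursively(workingDir);
    }

    // Deletes children before parents by walking the tree in reverse path order.
    public static void deleteRecursively(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    // Reports whether the directory still exists once the owner is no longer used.
    public static boolean directorySurvives(boolean callClose) throws IOException {
        WorkingDirOwner owner = new WorkingDirOwner("tm_demo_");
        Path dir = owner.workingDir();
        if (callClose) {
            owner.close();
        }
        boolean exists = Files.exists(dir);
        if (exists) {
            deleteRecursively(dir); // tidy up the demo's own leftovers
        }
        return exists;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("close() called:     directory survives = " + directorySurvives(true));
        System.out.println("close() not called: directory survives = " + directorySurvives(false));
    }
}
```

The sketch makes no claim about how the issue should be fixed in Flink. It only reproduces the observed symptom under the assumption stated in the analysis, namely that the shutdown path never reaches close().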