  Flink / FLINK-34142

TaskManager WorkingDirectory is not removed during shutdown


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.16.0, 1.17.1
    • Fix Version/s: None
    • Component/s: Deployment / YARN
    • Labels: None

    Description

      TaskManager WorkingDirectory is not removed during shutdown. 

      Repro

       

      1. Run a Flink batch job in a Flink on YARN session:
      
      flink-yarn-session -d
      
      flink run -d /usr/lib/flink/examples/batch/WordCount.jar --input s3://prabhuflinks3/INPUT --output s3://prabhuflinks3/OUT
      
      

      The batch job completes successfully, but the TaskManager working directory is not removed:

      [root@ip-1-2-3-4 container_1705470896818_0017_01_000002]# ls -R -lrt /mnt2/yarn/usercache/hadoop/appcache/application_1705470896818_0017/tm_container_1705470896818_0017_01_000002
      /mnt2/yarn/usercache/hadoop/appcache/application_1705470896818_0017/tm_container_1705470896818_0017_01_000002:
      total 0
      drwxr-xr-x 2 yarn yarn  6 Jan 18 08:34 tmp
      drwxr-xr-x 4 yarn yarn 66 Jan 18 08:34 blobStorage
      drwxr-xr-x 2 yarn yarn  6 Jan 18 08:34 slotAllocationSnapshots
      drwxr-xr-x 2 yarn yarn  6 Jan 18 08:34 localState
      
      /mnt2/yarn/usercache/hadoop/appcache/application_1705470896818_0017/tm_container_1705470896818_0017_01_000002/tmp:
      total 0
      
      /mnt2/yarn/usercache/hadoop/appcache/application_1705470896818_0017/tm_container_1705470896818_0017_01_000002/blobStorage:
      total 0
      drwxr-xr-x 2 yarn yarn 94 Jan 18 08:34 job_d11f7085314ef1fb04c4e12fe292185a
      drwxr-xr-x 2 yarn yarn  6 Jan 18 08:34 incoming
      
      /mnt2/yarn/usercache/hadoop/appcache/application_1705470896818_0017/tm_container_1705470896818_0017_01_000002/blobStorage/job_d11f7085314ef1fb04c4e12fe292185a:
      total 12
      -rw-r--r-- 1 yarn yarn 10323 Jan 18 08:34 blob_p-cdd441a64b3ea6eed0058df02c6c10fd208c94a8-86d84864273dad1e8084d8ef0f5aad52
      
      /mnt2/yarn/usercache/hadoop/appcache/application_1705470896818_0017/tm_container_1705470896818_0017_01_000002/blobStorage/incoming:
      total 0
      
      /mnt2/yarn/usercache/hadoop/appcache/application_1705470896818_0017/tm_container_1705470896818_0017_01_000002/slotAllocationSnapshots:
      total 0
      
      /mnt2/yarn/usercache/hadoop/appcache/application_1705470896818_0017/tm_container_1705470896818_0017_01_000002/localState:
      total 0
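      To check whether a TaskManager working directory survived a run (the same information as the ls -R output above), a small standalone program can walk the directory tree. This is a minimal sketch, not part of Flink: the class name LeftoverWorkingDirCheck and its method are hypothetical, and the directory to inspect is passed as the first argument, for example /mnt2/yarn/usercache/hadoop/appcache/application_1705470896818_0017/tm_container_1705470896818_0017_01_000002 from the listing.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical helper (not Flink code): reports what is left under a
 * TaskManager working directory, similar to `ls -R` in the report.
 */
public class LeftoverWorkingDirCheck {

    /** Returns every regular file below root, relative to root; empty if root is gone. */
    static List<Path> leftoverFiles(Path root) throws IOException {
        if (!Files.exists(root)) {
            return List.of(); // the directory was cleaned up
        }
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                    .map(root::relativize)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args[0]);
        // Even empty subdirectories (tmp, localState, ...) mean cleanup did not run.
        System.out.println("exists: " + Files.exists(root));
        for (Path p : leftoverFiles(root)) {
            System.out.println(p);
        }
    }
}
```

      Run against the directory from the report, this would print exists: true followed by the single blob file under blobStorage/job_d11f7085314ef1fb04c4e12fe292185a.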
      
      
      

      Analysis

      1. TaskManagerRunner removes the working directory only when its close() method is called, and close() is never called during shutdown in this scenario.

          public void close() throws Exception {
              try {
                  closeAsync().get();
              } catch (ExecutionException e) {
                  ExceptionUtils.rethrowException(ExceptionUtils.stripExecutionException(e));
              }
          }
      
          public CompletableFuture<Result> closeAsync() {
              return closeAsync(Result.SUCCESS);
          }
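      To make the failure mode in point 1 concrete, here is a minimal, self-contained sketch. It is not Flink code: the class WorkingDirOwner, its methods, and the directory names are hypothetical. It only models the pattern described above, where the working directory is deleted on the close()/closeAsync() path, so any shutdown path that skips close() leaves the directory behind.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical model (not Flink code): the working dir is deleted only via close(). */
public class WorkingDirOwner implements AutoCloseable {

    private final Path workingDir;

    WorkingDirOwner(Path parent, String name) throws IOException {
        this.workingDir = Files.createDirectories(parent.resolve(name));
        Files.createDirectories(workingDir.resolve("blobStorage"));
        Files.createDirectories(workingDir.resolve("tmp"));
    }

    Path workingDir() {
        return workingDir;
    }

    /** Asynchronous cleanup, similar in shape to closeAsync() in the report. */
    CompletableFuture<Void> closeAsync() {
        return CompletableFuture.runAsync(() -> {
            try {
                deleteRecursively(workingDir);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }

    /** Blocks until cleanup finishes; the only code path that removes the directory. */
    @Override
    public void close() throws Exception {
        try {
            closeAsync().get();
        } catch (ExecutionException e) {
            Throwable cause = e.getCause();
            throw cause instanceof Exception ? (Exception) cause : e;
        }
    }

    static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        List<Path> ordered;
        try (Stream<Path> paths = Files.walk(root)) {
            // Reverse order puts children before their parents.
            ordered = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : ordered) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws Exception {
        Path parent = Files.createTempDirectory("appcache");

        // Shutdown path that calls close(): the directory is removed.
        WorkingDirOwner closed = new WorkingDirOwner(parent, "tm_closed");
        closed.close();
        System.out.println("after close(): exists=" + Files.exists(closed.workingDir()));

        // Shutdown path that never calls close(): the directory is left behind.
        WorkingDirOwner leaked = new WorkingDirOwner(parent, "tm_leaked");
        System.out.println("without close(): exists=" + Files.exists(leaked.workingDir()));
    }
}
```

      Running main prints "after close(): exists=false" and then "without close(): exists=true", mirroring the leftover tm_container_... directory in the repro.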
      

       


          People

            Assignee: Unassigned
            Reporter: Prabhu Joseph (prabhujoseph)
            Votes: 0
            Watchers: 1
