Hadoop YARN / YARN-8335 (sub-task of YARN-3611 Support Docker Containers In LinuxContainerExecutor)

Privileged docker containers' jobSubmitDir does not get successfully cleaned up

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate

    Description

      The jobSubmitDir directory is owned by root, but the cleanup runs as the submitting user, which appears to be why the deletion fails.

      2018-05-21 19:46:15,124 WARN  [DeletionService #0] privileged.PrivilegedOperationExecutor (PrivilegedOperationExecutor.java:executePrivilegedOperation(174)) - Shell execution returned exit code: 255. Privileged Execution Operation Stderr:
      
      Stdout: main : command provided 3
      main : run as user is ebadger
      main : requested yarn user is ebadger
      failed to unlink /tmp/hadoop-local3/usercache/ebadger/appcache/application_1526931492976_0007/container_1526931492976_0007_01_000001/jobSubmitDir/job.split: Permission denied
      failed to unlink /tmp/hadoop-local3/usercache/ebadger/appcache/application_1526931492976_0007/container_1526931492976_0007_01_000001/jobSubmitDir/job.splitmetainfo: Permission denied
      failed to rmdir jobSubmitDir: Directory not empty
      Error while deleting /tmp/hadoop-local3/usercache/ebadger/appcache/application_1526931492976_0007/container_1526931492976_0007_01_000001: 39 (Directory not empty)
      
      Full command array for failed execution:
      [/hadoop-3.2.0-SNAPSHOT/bin/container-executor, ebadger, ebadger, 3, /tmp/hadoop-local3/usercache/ebadger/appcache/application_1526931492976_0007/container_1526931492976_0007_01_000001]
      2018-05-21 19:46:15,124 ERROR [DeletionService #0] nodemanager.LinuxContainerExecutor (LinuxContainerExecutor.java:deleteAsUser(848)) - DeleteAsUser for /tmp/hadoop-local3/usercache/ebadger/appcache/application_1526931492976_0007/container_1526931492976_0007_01_000001 returned with exit code: 255
      org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException: ExitCodeException exitCode=255:
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:180)
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:206)
              at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.deleteAsUser(LinuxContainerExecutor.java:844)
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.deletion.task.FileDeletionTask.run(FileDeletionTask.java:135)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)
      Caused by: ExitCodeException exitCode=255:
              at org.apache.hadoop.util.Shell.runCommand(Shell.java:1009)
              at org.apache.hadoop.util.Shell.run(Shell.java:902)
              at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1227)
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:152)
              ... 10 more
      
      [foo@bar hadoop]$ ls -l /tmp/hadoop-local3/usercache/ebadger/appcache/application_1526931492976_0007/container_1526931492976_0007_01_000001/
      total 4
      drwxr-sr-x 2 root users 4096 May 21 19:45 jobSubmitDir
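
      One way to confirm the ownership mismatch is to walk the container directory and list every entry that is not owned by the submitting user. The sketch below is only a diagnostic aid under that assumption; it is not part of YARN or container-executor, and the function name find_foreign_owned and its parameters are hypothetical.

```python
import os


def find_foreign_owned(root, expected_uid):
    """Return paths under root (root included) whose owner uid differs
    from expected_uid.

    Uses lstat so symlinks themselves are inspected rather than their
    targets. Symlinked directories are not descended into.
    """
    foreign = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # Check the directory itself, then every file directly inside it.
        candidates = [dirpath] + [os.path.join(dirpath, f) for f in filenames]
        for path in candidates:
            if os.lstat(path).st_uid != expected_uid:
                foreign.append(path)
    return foreign


if __name__ == "__main__":
    import pwd
    import sys

    # Usage (hypothetical): python check_owner.py <container_dir> <user>
    container_dir, user = sys.argv[1], sys.argv[2]
    uid = pwd.getpwnam(user).pw_uid
    for path in find_foreign_owned(container_dir, uid):
        print(path)
```

      Run against the container directory above with user ebadger, a check like this would be expected to report jobSubmitDir, since the listing shows it is owned by root.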
      

            People

              Assignee: Unassigned
              Reporter: Eric Badger (ebadger)