Hadoop YARN / YARN-8403

Nodemanager logs failed to download file with INFO level

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.2.0, 3.1.2
    • Component/s: yarn
    • Labels: None

      Description

      Some container-execution-related stack traces are printed at INFO or WARN level.

      2018-06-06 03:10:40,077 INFO  localizer.ResourceLocalizationService (ResourceLocalizationService.java:writeCredentials(1312)) - Writing credentials to the nmPrivate file /grid/0/hadoop/yarn/local/nmPrivate/container_e02_1528246317583_0048_01_000001.tokens
      2018-06-06 03:10:40,087 INFO  localizer.ResourceLocalizationService (ResourceLocalizationService.java:run(975)) - Failed to download resource { { hdfs://mycluster.example.com:8020/user/hrt_qa/Streaming/InputDir, 1528254452720, FILE, null },pending,[(container_e02_1528246317583_0048_01_000001)],6074418082915225,DOWNLOADING}
      org.apache.hadoop.yarn.exceptions.YarnException: Download and unpack failed
              at org.apache.hadoop.yarn.util.FSDownload.downloadAndUnpack(FSDownload.java:306)
              at org.apache.hadoop.yarn.util.FSDownload.verifyAndCopy(FSDownload.java:283)
              at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:409)
              at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:66)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)
      Caused by: java.io.FileNotFoundException: /grid/0/hadoop/yarn/local/filecache/28_tmp/InputDir/input1.txt (Permission denied)
              at java.io.FileOutputStream.open0(Native Method)
              at java.io.FileOutputStream.open(FileOutputStream.java:270)
              at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
              at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:236)
              at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:219)
              at org.apache.hadoop.fs.RawLocalFileSystem.createOutputStreamWithMode(RawLocalFileSystem.java:318)
              at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:307)
              at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:338)
              at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.<init>(ChecksumFileSystem.java:401)
              at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:464)
              at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:443)
              at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1169)
              at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1149)
              at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1038)
              at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:408)
              at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:399)
              at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:381)
              at org.apache.hadoop.yarn.util.FSDownload.downloadAndUnpack(FSDownload.java:298)
              ... 9 more
      
      2018-06-06 03:10:41,547 WARN  privileged.PrivilegedOperationExecutor (PrivilegedOperationExecutor.java:executePrivilegedOperation(182)) - IOException executing command:
      java.io.InterruptedIOException: java.lang.InterruptedException
              at org.apache.hadoop.util.Shell.runCommand(Shell.java:1012)
              at org.apache.hadoop.util.Shell.run(Shell.java:902)
              at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1227)
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:152)
              at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:402)
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1229)
      Caused by: java.lang.InterruptedException
              at java.lang.Object.wait(Native Method)
              at java.lang.Object.wait(Object.java:502)
              at java.lang.UNIXProcess.waitFor(UNIXProcess.java:395)
              at org.apache.hadoop.util.Shell.runCommand(Shell.java:1002)
              ... 5 more
      2018-06-06 03:10:41,548 WARN  nodemanager.LinuxContainerExecutor (LinuxContainerExecutor.java:startLocalizer(407)) - Exit code from container container_e02_1528246317583_0048_01_000001 startLocalizer is : -1
      org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException: java.io.InterruptedIOException: java.lang.InterruptedException
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:183)
              at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:402)
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1229)
      Caused by: java.io.InterruptedIOException: java.lang.InterruptedException
              at org.apache.hadoop.util.Shell.runCommand(Shell.java:1012)
              at org.apache.hadoop.util.Shell.run(Shell.java:902)
              at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1227)
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:152)
              ... 2 more
      Caused by: java.lang.InterruptedException
              at java.lang.Object.wait(Native Method)
              at java.lang.Object.wait(Object.java:502)
              at java.lang.UNIXProcess.waitFor(UNIXProcess.java:395)
              at org.apache.hadoop.util.Shell.runCommand(Shell.java:1002)
              ... 5 more
      2018-06-06 03:10:41,548 INFO  localizer.ResourceLocalizationService (ResourceLocalizationService.java:run(1249)) - Localizer failed for container_e02_1528246317583_0048_01_000001
      java.io.IOException: Application application_1528246317583_0048 initialization failed (exitCode=-1) with output: null
              at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:411)
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1229)
      Caused by: org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException: java.io.InterruptedIOException: java.lang.InterruptedException
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:183)
              at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:402)
              ... 1 more
      Caused by: java.io.InterruptedIOException: java.lang.InterruptedException
              at org.apache.hadoop.util.Shell.runCommand(Shell.java:1012)
              at org.apache.hadoop.util.Shell.run(Shell.java:902)
              at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1227)
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:152)
              ... 2 more
      Caused by: java.lang.InterruptedException
              at java.lang.Object.wait(Native Method)
              at java.lang.Object.wait(Object.java:502)
              at java.lang.UNIXProcess.waitFor(UNIXProcess.java:395)
              at org.apache.hadoop.util.Shell.runCommand(Shell.java:1002)
              ... 5 more
      

      These logs appear only in the NodeManager (NM) log; they do not show up in the ApplicationMaster (AM) log.
      The stack traces are logged at WARN or INFO level. Ideally, exceptions should be logged at ERROR level.
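
The request above can be illustrated with a minimal, self-contained sketch. It is not the attached patch: it uses java.util.logging as a stand-in (Level.SEVERE plays the role of ERROR), and the class name, logger name, method names, and the IOException standing in for YarnException are all hypothetical. It only shows the same failure message and throwable being logged at INFO (as in the excerpt) versus at the error level.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical illustration; none of these names come from Hadoop.
class DownloadFailureLogging {
    private static final Logger LOG = Logger.getLogger("localizer.sketch");

    // Behaviour reported in the issue: the failure and its stack trace at INFO.
    static void logFailureAtInfo(String resource, Exception e) {
        LOG.log(Level.INFO, "Failed to download resource " + resource, e);
    }

    // Requested behaviour: same message and throwable at the error level
    // (SEVERE is the java.util.logging counterpart of ERROR).
    static void logFailureAtError(String resource, Exception e) {
        LOG.log(Level.SEVERE, "Failed to download resource " + resource, e);
    }

    // Collects log records in memory so the levels can be inspected.
    static final class CapturingHandler extends Handler {
        final List<LogRecord> records = new ArrayList<>();
        @Override public void publish(LogRecord record) { records.add(record); }
        @Override public void flush() { }
        @Override public void close() { }
    }

    public static void main(String[] args) {
        CapturingHandler handler = new CapturingHandler();
        LOG.setUseParentHandlers(false); // keep console output to our own lines
        LOG.addHandler(handler);

        // Stand-in for the exception chain shown in the log excerpt.
        Exception failure = new IOException("Download and unpack failed",
                new FileNotFoundException("input1.txt (Permission denied)"));

        logFailureAtInfo("InputDir", failure);
        logFailureAtError("InputDir", failure);

        for (LogRecord r : handler.records) {
            System.out.println(r.getLevel() + " " + r.getMessage()
                    + " (cause: " + r.getThrown().getCause().getMessage() + ")");
        }
    }
}
```

With an SLF4J-style logger the equivalent of the second method would be a call such as LOG.error(message, e), which keeps the stack trace attached while making the failure visible to anyone filtering the NM log by ERROR.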

        Attachments

        1. YARN-8403.003.patch (4 kB, Eric Yang)
        2. YARN-8403.png (243 kB, Eric Yang)
        3. YARN-8403.002.patch (2 kB, Eric Yang)
        4. YARN-8403.001.patch (2 kB, Eric Yang)

              People

              • Assignee: Eric Yang (eyang)
              • Reporter: Eric Yang (eyang)
              • Votes: 0
              • Watchers: 6
