Uploaded image for project: 'Oozie'
  1. Oozie
  2. OOZIE-3652

Oozie launcher should retry directory listing when NoSuchFileException occurs

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 5.3.0
    • None
    • None

    Description

      Oozie launcher lists the CWD and prints the file name and directory name. When there is a java agent attach to Oozie launcher JVM, a temporary attach_pid<pid> file is created in the CWD by default. If the temporary file is deleted between directory listing and printing, NoSuchFileException occurs.

      2022-01-13 06:11:05,569  WARN Hive2ActionExecutor:523 - SERVER[<host>] USER[admin] GROUP[-] TOKEN[] APP[Hive Sample] JOB[0000061-220113033638267-oozie-oozi-W] ACTION[0000061-220113033638267-oozie-oozi-W@hive-dbea] Launcher exception: ./.attach_pid1517
      java.nio.file.NoSuchFileException: ./.attach_pid1517
          at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
          at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
          at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
          at sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
          at sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144)
          at sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99)
          at java.nio.file.Files.readAttributes(Files.java:1737)
          at java.nio.file.FileTreeWalker.getAttributes(FileTreeWalker.java:225)
          at java.nio.file.FileTreeWalker.visit(FileTreeWalker.java:276)
          at java.nio.file.FileTreeWalker.next(FileTreeWalker.java:372)
          at java.nio.file.Files.walkFileTree(Files.java:2706)
          at org.apache.oozie.action.hadoop.LocalFsOperations.printContentsOfDir(LocalFsOperations.java:59)
          at org.apache.oozie.action.hadoop.LauncherAM.printDebugInfo(LauncherAM.java:293)
          at org.apache.oozie.action.hadoop.LauncherAM.access$100(LauncherAM.java:55)
          at org.apache.oozie.action.hadoop.LauncherAM$2.run(LauncherAM.java:221)
          at java.security.AccessController.doPrivileged(Native Method)
          at javax.security.auth.Subject.doAs(Subject.java:422)
          at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
          at org.apache.oozie.action.hadoop.LauncherAM.run(LauncherAM.java:217)
          at org.apache.oozie.action.hadoop.LauncherAM$1.run(LauncherAM.java:153)
          at java.security.AccessController.doPrivileged(Native Method)
          at javax.security.auth.Subject.doAs(Subject.java:422)
          at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
          at org.apache.oozie.action.hadoop.LauncherAM.main(LauncherAM.java:141)
      

      After the log4j hotpatch provided by Correto enabled, we faced this issue intermittently. Disabling the hotpatch will fix the issue, however, there maybe another agent and it would use the CWD by default, so I suppose it can be fixed in Oozie side.

      Attachments

        1. OOZIE-3652-001.patch
          4 kB
          Akira Ajisaka
        2. OOZIE-3652-002.patch
          5 kB
          Akira Ajisaka
        3. OOZIE-3652-003.patch
          5 kB
          Akira Ajisaka
        4. OOZIE-3652-004.patch
          5 kB
          Akira Ajisaka

        Activity

          People

            aajisaka Akira Ajisaka
            aajisaka Akira Ajisaka
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: