FLINK-35358

Breaking change when loading artifacts

Details

    Description

      We have been using the following snippet in our Dockerfiles to run a Flink job in application mode:

       

      FROM flink:1.18.1-scala_2.12-java17
      
      COPY --from=build /app/target/my-job*.jar /opt/flink/usrlib/artifacts/my-job.jar
      
      USER flink 

       

      This has worked since at least Flink 1.14, but the update to 1.19 broke our Dockerfiles. The fix is to move the jar file one directory up, so the snippet becomes:

       

      FROM flink:1.18.1-scala_2.12-java17
      
      COPY --from=build /app/target/my-job*.jar /opt/flink/usrlib/my-job.jar
      
      USER flink  

       

      We have not spent much time looking into the cause, but this is the stack trace we get:

       

      myjob-jobmanager-1   | org.apache.flink.util.FlinkException: Could not load the provided entrypoint class.
      myjob-jobmanager-1   |     at org.apache.flink.client.program.DefaultPackagedProgramRetriever.getPackagedProgram(DefaultPackagedProgramRetriever.java:230) ~[flink-dist-1.19.0.jar:1.19.0]
      myjob-jobmanager-1   |     at org.apache.flink.container.entrypoint.StandaloneApplicationClusterEntryPoint.getPackagedProgram(StandaloneApplicationClusterEntryPoint.java:149) ~[flink-dist-1.19.0.jar:1.19.0]
      myjob-jobmanager-1   |     at org.apache.flink.container.entrypoint.StandaloneApplicationClusterEntryPoint.lambda$main$0(StandaloneApplicationClusterEntryPoint.java:90) ~[flink-dist-1.19.0.jar:1.19.0]
      myjob-jobmanager-1   |     at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28) ~[flink-dist-1.19.0.jar:1.19.0]
      myjob-jobmanager-1   |     at org.apache.flink.container.entrypoint.StandaloneApplicationClusterEntryPoint.main(StandaloneApplicationClusterEntryPoint.java:89) [flink-dist-1.19.0.jar:1.19.0]
      myjob-jobmanager-1   | Caused by: org.apache.flink.client.program.ProgramInvocationException: The program's entry point class 'my.company.job.MyJob' was not found in the jar file.
      myjob-jobmanager-1   |     at org.apache.flink.client.program.PackagedProgram.loadMainClass(PackagedProgram.java:481) ~[flink-dist-1.19.0.jar:1.19.0]
      myjob-jobmanager-1   |     at org.apache.flink.client.program.PackagedProgram.<init>(PackagedProgram.java:153) ~[flink-dist-1.19.0.jar:1.19.0]
      myjob-jobmanager-1   |     at org.apache.flink.client.program.PackagedProgram.<init>(PackagedProgram.java:65) ~[flink-dist-1.19.0.jar:1.19.0]
      myjob-jobmanager-1   |     at org.apache.flink.client.program.PackagedProgram$Builder.build(PackagedProgram.java:691) ~[flink-dist-1.19.0.jar:1.19.0]
      myjob-jobmanager-1   |     at org.apache.flink.client.program.DefaultPackagedProgramRetriever.getPackagedProgram(DefaultPackagedProgramRetriever.java:228) ~[flink-dist-1.19.0.jar:1.19.0]
      myjob-jobmanager-1   |     ... 4 more
      myjob-jobmanager-1   | Caused by: java.lang.ClassNotFoundException: my.company.job.MyJob
      myjob-jobmanager-1   |     at java.net.URLClassLoader.findClass(Unknown Source) ~[?:?]
      myjob-jobmanager-1   |     at java.lang.ClassLoader.loadClass(Unknown Source) ~[?:?]
      myjob-jobmanager-1   |     at org.apache.flink.util.FlinkUserCodeClassLoader.loadClassWithoutExceptionHandling(FlinkUserCodeClassLoader.java:67) ~[flink-dist-1.19.0.jar:1.19.0]
      myjob-jobmanager-1   |     at org.apache.flink.util.ChildFirstClassLoader.loadClassWithoutExceptionHandling(ChildFirstClassLoader.java:74) ~[flink-dist-1.19.0.jar:1.19.0]
      myjob-jobmanager-1   |     at org.apache.flink.util.FlinkUserCodeClassLoader.loadClass(FlinkUserCodeClassLoader.java:51) ~[flink-dist-1.19.0.jar:1.19.0]
      myjob-jobmanager-1   |     at java.lang.ClassLoader.loadClass(Unknown Source) ~[?:?]
      myjob-jobmanager-1   |     at org.apache.flink.util.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.loadClass(FlinkUserCodeClassLoaders.java:197) ~[flink-dist-1.19.0.jar:1.19.0]
      myjob-jobmanager-1   |     at java.lang.Class.forName0(Native Method) ~[?:?]
      myjob-jobmanager-1   |     at java.lang.Class.forName(Unknown Source) ~[?:?]
      myjob-jobmanager-1   |     at org.apache.flink.client.program.PackagedProgram.loadMainClass(PackagedProgram.java:479) ~[flink-dist-1.19.0.jar:1.19.0]
      myjob-jobmanager-1   |     at org.apache.flink.client.program.PackagedProgram.<init>(PackagedProgram.java:153) ~[flink-dist-1.19.0.jar:1.19.0]
      myjob-jobmanager-1   |     at org.apache.flink.client.program.PackagedProgram.<init>(PackagedProgram.java:65) ~[flink-dist-1.19.0.jar:1.19.0]
      myjob-jobmanager-1   |     at org.apache.flink.client.program.PackagedProgram$Builder.build(PackagedProgram.java:691) ~[flink-dist-1.19.0.jar:1.19.0]
      myjob-jobmanager-1   |     at org.apache.flink.client.program.DefaultPackagedProgramRetriever.getPackagedProgram(DefaultPackagedProgramRetriever.java:228) ~[flink-dist-1.19.0.jar:1.19.0]
      myjob-jobmanager-1   |     ... 4 more

       

      I have changed some text in the stack trace to keep it anonymous, so it may contain a typo, but that is not the issue. As you can see, the stack trace leads to PackagedProgram and DefaultPackagedProgramRetriever. The only commits to those classes after Flink 1.18 are one PackagedProgram commit and one DefaultPackagedProgramRetriever commit. We suspect the latter is the culprit, specifically a line in it that we think made the artifact check non-recursive. We assume artifacts are meant to sit directly in /opt/flink/usrlib, without the artifacts subdirectory, so we plan to change our Dockerfiles accordingly anyway. It is still a breaking change, though, so we wanted to open an issue for it first.
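
      To illustrate the suspected difference, here is a minimal sketch of a flat versus a recursive jar scan. It is not Flink's actual code: the class JarScanSketch and its methods are hypothetical, and a temporary directory stands in for /opt/flink/usrlib. It only shows how a jar placed in an artifacts subdirectory could go unnoticed if the scan stopped descending into subdirectories.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Illustrative only: not Flink's code, just a flat vs. recursive jar scan. */
public class JarScanSketch {

    /** Returns the jars that sit directly in dir (no descent into subdirectories). */
    public static List<Path> listJarsFlat(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.filter(JarScanSketch::isJar).sorted().collect(Collectors.toList());
        }
    }

    /** Returns the jars found anywhere below dir, at any depth. */
    public static List<Path> listJarsRecursive(Path dir) throws IOException {
        try (Stream<Path> entries = Files.walk(dir)) {
            return entries.filter(JarScanSketch::isJar).sorted().collect(Collectors.toList());
        }
    }

    private static boolean isJar(Path p) {
        return Files.isRegularFile(p) && p.getFileName().toString().endsWith(".jar");
    }

    public static void main(String[] args) throws IOException {
        // Temporary stand-in for /opt/flink/usrlib, with the jar one level down
        // as in our original Dockerfile (usrlib/artifacts/my-job.jar).
        Path usrlib = Files.createTempDirectory("usrlib");
        Path nested = Files.createDirectories(usrlib.resolve("artifacts"));
        Files.createFile(nested.resolve("my-job.jar"));

        System.out.println("flat:      " + listJarsFlat(usrlib).size() + " jar(s)");
        System.out.println("recursive: " + listJarsRecursive(usrlib).size() + " jar(s)");
    }
}
```

      With the jar under artifacts, the flat scan reports no jars and the recursive scan reports one. That would match the behavior we see if the check did become non-recursive in 1.19, but we have not confirmed this against the Flink source.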

          People

            ferenc-csaky Ferenc Csaky
            rkth Rasmus Thygesen