FLINK-11402

User code can fail with an UnsatisfiedLinkError in the presence of multiple classloaders

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.7.0
    • Fix Version/s: None
    • Labels:

      Description

      As reported on the user mailing list thread "`env.java.opts` not persisting after job canceled or failed and then restarted", user code that loads a native library can fail when it is executed under more than one user code class loader.

      Steps to reproduce

      I was able to reproduce the issue reported on the mailing list using snappy-java in a user program. Running the attached user program works fine on initial submission, but results in a failure when re-executed.

      I'm running Flink 1.7.0 on a standalone cluster started via bin/start-cluster.sh.

      0. Unpack the attached Maven project and build it using mvn clean package, or directly use the attached hello-snappy-1.0-SNAPSHOT.jar
      1. Download snappy-java-1.1.7.2.jar and unpack libsnappyjava for your system:

      jar tf snappy-java-1.1.7.2.jar | grep libsnappy
      ...
      org/xerial/snappy/native/Linux/x86_64/libsnappyjava.so
      ...
      org/xerial/snappy/native/Mac/x86_64/libsnappyjava.jnilib
      ...
      

      2. In flink-conf.yaml, point the system library path at the directory containing libsnappyjava (the path needs to be adjusted for your system):

      env.java.opts: -Djava.library.path=/.../org/xerial/snappy/native/Mac/x86_64
      

      3. Run attached hello-snappy-1.0-SNAPSHOT.jar

      bin/flink run hello-snappy-1.0-SNAPSHOT.jar
      Starting execution of program
      Program execution finished
      Job with JobID ae815b918dd7bc64ac8959e4e224f2b4 has finished.
      Job Runtime: 359 ms
      

      4. Rerun attached hello-snappy-1.0-SNAPSHOT.jar

      bin/flink run hello-snappy-1.0-SNAPSHOT.jar
      Starting execution of program
      
      ------------------------------------------------------------
       The program finished with the following exception:
      
      org.apache.flink.client.program.ProgramInvocationException: Job failed. (JobID: 7d69baca58f33180cb9251449ddcd396)
        at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:268)
        at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:487)
        at org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:66)
        at com.github.uce.HelloSnappy.main(HelloSnappy.java:18)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:529)
        at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:421)
        at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:427)
        at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:813)
        at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:287)
        at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:213)
        at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1050)
        at org.apache.flink.client.cli.CliFrontend.lambda$main$11(CliFrontend.java:1126)
        at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
        at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1126)
      Caused by: org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
        at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:146)
        at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:265)
        ... 17 more
      Caused by: java.lang.UnsatisfiedLinkError: Native Library /.../org/xerial/snappy/native/Mac/x86_64/libsnappyjava.jnilib already loaded in another classloader
        at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1907)
        at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1861)
        at java.lang.Runtime.loadLibrary0(Runtime.java:870)
        at java.lang.System.loadLibrary(System.java:1122)
        at org.xerial.snappy.SnappyLoader.loadNativeLibrary(SnappyLoader.java:182)
        at org.xerial.snappy.SnappyLoader.loadSnappyApi(SnappyLoader.java:154)
        at org.xerial.snappy.Snappy.<clinit>(Snappy.java:47)
        at com.github.uce.HelloSnappy.lambda$main$95f17bfa$1(HelloSnappy.java:13)
        at org.apache.flink.streaming.api.operators.StreamMap.processElement(StreamMap.java:41)
        at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:579)
        at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:554)
        at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:534)
        at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718)
        at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:696)
        at org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collect(StreamSourceContexts.java:104)
        at org.apache.flink.streaming.api.functions.source.FromElementsFunction.run(FromElementsFunction.java:164)
        at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:94)
        at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:58)
        at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:99)
        at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
        at org.apache.flink.runtime.taskmanager.Task.run(Task.java:704)
        at java.lang.Thread.run(Thread.java:748)
      

      Note: The attached user code configures Snappy to use libsnappyjava in the path specified by java.library.path (see org-xerial-snappy.properties). When bundling the native code in the user JAR, repeated execution works fine.
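
      To illustrate the failure mode, here is a toy model of the rule that the error message points to: a given native library path can be bound to only one class loader at a time. This is a sketch, not JDK or Flink code. The class NativeLibraryRegistry, its methods, and the placeholder path are hypothetical, and it assumes that each job submission gets a fresh user code class loader while the previous one is still alive.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model (not the JDK implementation) of the rule behind the error above:
 * a native library path may be bound to at most one class loader at a time.
 */
public class NativeLibraryRegistry {
    // Library path -> class loader that first loaded it (hypothetical bookkeeping).
    // Holding the loader here models the first job's loader not being collected yet.
    private final Map<String, ClassLoader> owners = new HashMap<>();

    /** Models a System.loadLibrary call made by a class defined by {@code caller}. */
    public synchronized void load(String path, ClassLoader caller) {
        ClassLoader owner = owners.putIfAbsent(path, caller);
        // Reloading from the owning loader is harmless; any other loader is rejected.
        if (owner != null && owner != caller) {
            throw new UnsatisfiedLinkError(
                "Native Library " + path + " already loaded in another classloader");
        }
    }

    /** Simulates one job submission; returns "ok" or the error message. */
    public String submitJob(String path) {
        // Assumption: every submission runs user code in a fresh class loader.
        ClassLoader userCodeLoader = new URLClassLoader(new URL[0], null);
        try {
            load(path, userCodeLoader);
            return "ok";
        } catch (UnsatisfiedLinkError e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        NativeLibraryRegistry jvm = new NativeLibraryRegistry();
        String path = "/path/to/libsnappyjava.jnilib"; // placeholder, not a real path
        System.out.println(jvm.submitJob(path)); // first submission
        System.out.println(jvm.submitJob(path)); // second submission, new loader
    }
}
```

      In this model the first submission prints ok and the second prints the "already loaded in another classloader" message, mirroring steps 3 and 4 above.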

        Attachments

        1. hello-snappy-1.0-SNAPSHOT.jar
          1.93 MB
          Ufuk Celebi
        2. hello-snappy.tgz
          2 kB
          Ufuk Celebi

              People

              • Assignee: Unassigned
              • Reporter: Ufuk Celebi (uce)
              • Votes: 0
              • Watchers: 7
