Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-14973

OSS filesystem does not relocate many dependencies

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 1.9.1
    • 1.10.0
    • FileSystems
    • None

    Description

      Whereas the Azure and S3 Hadoop filesystem jars relocate all of their depdendencies:

      $ jar tf opt/flink-azure-fs-hadoop-1.9.1.jar | grep -v '^org/apache/fs/shaded/' | grep -v '^META-INF' | grep '/' | cut -f -2 -d / | sort | uniq
      org/
      org/apache
      
      $ jar tf opt/flink-s3-fs-hadoop-1.9.1.jar | grep -v '^org/apache/fs/shaded/' | grep -v '^META-INF' | grep '/' | cut -f -2 -d / | sort | uniq
      org/
      org/apache
      

      The OSS Hadoop filesystem leaves many things un-relocated:

      $ jar tf opt/flink-oss-fs-hadoop-1.9.1.jar | grep -v '^org/apache/fs/shaded/' | grep -v '^META-INF' | grep '/' | cut -f -2 -d / | sort | uniq
      assets/
      assets/org
      avro/
      avro/shaded
      com/
      com/ctc
      com/fasterxml
      com/google
      com/jcraft
      com/nimbusds
      com/sun
      com/thoughtworks
      javax/
      javax/activation
      javax/el
      javax/servlet
      javax/ws
      javax/xml
      jersey/
      jersey/repackaged
      licenses/
      licenses/LICENSE.asm
      licenses/LICENSE.cddlv1.0
      licenses/LICENSE.cddlv1.1
      licenses/LICENSE.jdom
      licenses/LICENSE.jzlib
      licenses/LICENSE.paranamer
      licenses/LICENSE.protobuf
      licenses/LICENSE.re2j
      licenses/LICENSE.stax2api
      net/
      net/jcip
      net/minidev
      org/
      org/apache
      org/codehaus
      org/eclipse
      org/jdom
      org/objectweb
      org/tukaani
      org/xerial
      

      The first symptom of this I ran into was that Flink is unable to restore from a savepoint if both the OSS and Azure Hadoop filesystems are on the classpath, but I assume this has the potential to cause further problems, at least until more progress is made on the module/classloading front.

      Steps to reproduce

      1. Copy both the Azure and OSS Hadoop filesystem JARs from opt/ into lib/
      2. Run a job that restores from a savepoint (the savepoint might need to be stored on OSS)
      3. See a crash and traceback like:
      2019-11-26 15:59:25,318 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - Fatal error occurred in the cluster entrypoint.
      org.apache.flink.runtime.dispatcher.DispatcherException: Failed to take leadership with session id 00000000-0000-0000-0000-000000000000.
      	at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$null$30(Dispatcher.java:915) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
      	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_232]
      	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_232]
      	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_232]
      	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_232]
      	at org.apache.flink.runtime.concurrent.FutureUtils$WaitingConjunctFuture.handleCompletedFuture(FutureUtils.java:691) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
      	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_232]
      	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_232]
      	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_232]
      	at java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:575) ~[?:1.8.0_232]
      	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:753) ~[?:1.8.0_232]
      	at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456) ~[?:1.8.0_232]
      	at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:397) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
      	at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:190) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
      	at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
      	at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
      	at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
      	at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
      	at scala.PartialFunction.applyOrElse(PartialFunction.scala:123) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
      	at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
      	at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
      	at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
      	at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
      	at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
      	at akka.actor.Actor.aroundReceive(Actor.scala:517) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
      	at akka.actor.Actor.aroundReceive$(Actor.scala:515) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
      	at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
      	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
      	at akka.actor.ActorCell.invoke(ActorCell.scala:561) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
      	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
      	at akka.dispatch.Mailbox.run(Mailbox.scala:225) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
      	at akka.dispatch.Mailbox.exec(Mailbox.scala:235) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
      	at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
      	at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
      	at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
      	at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
      Caused by: java.lang.RuntimeException: org.apache.flink.runtime.client.JobExecutionException: Could not set up JobManager
      	at org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:36) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
      	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604) ~[?:1.8.0_232]
      	at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
      	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
      	... 4 more
      Caused by: org.apache.flink.runtime.client.JobExecutionException: Could not set up JobManager
      	at org.apache.flink.runtime.jobmaster.JobManagerRunner.<init>(JobManagerRunner.java:152) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
      	at org.apache.flink.runtime.dispatcher.DefaultJobManagerRunnerFactory.createJobManagerRunner(DefaultJobManagerRunnerFactory.java:83) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
      	at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$5(Dispatcher.java:375) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
      	at org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:34) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
      	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604) ~[?:1.8.0_232]
      	at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
      	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
      	... 4 more
      Caused by: java.lang.NoSuchMethodError: org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.util.SemaphoredDelegatingExecutor.<init>(Lcom/google/common/util/concurrent/ListeningExecutorService;IZ)V
      	at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem.open(AliyunOSSFileSystem.java:570) ~[flink-oss-fs-hadoop-1.9.1-stream2.jar:1.9.1-stream2]
      	at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.FileSystem.open(FileSystem.java:950) ~[flink-azure-fs-hadoop-1.9.1-stream2.jar:1.9.1-stream2]
      	at org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.open(HadoopFileSystem.java:120) ~[flink-oss-fs-hadoop-1.9.1-stream2.jar:1.9.1-stream2]
      	at org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.open(HadoopFileSystem.java:37) ~[flink-oss-fs-hadoop-1.9.1-stream2.jar:1.9.1-stream2]
      	at org.apache.flink.runtime.state.filesystem.FileStateHandle.openInputStream(FileStateHandle.java:68) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
      	at org.apache.flink.runtime.checkpoint.Checkpoints.loadAndValidateCheckpoint(Checkpoints.java:141) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
      	at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.restoreSavepoint(CheckpointCoordinator.java:1132) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
      	at org.apache.flink.runtime.scheduler.LegacyScheduler.tryRestoreExecutionGraphFromSavepoint(LegacyScheduler.java:237) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
      	at org.apache.flink.runtime.scheduler.LegacyScheduler.createAndRestoreExecutionGraph(LegacyScheduler.java:196) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
      	at org.apache.flink.runtime.scheduler.LegacyScheduler.<init>(LegacyScheduler.java:176) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
      	at org.apache.flink.runtime.scheduler.LegacySchedulerFactory.createInstance(LegacySchedulerFactory.java:70) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
      	at org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:275) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
      	at org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:265) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
      	at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:98) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
      	at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:40) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
      	at org.apache.flink.runtime.jobmaster.JobManagerRunner.<init>(JobManagerRunner.java:146) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
      	at org.apache.flink.runtime.dispatcher.DefaultJobManagerRunnerFactory.createJobManagerRunner(DefaultJobManagerRunnerFactory.java:83) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
      	at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$5(Dispatcher.java:375) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
      	at org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:34) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
      	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604) ~[?:1.8.0_232]
      	at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
      	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
      	... 4 more
      

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              plucas Patrick Lucas
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: