Beam / BEAM-8312

Flink portable pipeline jars do not need to stage artifacts remotely

Details

    Description

      Currently, Flink job jars re-stage all artifacts at runtime (on the JobManager) using the usual BeamFileSystemArtifactRetrievalService [1]. However, the manifest and all the artifacts already live on the jar's classpath, and everything on the classpath is copied to the Flink workers anyway [2], so additional artifact staging should not be necessary. In this case, we could replace BeamFileSystemArtifactRetrievalService with a simple ArtifactRetrievalService that just pulls the artifacts from the classpath.

       

      [1] https://github.com/apache/beam/blob/340c3202b1e5824b959f5f9f626e4c7c7842a3cb/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactRetrievalService.java

      [2] https://github.com/apache/beam/blob/2f1b56ccc506054e40afe4793a8b556e872e1865/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkExecutionEnvironments.java#L93
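      A minimal sketch of the classpath lookup proposed above, using only the JDK. It does not implement Beam's actual ArtifactRetrievalService interface; the class name ClasspathArtifactReader, its methods, and the resource root passed to it are hypothetical names chosen for illustration. It only shows that an artifact bundled in the jar can be opened through the class loader instead of being re-staged to a remote file system.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper (not a Beam class): reads bundled artifacts from the classpath. */
final class ClasspathArtifactReader {
  private final ClassLoader classLoader;
  private final String resourceRoot;

  /**
   * @param classLoader loader whose classpath contains the artifacts (e.g. the job jar's)
   * @param resourceRoot assumed directory inside the jar that holds the artifacts
   */
  ClasspathArtifactReader(ClassLoader classLoader, String resourceRoot) {
    this.classLoader = classLoader;
    // ClassLoader resource names must not start with '/', so only append a
    // separator when the root is non-empty and does not already end with one.
    if (resourceRoot.isEmpty() || resourceRoot.endsWith("/")) {
      this.resourceRoot = resourceRoot;
    } else {
      this.resourceRoot = resourceRoot + "/";
    }
  }

  /** Opens the named artifact as a classpath resource; the caller closes the stream. */
  InputStream open(String artifactName) throws IOException {
    String resource = resourceRoot + artifactName;
    InputStream in = classLoader.getResourceAsStream(resource);
    if (in == null) {
      throw new FileNotFoundException("Artifact not found on classpath: " + resource);
    }
    return in;
  }

  /** Reads the whole artifact into memory; suitable only for small artifacts. */
  byte[] readFully(String artifactName) throws IOException {
    try (InputStream in = open(artifactName);
        ByteArrayOutputStream out = new ByteArrayOutputStream()) {
      byte[] buffer = new byte[8192];
      int n;
      while ((n = in.read(buffer)) != -1) {
        out.write(buffer, 0, n);
      }
      return out.toByteArray();
    }
  }
}
```

      A real replacement would presumably wrap a lookup like this behind the retrieval service that the SDK harness calls, and would resolve artifact names through the manifest that is also stored in the jar. Both of those steps are left out of this sketch.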


          People

            ibzib Kyle Weaver
            Votes: 0
            Watchers: 2

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 4h 20m