Beam / BEAM-8312

Flink portable pipeline jars do not need to stage artifacts remotely

    Details

      Description

      Currently, Flink job jars re-stage all artifacts at runtime (on the JobManager) by using the usual BeamFileSystemArtifactRetrievalService [1]. However, since the manifest and all the artifacts live on the classpath of the jar, and everything from the classpath is copied to the Flink workers anyway [2], it should not be necessary to do additional artifact staging. We could replace BeamFileSystemArtifactRetrievalService in this case with a simple ArtifactRetrievalService that just pulls the artifacts from the classpath.

       

      [1] https://github.com/apache/beam/blob/340c3202b1e5824b959f5f9f626e4c7c7842a3cb/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactRetrievalService.java

      [2] https://github.com/apache/beam/blob/2f1b56ccc506054e40afe4793a8b556e872e1865/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkExecutionEnvironments.java#L93
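
      To illustrate the proposal, here is a minimal sketch of pulling an artifact straight from the classpath instead of a remote filesystem. It is not Beam code and does not implement the gRPC ArtifactRetrievalService interface. The class name ClasspathArtifactSource, its methods, and the idea of a single resource root prefix are all assumptions made for illustration. Only the standard ClassLoader.getResourceAsStream call is relied on.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical helper (not part of Beam): resolves artifacts as classpath
 * resources under an assumed root prefix, so nothing is re-staged remotely.
 */
final class ClasspathArtifactSource {
  private final ClassLoader classLoader;
  private final String root; // assumed layout: <root>/<artifact name>

  ClasspathArtifactSource(ClassLoader classLoader, String root) {
    this.classLoader = classLoader;
    // Normalize so that root + name always yields "root/name".
    this.root = (root.isEmpty() || root.endsWith("/")) ? root : root + "/";
  }

  /** Opens the named artifact, or fails if it is not on the classpath. */
  InputStream open(String name) throws IOException {
    String path = root + name;
    InputStream in = classLoader.getResourceAsStream(path);
    if (in == null) {
      throw new FileNotFoundException("Artifact not found on classpath: " + path);
    }
    return in;
  }

  /** Reads the whole artifact into memory (fine for small artifacts). */
  byte[] readAll(String name) throws IOException {
    try (InputStream in = open(name);
        ByteArrayOutputStream out = new ByteArrayOutputStream()) {
      byte[] buffer = new byte[8192];
      int n;
      while ((n = in.read(buffer)) != -1) {
        out.write(buffer, 0, n);
      }
      return out.toByteArray();
    }
  }
}
```

      A real replacement would wrap a lookup like this behind the retrieval service that the SDK harness talks to, and would read the manifest from the classpath in the same way. Because the job jar is already shipped to the Flink workers [2], this lookup needs no extra network or filesystem access.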

            People

            • Assignee: Kyle Weaver (ibzib)
            • Reporter: Kyle Weaver (ibzib)
            • Votes: 0
            • Watchers: 2

            Time Tracking

            • Estimated: Not Specified
            • Remaining: 0h
            • Logged: 4h 20m