Beam / BEAM-2712

SerializablePipelineOptions should not call FileSystems.setDefaultPipelineOptions.

      Description

      https://github.com/apache/beam/pull/3654 introduces SerializablePipelineOptions, which on deserialization calls FileSystems.setDefaultPipelineOptions.

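      This is problematic and racy when the same process deserializes several SerializablePipelineOptions instances that carry different filesystem-related options: each deserialization overwrites the process-wide default, so whichever instance was deserialized last wins.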
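
A minimal sketch of the hazard, using only the JDK. The classes `GlobalFileSystemConfig` and `OptionsWrapper` are hypothetical stand-ins (they are not Beam classes); they only mimic the pattern described above, where a `readObject` hook writes to process-wide state.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a process-wide registry such as FileSystems.
final class GlobalFileSystemConfig {
    private static volatile String defaultTempLocation;

    static void setDefault(String tempLocation) {
        defaultTempLocation = tempLocation;
    }

    static String getDefault() {
        return defaultTempLocation;
    }
}

// Hypothetical stand-in for an options wrapper that mutates global state
// as a side effect of being deserialized.
final class OptionsWrapper implements Serializable {
    private static final long serialVersionUID = 1L;

    final String tempLocation;

    OptionsWrapper(String tempLocation) {
        this.tempLocation = tempLocation;
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        // Side effect: the most recently deserialized wrapper in this JVM wins.
        GlobalFileSystemConfig.setDefault(tempLocation);
    }

    // Serializes and deserializes the given wrapper, as shipping it to a worker would.
    static OptionsWrapper roundTrip(OptionsWrapper original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (OptionsWrapper) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }
}
```

If two wrappers with different locations are round-tripped in the same JVM, code that still holds the first wrapper sees a global default that belongs to the second. With concurrent deserialization the winner depends on thread timing.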

      The PR does this because the Flink and Apex runners were already doing it in their respective SerializablePipelineOptions-like classes (which the PR removes); the Spark runner was not, but probably should have been.

      I believe this is done so that the proper filesystem options are automatically available on workers wherever any kind of PipelineOptions is used. Instead, each of the three runners should pick a better place to initialize its workers and explicitly call FileSystems.setDefaultPipelineOptions there.
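
The proposed direction could look roughly like the following sketch. It is JDK-only and every class in it (`FileSystemRegistry`, `WorkerOptions`, `WorkerHarness`) is a hypothetical stand-in, not a Beam or runner class. In a real runner, the call inside `WorkerHarness.start` would be FileSystems.setDefaultPipelineOptions, and where worker initialization actually lives differs per runner.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for the process-wide registry behind FileSystems.
final class FileSystemRegistry {
    private static volatile String tempLocation;

    static void setDefaultOptions(WorkerOptions options) {
        tempLocation = options.tempLocation;
    }

    static String currentTempLocation() {
        return tempLocation;
    }
}

// Hypothetical options holder: serializable, but with no readObject hook,
// so deserializing it never touches global state.
final class WorkerOptions implements Serializable {
    private static final long serialVersionUID = 1L;

    final String tempLocation;

    WorkerOptions(String tempLocation) {
        this.tempLocation = tempLocation;
    }

    // Serializes and deserializes this object, as shipping it to a worker would.
    WorkerOptions roundTrip() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(this);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (WorkerOptions) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }
}

// Hypothetical worker entry point: the single, explicit place where the
// global filesystem configuration is installed.
final class WorkerHarness {
    static void start(WorkerOptions options) {
        FileSystemRegistry.setDefaultOptions(options);
        // ... user code would start executing only after this point.
    }
}
```

The point of the design is that the global write happens once, at a place the runner controls, rather than implicitly whenever an options object happens to be deserialized.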

      It would be even better if FileSystems.setDefaultPipelineOptions didn't exist at all, but that's a topic for a separate JIRA.

      CC'ing runner contributors: Aljoscha Krettek, Aviem Zur, Thomas Weise

              People

              • Assignee: Unassigned
              • Reporter: Eugene Kirpichov (jkff)
              • Votes: 0
              • Watchers: 5
