Beam / BEAM-2712

SerializablePipelineOptions should not call FileSystems.setDefaultPipelineOptions.

Details

    Description

      https://github.com/apache/beam/pull/3654 introduces SerializablePipelineOptions, which on deserialization calls FileSystems.setDefaultPipelineOptions.

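      This is problematic and racy when the same process deserializes SerializablePipelineOptions instances that carry different filesystem-related options: the default options are process-wide state, so whichever instance happens to be deserialized last determines them.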
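
A minimal sketch of the hazard described above, using plain Java serialization and no Beam dependency. `FakeFileSystems` and `FakeSerializableOptions` are hypothetical stand-ins for `FileSystems` and `SerializablePipelineOptions`, and the `readObject` hook is an assumption about how the side effect is triggered, not a copy of the PR's code. The example runs sequentially, so it only shows the last-writer-wins behavior; with concurrent deserialization the outcome would additionally depend on thread timing.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class DeserializationSideEffectSketch {

  /** Hypothetical stand-in for the process-wide default kept by FileSystems. */
  static final class FakeFileSystems {
    private static volatile String defaultOptions = "unset";

    static void setDefaultPipelineOptions(String options) {
      defaultOptions = options;
    }

    static String currentDefault() {
      return defaultOptions;
    }
  }

  /** Hypothetical stand-in for SerializablePipelineOptions. */
  static final class FakeSerializableOptions implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String filesystemOptions;

    FakeSerializableOptions(String filesystemOptions) {
      this.filesystemOptions = filesystemOptions;
    }

    // Assumed hook: every deserialization overwrites the process-wide default.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
      in.defaultReadObject();
      FakeFileSystems.setDefaultPipelineOptions(filesystemOptions);
    }
  }

  /** Serializes a value to bytes and reads it back, triggering readObject. */
  static Object roundTrip(Serializable value) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(value);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Deserializes each options value in order and reports the resulting default. */
  static String defaultAfterDeserializing(String... optionsInOrder) {
    for (String options : optionsInOrder) {
      roundTrip(new FakeSerializableOptions(options));
    }
    return FakeFileSystems.currentDefault();
  }

  public static void main(String[] args) {
    System.out.println(
        defaultAfterDeserializing("options-for-pipeline-a", "options-for-pipeline-b"));
  }
}
```

In this sketch, code that relies on the defaults for pipeline A would silently see pipeline B's options as soon as B's options are deserialized anywhere in the same process.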

      The PR does this because the Flink and Apex runners were already doing it in their respective SerializablePipelineOptions-like classes (which the PR removes); the Spark runner was not, but probably should have been.

      I believe this is done so that the proper filesystem options are automatically available on workers wherever any kind of PipelineOptions is used. Instead, all three runners should pick a better place to initialize their workers and explicitly call FileSystems.setDefaultPipelineOptions there.
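
One possible shape of the proposed alternative, again as a sketch with no Beam dependency. `FakeFileSystems`, `PlainSerializableOptions`, and `FakeWorker.initialize` are hypothetical names; where each runner would actually place such a call is not specified in this issue. The point is only that deserialization stays free of side effects and the default is set by one explicit call at worker startup.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class ExplicitWorkerInitSketch {

  /** Hypothetical stand-in for the process-wide default kept by FileSystems. */
  static final class FakeFileSystems {
    private static volatile String defaultOptions = "unset";

    static void setDefaultPipelineOptions(String options) {
      defaultOptions = options;
    }

    static String currentDefault() {
      return defaultOptions;
    }
  }

  /** Hypothetical options wrapper with no readObject hook, so no side effects. */
  static final class PlainSerializableOptions implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String filesystemOptions;

    PlainSerializableOptions(String filesystemOptions) {
      this.filesystemOptions = filesystemOptions;
    }

    String filesystemOptions() {
      return filesystemOptions;
    }
  }

  /** Hypothetical worker entry point owned by a runner. */
  static final class FakeWorker {
    // The single, explicit place where the process-wide default is set.
    static void initialize(PlainSerializableOptions options) {
      FakeFileSystems.setDefaultPipelineOptions(options.filesystemOptions());
    }
  }

  /** Serializes the options to bytes and reads them back. */
  static PlainSerializableOptions roundTrip(PlainSerializableOptions value) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(value);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (PlainSerializableOptions) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    PlainSerializableOptions received =
        roundTrip(new PlainSerializableOptions("worker-filesystem-options"));
    // Deserialization alone leaves the default untouched.
    System.out.println(FakeFileSystems.currentDefault());
    FakeWorker.initialize(received);
    System.out.println(FakeFileSystems.currentDefault());
  }
}
```

With this arrangement, deserializing options for another pipeline in the same process cannot change the defaults behind the worker's back; only the runner's own startup path does.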

      It would be even better if FileSystems.setDefaultPipelineOptions didn't exist at all, but that's a topic for a separate JIRA.

      CC'ing runner contributors aljoscha aviemzur thw

            People

              Assignee: Unassigned
              Reporter: Eugene Kirpichov (jkff)