[BEAM-2712] SerializablePipelineOptions should not call FileSystems.setDefaultPipelineOptions. - ASF JIRA

Details

Type: Bug
Status: Open
Priority: P3
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: runner-apex, runner-core, runner-flink, runner-spark
Labels: None

Description

https://github.com/apache/beam/pull/3654 introduces SerializablePipelineOptions, which on deserialization calls FileSystems.setDefaultPipelineOptions.

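This is problematic and racy: if a single process deserializes several SerializablePipelineOptions instances carrying different filesystem-related options, each deserialization overwrites the process-wide default, so the options in effect depend on which instance happened to be deserialized last.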
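To make the failure mode concrete, here is a minimal sketch that does not use Beam at all. `GlobalDefaults` and `OptionsHolder` are hypothetical stand-ins for the process-wide filesystem registry and for a serializable options wrapper; the only thing the sketch assumes is the behavior described above, namely that deserialization writes to a process-wide default.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class DeserializationSideEffectDemo {
  /** Hypothetical stand-in for a process-wide registry of default options. */
  static final class GlobalDefaults {
    static volatile String tempLocation; // stand-in for filesystem-related options
  }

  /** Hypothetical stand-in for a wrapper that mutates global state when deserialized. */
  static final class OptionsHolder implements Serializable {
    private static final long serialVersionUID = 1L;
    final String tempLocation;

    OptionsHolder(String tempLocation) {
      this.tempLocation = tempLocation;
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
      in.defaultReadObject();
      // The side effect in question: every deserialization overwrites the global default.
      GlobalDefaults.tempLocation = tempLocation;
    }
  }

  /** Serializes and deserializes the holder in memory, as shipping it to a worker would. */
  static OptionsHolder roundTrip(OptionsHolder holder) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(holder);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (OptionsHolder) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    OptionsHolder a = roundTrip(new OptionsHolder("gs://bucket-a/tmp"));
    OptionsHolder b = roundTrip(new OptionsHolder("hdfs://cluster-b/tmp"));
    // Code that is still working with `a` now observes b's settings in the global default.
    System.out.println("a wants " + a.tempLocation + ", global is " + GlobalDefaults.tempLocation);
  }
}
```

The sketch is single-threaded, so the overwrite is deterministic; with concurrent deserialization the last writer is not predictable, which is the racy part.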

The PR does this because the Flink and Apex runners were already doing it in their respective SerializablePipelineOptions-like classes (which the PR removes); the Spark runner wasn't, but probably should have been.

I believe this is done so that the proper filesystem options are automatically available on workers wherever any kind of PipelineOptions is used. Instead, each of the 3 runners should pick a more appropriate place to initialize its workers, and explicitly call FileSystems.setDefaultPipelineOptions there.
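A rough sketch of the proposed shape, again without Beam on the classpath: the default is set in exactly one place, a worker entry point, and nothing else writes to it. `GlobalDefaults`, `initialize`, and `startWorker` are hypothetical names, not runner APIs. Rejecting a conflicting second initialization is an extra assumption of this sketch, not something the issue asks for.

```java
import java.util.concurrent.atomic.AtomicReference;

public class ExplicitWorkerInitDemo {
  /** Hypothetical stand-in for the process-wide filesystem registry. */
  static final class GlobalDefaults {
    private static final AtomicReference<String> tempLocation = new AtomicReference<>();

    /**
     * Sets the default once. Repeating the same value is a no-op; a different value
     * is rejected instead of silently overwriting (an assumption of this sketch).
     */
    static void initialize(String location) {
      if (!tempLocation.compareAndSet(null, location)
          && !location.equals(tempLocation.get())) {
        throw new IllegalStateException("already initialized with " + tempLocation.get());
      }
    }

    static String get() {
      return tempLocation.get();
    }
  }

  /** Hypothetical worker entry point: the single place where the default is set. */
  static void startWorker(String tempLocationFromJobConfig) {
    GlobalDefaults.initialize(tempLocationFromJobConfig);
  }

  public static void main(String[] args) {
    startWorker("gs://bucket-a/tmp");
    startWorker("gs://bucket-a/tmp"); // same value again: no effect
    try {
      startWorker("hdfs://cluster-b/tmp"); // conflicting value: rejected
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
    System.out.println("global is " + GlobalDefaults.get());
  }
}
```

In a real runner, the body of `startWorker` would be where the explicit FileSystems.setDefaultPipelineOptions call goes; where that entry point lives differs per runner and is not specified here.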

It would be even better if FileSystems.setDefaultPipelineOptions didn't exist at all, but that's a topic for a separate JIRA.

CC'ing runner contributors: aljoscha, aviemzur, thw

Issue Links

is related to

BEAM-8577 FileSystems may have not be initialized during ResourceId deserialization

Triage Needed

People

Assignee: Unassigned

Reporter: Eugene Kirpichov

Votes: 0

Watchers: 7

Dates

Created: 01/Aug/17 23:51

Updated: 03/Jun/22 18:35