Details
- Type: Improvement
- Status: Resolved
- Priority: P2
- Resolution: Fixed
- Fix Version: 2.32.0
- Labels: None
Description
I am new to this codebase, so apologies for any misunderstandings, but from what I can tell, when SparkExecutableStageFunction is invoked, an ArtifactRetrievalService is created (if the job bundle factory's environment cache is cold) to be called by the worker harness.
The issue is that FileSystems.setDefaultPipelineOptions is not called beforehand, so no filesystems are registered. If cloud storage such as S3 is used to stage artifacts, the ArtifactRetrievalService cannot retrieve them and throws an exception:
java.lang.IllegalArgumentException: No filesystem found for scheme s3
This does not affect other runners such as the Flink runner, because the Flink runner calls FileSystems.setDefaultPipelineOptions in its executable stage function.
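To make the failure mode concrete, here is a minimal, self-contained sketch (not Beam's actual code; the class and method names below are hypothetical stand-ins) of the pattern at play: a static scheme-to-filesystem registry that is only populated by an explicit initialization call, analogous to FileSystems.setDefaultPipelineOptions. Looking up a scheme before initialization fails with the same kind of IllegalArgumentException reported above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a filesystem registry like Beam's FileSystems.
class SchemeRegistry {
    private static final Map<String, String> FILESYSTEMS = new HashMap<>();

    // Stand-in for FileSystems.setDefaultPipelineOptions: registers the
    // filesystems derived from pipeline options.
    static void initialize() {
        FILESYSTEMS.put("file", "LocalFileSystem");
        FILESYSTEMS.put("s3", "S3FileSystem");
    }

    // Stand-in for the lookup the ArtifactRetrievalService performs when
    // resolving a staged artifact's URI.
    static String lookup(String scheme) {
        String fs = FILESYSTEMS.get(scheme);
        if (fs == null) {
            throw new IllegalArgumentException(
                "No filesystem found for scheme " + scheme);
        }
        return fs;
    }
}

public class Main {
    public static void main(String[] args) {
        try {
            // Cold registry: initialize() was never called, mirroring the
            // Spark runner's missing setDefaultPipelineOptions call.
            SchemeRegistry.lookup("s3");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
        // What the Flink runner effectively does first: register filesystems,
        // after which the same lookup succeeds.
        SchemeRegistry.initialize();
        System.out.println(SchemeRegistry.lookup("s3"));
    }
}
```

The sketch suggests the shape of the fix: call the registration step before constructing anything that resolves artifact URIs, as the Flink runner does.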