Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 3.4.0
- Fix Version/s: None
Description
Currently, the path of the output checkpoint metadata is hardcoded: the metadata is saved under the output path, in the _spark_metadata folder. This is a constraint on the path structure that could easily be relaxed by making the output metadata path configurable. It would help with issues such as changing the output directory of a Spark streaming job, two jobs writing to the same output path, and partition discovery. It would also help separate metadata from data in the path structure.
The main target of the change is the getMetadataLogPath method in FileStreamSink. It has access to sqlConf, so it can override the default _spark_metadata path if one is defined in the config. Introducing a parametrised metadata path also requires reconsidering the meaning of the hasMetadata method in FileStreamSink.
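A minimal sketch of the proposed resolution logic, written in plain Python for illustration rather than as a patch to FileStreamSink. The config key `spark.sql.streaming.fileSink.metadataPath` is hypothetical (this issue does not name one), the function names only mirror getMetadataLogPath and hasMetadata, and the `exists` callback stands in for a file system check.

```python
from pathlib import PurePosixPath
from typing import Callable, Mapping

# Default folder name, as stated in the description above.
DEFAULT_METADATA_DIR = "_spark_metadata"

# Hypothetical config key: an assumption made for this sketch only.
METADATA_PATH_KEY = "spark.sql.streaming.fileSink.metadataPath"


def get_metadata_log_path(output_path: str, conf: Mapping[str, str]) -> PurePosixPath:
    """Return the configured metadata path, else <output_path>/_spark_metadata."""
    override = conf.get(METADATA_PATH_KEY)
    if override:
        # An explicit setting decouples the metadata from the output directory.
        return PurePosixPath(override)
    return PurePosixPath(output_path) / DEFAULT_METADATA_DIR


def has_metadata(
    output_path: str,
    conf: Mapping[str, str],
    exists: Callable[[PurePosixPath], bool],
) -> bool:
    """Check for metadata at the resolved location, not only under the output path."""
    return exists(get_metadata_log_path(output_path, conf))
```

With an override in place, a check that looks only under the output path would no longer find the log, which is one reason the meaning of hasMetadata would need to be revisited.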
Issue Links
- fixes SPARK-30542 Two Spark structured streaming jobs cannot write to same base path (Resolved)
- links to