Details
- Improvement
- Status: Resolved
- Minor
- Resolution: Fixed
- 3.5.0
- None
Description
Sibling of MAPREDUCE-7403: allow PathOutputCommitter implementations to declare that they support the semantics required by Spark dynamic partitioning:
- rename works as expected
- the working directory is on the same filesystem as the final directory
They will do this by implementing StreamCapabilities and adding a new probe, "mapreduce.job.committer.dynamic.partitioning". The Spark-side changes are to:
- postpone rejection of dynamic partition overwrite until the output committer is created
- allow it if the committer implements StreamCapabilities and returns true for {{hasCapability("mapreduce.job.committer.dynamic.partitioning")}}
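
The probe described above can be sketched as follows. This is a minimal illustration, not the Spark or Hadoop code: the {{StreamCapabilities}} interface here is a local stand-in with the same single-method shape as Hadoop's {{org.apache.hadoop.fs.StreamCapabilities}}, and {{SketchCommitter}}, {{RenameBasedCommitter}}, {{UndeclaredCommitter}} and {{DynamicPartitioningDemo}} are hypothetical names introduced only so the example compiles without Hadoop on the classpath.

```java
// Local stand-in (assumption) with the same shape as Hadoop's
// org.apache.hadoop.fs.StreamCapabilities, so the sketch needs no Hadoop jars.
interface StreamCapabilities {
    boolean hasCapability(String capability);
}

// Hypothetical stand-in for a PathOutputCommitter base type.
abstract class SketchCommitter {
    abstract String name();
}

// Hypothetical committer that declares it meets the dynamic partitioning
// requirements (rename works as expected, working dir on the same filesystem).
final class RenameBasedCommitter extends SketchCommitter implements StreamCapabilities {
    @Override
    String name() {
        return "rename-based";
    }

    @Override
    public boolean hasCapability(String capability) {
        return DynamicPartitioningDemo.CAPABILITY_DYNAMIC_PARTITIONING.equals(capability);
    }
}

// Hypothetical committer that makes no declaration at all.
final class UndeclaredCommitter extends SketchCommitter {
    @Override
    String name() {
        return "undeclared";
    }
}

final class DynamicPartitioningDemo {
    // The probe string named in this issue.
    static final String CAPABILITY_DYNAMIC_PARTITIONING =
        "mapreduce.job.committer.dynamic.partitioning";

    // True only if the committer implements the capability interface
    // and answers true for the dynamic partitioning probe.
    static boolean supportsDynamicPartitioning(Object committer) {
        return committer instanceof StreamCapabilities
            && ((StreamCapabilities) committer).hasCapability(CAPABILITY_DYNAMIC_PARTITIONING);
    }

    // The rejection is postponed until a committer instance exists:
    // only then can the probe be asked.
    static void checkDynamicPartitionOverwrite(SketchCommitter committer, boolean dynamicOverwrite) {
        if (dynamicOverwrite && !supportsDynamicPartitioning(committer)) {
            throw new UnsupportedOperationException(
                "Committer '" + committer.name() + "' does not support dynamic partition overwrite");
        }
    }

    public static void main(String[] args) {
        checkDynamicPartitionOverwrite(new RenameBasedCommitter(), true); // accepted
        try {
            checkDynamicPartitionOverwrite(new UndeclaredCommitter(), true);
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Checking {{instanceof}} before calling {{hasCapability}} means committers that predate the probe are rejected by default rather than assumed compatible.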
This will not be supported by the S3A committers, which do not meet the requirements. The manifest committer of MAPREDUCE-7341, running against ABFS and GCS, does work.
Issue Links
- is required by
  - SPARK-41551 Improve/complete PathOutputCommitProtocol support for dynamic partitioning (In Progress)
- relates to
  - MAPREDUCE-7403 Support spark dynamic partitioning in the Manifest Committer (Resolved)
  - MAPREDUCE-7341 Add a task-manifest output committer for Azure and GCS (Resolved)
- links to