Details
- Improvement
- Status: In Progress
- Minor
- Resolution: Unresolved
- 3.3.1
- None
- None
Description
Follow-up to SPARK-40034, because:
- that change is incomplete, as it doesn't record the partitions
- as long as the job doesn't call `newTaskTempFileAbsPath()`, and slow renames are OK, both s3a committers are actually OK to use.
It is only the `newTaskTempFileAbsPath()` operation which is unsupported by the s3a committers; the post-job directory rename is O(data), but a file-by-file rename is correct for a non-atomic job commit.
- Cut `PathOutputCommitProtocol.newTaskTempFile`; to update super `partitionPaths` (needs a setter). The superclass can't just check `committer instanceof PathOutputCommitter`, as spark-core needs to compile with older Hadoop versions.
- Downgrade the failure in setup to a log message (info?).
- Retain the failure in the `newTaskTempFileAbsPath` call.
Testing: yes
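The three proposed changes could look roughly like the following sketch. It is plain Java with a hypothetical `CommitProtocolSketch` class standing in for the commit protocol; it is not Spark's `PathOutputCommitProtocol`, and the constructor flags, path layout, and log text are assumptions made only for illustration. It reads the first bullet as "record the partition whenever a task temp file is requested under dynamic partition overwrite", which is one possible interpretation.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Optional;
import java.util.Set;
import java.util.logging.Logger;

// Hypothetical stand-in for a commit protocol; NOT the real Spark class.
final class CommitProtocolSketch {
    private static final Logger LOG = Logger.getLogger("CommitProtocolSketch");

    private final String stagingDir;
    private final boolean dynamicPartitionOverwrite;
    // Assumed flag: false models a committer (such as the s3a committers)
    // that cannot place task output at an arbitrary absolute path.
    private final boolean committerSupportsAbsPathFiles;
    private final Set<String> partitionPaths = new LinkedHashSet<>();

    CommitProtocolSketch(String stagingDir,
                         boolean dynamicPartitionOverwrite,
                         boolean committerSupportsAbsPathFiles) {
        this.stagingDir = stagingDir;
        this.dynamicPartitionOverwrite = dynamicPartitionOverwrite;
        this.committerSupportsAbsPathFiles = committerSupportsAbsPathFiles;
    }

    // Setup no longer fails: the limitation is only logged.
    void setupJob() {
        if (dynamicPartitionOverwrite && !committerSupportsAbsPathFiles) {
            LOG.info("Committer does not support absolute-path task files; "
                + "the job will fail only if newTaskTempFileAbsPath() is called");
        }
    }

    // Records the partition so the job commit knows which directories to replace.
    String newTaskTempFile(Optional<String> dir, String fileName) {
        if (dynamicPartitionOverwrite) {
            dir.ifPresent(partitionPaths::add);
        }
        return dir.map(d -> stagingDir + "/" + d + "/" + fileName)
                  .orElse(stagingDir + "/" + fileName);
    }

    // The failure is retained here, at the one operation that is unsupported.
    String newTaskTempFileAbsPath(String absoluteDir, String fileName) {
        if (!committerSupportsAbsPathFiles) {
            throw new UnsupportedOperationException(
                "Absolute-path output is not supported by this committer");
        }
        return absoluteDir + "/" + fileName;
    }

    Set<String> partitionPaths() {
        return Collections.unmodifiableSet(partitionPaths);
    }
}
```

With this shape, a job that only writes partitioned output through `newTaskTempFile` proceeds and has its partitions recorded, while a job that asks for an absolute-path file still fails at the point of the call rather than during setup.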
Attachments
Issue Links
- requires SPARK-40034 PathOutputCommitters to work with dynamic partition overwrite (Resolved)
- links to