Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Duplicate
- Affects Version/s: 3.3.1
- Fix Version/s: None
Description
Is it possible to make `PENDING_DIR_NAME` configurable?
That would enable concurrent writes to the same location. Currently, if two Spark processes write to the same destination, one of them fails.
Current:
public static final String PENDING_DIR_NAME = "_temporary";
Proposed:
PENDING_DIR_NAME = conf.get("mapreduce.fileoutputcommitter.pending.dir", "_temporary");
Here is a custom committer that does this: https://gist.github.com/ismailsimsek/33c55d8e1fcfc79160483c38a978edbd
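To illustrate the request, here is a minimal sketch of how a configurable pending directory could keep two writers apart. It is not the Hadoop implementation and not the code from the linked gist: `java.util.Properties` stands in for Hadoop's `Configuration` so the example runs without Hadoop on the classpath, and the class name `PendingDirSketch`, the helper `pendingDir`, the output path, and the value `_temporary_jobB` are hypothetical. The property name is the one proposed in this issue.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

public class PendingDirSketch {
    // Property name proposed in this issue (assumption: it is honored by the committer).
    static final String PENDING_DIR_KEY = "mapreduce.fileoutputcommitter.pending.dir";
    // The value currently hard-coded as PENDING_DIR_NAME.
    static final String DEFAULT_PENDING_DIR = "_temporary";

    // Hypothetical helper: resolve the pending directory under the output path,
    // falling back to the default name when the property is not set.
    static Path pendingDir(Path outputPath, Properties conf) {
        String name = conf.getProperty(PENDING_DIR_KEY, DEFAULT_PENDING_DIR);
        return outputPath.resolve(name);
    }

    public static void main(String[] args) {
        Path out = Paths.get("data", "events"); // hypothetical shared destination

        Properties jobA = new Properties();     // no override: uses "_temporary"
        Properties jobB = new Properties();
        jobB.setProperty(PENDING_DIR_KEY, "_temporary_jobB");

        // The two jobs now stage their output in different directories.
        System.out.println(pendingDir(out, jobA));
        System.out.println(pendingDir(out, jobB));
    }
}
```

With distinct values per job, each writer stages and cleans up only its own pending directory, which is the behavior this request is after. Whether the final commit into the shared destination is safe under concurrency is a separate question this sketch does not address.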
Issue Links
- duplicates: MAPREDUCE-7331 Make temporary directory used by FileOutputCommitter configurable (Open)
- is superseded by: MAPREDUCE-7341 Add a task-manifest output committer for Azure and GCS (Resolved)
- links to