Details
- Type: Improvement
- Status: Closed
- Priority: Minor
- Resolution: Won't Do
Description
Currently the user can specify a basename_template, which may include a "{i}" placeholder that is replaced with an automatically incremented integer (so that each file written to a single partition gets a unique name):
https://github.com/apache/arrow/blob/master/python/pyarrow/dataset.py#L713-L717
It might be useful to also support a UUID placeholder, both to ensure that file names are unique in general (not only within a single write) and to mimic the behaviour of the old write_to_dataset implementation.
For example, we could look for "{uuid}" in the template string and, if present, replace it with a new UUID for each file.
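The proposed substitution could look roughly like the sketch below. It is only an illustration of the idea, not pyarrow code: the helper name expand_basename_template is hypothetical, and "{uuid}" is the placeholder proposed in this issue rather than something the existing basename_template handling is known to support.

```python
import uuid


def expand_basename_template(template: str, index: int) -> str:
    """Hypothetical helper, not part of pyarrow.

    Replaces "{i}" with the running file index and, if present,
    "{uuid}" with a freshly generated UUID (hex form, 32 characters).
    """
    name = template.replace("{i}", str(index))
    if "{uuid}" in name:
        # A new UUID per file, so names stay unique across separate writes.
        name = name.replace("{uuid}", uuid.uuid4().hex)
    return name


# Three file names for one write: the UUID part differs per file,
# while the "{i}" part counts up as it does today.
names = [expand_basename_template("part-{uuid}-{i}.parquet", i) for i in range(3)]
for name in names:
    print(name)
```

A template without "{uuid}" passes through with only the "{i}" substitution applied, so existing templates would keep their current meaning under this sketch.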
Issue Links
- is duplicated by
  - ARROW-14010 [C++][Python] No way to generate UUID filenames with new datasets API (Closed)
- relates to
  - ARROW-12358 [C++][Python][R][Dataset] Control overwriting vs appending when writing to existing dataset (Open)
  - ARROW-12365 [Python] [Dataset] Add partition_filename_cb to ds.write_dataset() (Closed)