Details
Type: Wish
Status: Resolved
Priority: Minor
Resolution: Fixed
Description
Before [dask#6023|https://github.com/dask/dask/pull/6023], Dask used the `write_to_dataset` API to write partitioned parquet datasets. That PR switches to a (hopefully temporary) custom solution, because the API makes it difficult to populate the "file_path" column-chunk metadata fields in the metadata objects returned through the optional `metadata_collector` kwarg. Dask needs these fields set correctly in order to generate a proper global `"_metadata"` file.
Possible solutions to this problem:
- Optionally populate the file-path fields within `write_to_dataset`
- Always populate the file-path fields within `write_to_dataset`
- Return the file paths of the data written by `write_to_dataset`, leaving it to the user to populate the file-path fields manually