Say I have a pandas DataFrame df that I would like to store on disk as a dataset using pyarrow's Parquet support. I would do this:
On disk, the dataset would look something like this:
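For the sketch above, the resulting layout would be roughly the following (the actual filenames are long auto-generated UUIDs, so yours will differ):

```
my_dataset/
├── year=2020/
│   └── [UUID].parquet
└── year=2021/
    └── [UUID].parquet
```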
Wished Feature: It'd be great if I could override the auto-assignment of the long UUID as the filename during dataset writing. My goal is to be able to overwrite the dataset on disk whenever I have a new version of df. Currently, if I write the dataset again, another uniquely named [UUID].parquet file is placed next to the old one, containing the same, redundant data.