Details
Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
Description
Related: https://github.com/pandas-dev/pandas/issues/20521
What are the general thoughts on using DataFrame.attrs and Series.attrs to read and write metadata to/from Parquet?
For example, here is how the metadata would be written:
import pandas
pdf = pandas.DataFrame({"a": [1]})
pdf.attrs = {"name": "my custom dataset"}
pdf.a.attrs = {"long_name": "Description about data", "nodata": -1, "units": "metre"}
pdf.to_parquet("file.parquet")
Then, when loading in the data:
pdf = pandas.read_parquet("file.parquet")
pdf.attrs
pdf.a.attrs
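For discussion, here is a minimal sketch of how such a round trip could be approximated by hand with pyarrow schema metadata. It is not how pandas or pyarrow implement anything: the metadata key `attrs_sketch`, the helper names, and the JSON layout are all hypothetical and chosen only for illustration. Column-level attrs are passed in and returned as a plain dict, because attrs set on a column accessed from a frame may not survive across column accesses in every pandas version. The sketch also assumes string column names and JSON-serializable attrs values.

```python
import json

# Hypothetical metadata key, chosen for this sketch only.
ATTRS_KEY = b"attrs_sketch"


def encode_attrs(frame_attrs, column_attrs):
    """Pack frame-level and per-column attrs into one bytes-to-bytes metadata entry."""
    payload = {"frame": frame_attrs, "columns": column_attrs}
    return {ATTRS_KEY: json.dumps(payload).encode("utf-8")}


def decode_attrs(metadata):
    """Inverse of encode_attrs; returns ({}, {}) when the key is absent."""
    raw = (metadata or {}).get(ATTRS_KEY)
    if raw is None:
        return {}, {}
    payload = json.loads(raw.decode("utf-8"))
    return payload["frame"], payload["columns"]


def to_parquet_with_attrs(df, path, column_attrs=None):
    """Write df to Parquet, storing df.attrs and column_attrs in the schema metadata."""
    import pyarrow as pa
    import pyarrow.parquet as pq

    table = pa.Table.from_pandas(df)
    # Keep whatever metadata pyarrow already attached, then add our entry.
    merged = dict(table.schema.metadata or {})
    merged.update(encode_attrs(dict(df.attrs), column_attrs or {}))
    pq.write_table(table.replace_schema_metadata(merged), path)


def read_parquet_with_attrs(path):
    """Read the file back; returns (DataFrame with attrs restored, column attrs dict)."""
    import pyarrow.parquet as pq

    table = pq.read_table(path)
    df = table.to_pandas()
    frame_attrs, column_attrs = decode_attrs(table.schema.metadata)
    df.attrs = frame_attrs
    return df, column_attrs
```

With these helpers, the example above would become `to_parquet_with_attrs(pdf, "file.parquet", {"a": {"long_name": "Description about data", "nodata": -1, "units": "metre"}})`, followed by `pdf, column_attrs = read_parquet_with_attrs("file.parquet")`. Storing everything under a single key keeps the sketch from colliding with metadata that pyarrow writes itself.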