Apache Arrow / ARROW-12823

[Parquet][Python] Read and write file/column metadata using pandas attrs


Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Parquet, Python

    Description

      Related: https://github.com/pandas-dev/pandas/issues/20521

      What are the general thoughts on using DataFrame.attrs and Series.attrs for reading and writing metadata to/from Parquet?

      For example, here is how the metadata would be written:

      import pandas

      pdf = pandas.DataFrame({"a": [1]})
      pdf.attrs = {"name": "my custom dataset"}
      pdf.a.attrs = {"long_name": "Description about data", "nodata": -1, "units": "metre"}
      pdf.to_parquet("file.parquet")

      Then, when loading in the data:

      pdf = pandas.read_parquet("file.parquet")
      pdf.attrs
      {"name": "my custom dataset"}
      pdf.a.attrs
      {"long_name": "Description about data", "nodata": -1, "units": "metre"}
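
      Parquet key/value metadata and Arrow schema metadata hold string or bytes pairs rather than nested Python objects, so the attrs dictionaries above would need to be serialized before being written. The following is only a minimal sketch of one possible round trip, assuming JSON encoding and a single metadata entry; the key name `pandas_attrs` and the helpers `encode_attrs` and `decode_attrs` are hypothetical and not part of pandas or pyarrow.

```python
import json

# Hypothetical metadata key; not an existing pandas or pyarrow convention.
ATTRS_KEY = b"pandas_attrs"


def encode_attrs(frame_attrs, column_attrs):
    """Serialize frame-level and per-column attrs into one bytes key/value pair."""
    payload = {"frame": frame_attrs, "columns": column_attrs}
    return {ATTRS_KEY: json.dumps(payload, sort_keys=True).encode("utf-8")}


def decode_attrs(metadata):
    """Recover (frame_attrs, column_attrs) from a bytes key/value mapping."""
    raw = metadata.get(ATTRS_KEY)
    if raw is None:
        # No attrs were stored in this mapping.
        return {}, {}
    payload = json.loads(raw.decode("utf-8"))
    return payload.get("frame", {}), payload.get("columns", {})


# The attrs from the example in the description.
frame_attrs = {"name": "my custom dataset"}
column_attrs = {
    "a": {"long_name": "Description about data", "nodata": -1, "units": "metre"}
}

metadata = encode_attrs(frame_attrs, column_attrs)
restored_frame, restored_columns = decode_attrs(metadata)
```

      A mapping of this shape could then be attached to a table with pyarrow's `Table.replace_schema_metadata` before writing. JSON limits the attrs to JSON-serializable values, which is a design trade-off of this sketch rather than a requirement of the proposal.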

          People

            Assignee: Unassigned
            Reporter: Alan Snow (snowman2)
            Votes: 2
            Watchers: 7