  1. Apache Arrow
  2. ARROW-4359

[Python] Column metadata is not saved or loaded in parquet

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Python

    Description

      Hi all,

      a while ago I posted this issue: ARROW-3866

      While working with PyArrow I ran into another potential bug related to column metadata. If I create a table whose columns carry metadata, everything is fine. But after I save the table to Parquet and load it back as a table using pq.read_table, the column metadata is gone.

       
      As of now I cannot say whether the metadata is not saved correctly or not loaded correctly, as I have no idea how to verify it. Unfortunately I also don't have the time to try much, but I wanted to let you know anyway.

       

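      import pyarrow as pa
      import pyarrow.parquet as pq

      path = 'example.parquet'  # placeholder; the original report does not define path

      # Only the first field carries metadata
      field0 = pa.field('field1', pa.int64(), metadata=dict(a="A", b="B"))
      field1 = pa.field('field2', pa.int64(), nullable=False)
      columns = [
          pa.column(field0, pa.array([1, 2])),
          pa.column(field1, pa.array([3, 4]))
      ]
      table = pa.Table.from_arrays(columns)

      # Round-trip through Parquet
      pq.write_table(table, path)

      tab2 = pq.read_table(path)
      tab2.column(0).field.metadata  # the metadata set on field0 is missing here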
      

       

            People

              Assignee: Unassigned
              Reporter: Seb Fru (frutti93)
              Votes: 3
              Watchers: 7
