Apache Arrow / ARROW-10248

[C++][Dataset] Dataset writing does not write schema metadata

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0
    • Component/s: C++

      Description

      I am not sure whether this is related to the writing refactor that landed yesterday, but `write_dataset` does not preserve the schema metadata (e.g. the metadata that pandas relies on):

      In [20]: df = pd.DataFrame({'a': [1, 2, 3]})
      
      In [21]: table = pa.Table.from_pandas(df)
      
      In [22]: table.schema
      Out[22]: 
      a: int64
      -- schema metadata --
      pandas: '{"index_columns": [{"kind": "range", "name": null, "start": 0, "' + 396
      
      In [23]: ds.write_dataset(table, "test_write_dataset_pandas", format="parquet")
      
      In [24]: pq.read_table("test_write_dataset_pandas/part-0.parquet").schema
      Out[24]: 
      a: int64
        -- field metadata --
        PARQUET:field_id: '1'
      

      I have tentatively tagged this for 2.0.0 in case a fix is possible today, but I have not yet looked into how easy it would be to fix.

      cc Ben Kietzman

              People

              • Assignee: Ben Kietzman (bkietz)
              • Reporter: Joris Van den Bossche (jorisvandenbossche)

                  Time Tracking

                  • Original Estimate: Not Specified
                  • Remaining Estimate: 0h
                  • Time Spent: 40m