Apache Arrow / ARROW-9215

pyarrow parquet writer converts uint32 columns to int64

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Not A Problem

    Description

      The pyarrow Parquet writer changes uint32 columns to int64: a table written with a uint32 column comes back with an int64 column when the file is read again. Other unsigned types are not affected; uint8, uint16, and uint64 columns retain their type.

      In [1]: import pandas as pd
      
      In [2]: import pyarrow as pa
      
      In [3]: import pyarrow.parquet as pq
      
      In [5]: df = pd.DataFrame({'a':pd.Series([1,2,3], dtype='uint32')})
      
      In [6]: padf = pa.Table.from_pandas(df)
      
      In [7]: padf
      Out[7]: 
      pyarrow.Table
      a: uint32
      
      In [8]: pq.write_table(padf, 'pa.parquet')
      
      In [9]: pq.read_table('pa.parquet')
      Out[9]: 
      pyarrow.Table
      a: int64
      

          People

            uwe Uwe Korn
            devavret Devavret Makkar
