Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 0.13.0
Description
In version 0.13.0, Table.from_pandas does not preserve the nullability of the schema passed to it: fields declared with nullable=False become nullable in the resulting table's schema.
This can cause problems, for example with the following code:
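```python
import io

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

df = pd.DataFrame(list(range(200)), columns=['numcol'])
schema = pa.schema([
    pa.field('numcol', pa.int64(), nullable=False),
])
writer = pq.ParquetWriter(io.BytesIO(), schema, version='2.0')
table = pa.Table.from_pandas(df, schema=schema)
# Fails on 0.13.0: the table schema no longer matches the writer schema
writer.write_table(table)
```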
This fails because the table schema differs from the schema the writer was created with.
I believe the direct cause is https://github.com/apache/arrow/blob/master/python/pyarrow/table.pxi#L622, where nullable defaults to True, so the resulting table schema differs from the one passed in.
Thanks for your valuable work on this library.
Giacomo