Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Duplicate
- 2.0.0
- None
- pandas v1.0.4
Description
Writing a dict column with pyarrow produces incorrect data when the file is read back.
import pandas as pd

orig = pd.read_parquet("original.parquet")
orig.to_parquet("first_write.parquet")
first_write = pd.read_parquet("first_write.parquet")
print(orig.equals(first_write))
The incorrect results start appearing after index 1024. first_write.parquet was created by reading original.parquet and then writing it out again. I don't see any obvious pattern in the shuffled rows.
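To narrow down where a round trip starts to diverge, a small helper along these lines may be useful. This is only a sketch: `first_mismatch` is a hypothetical name, and the toy records below are made up for illustration, not taken from the attached files. It assumes each record can be compared with `!=` (plain dicts, strings, numbers); for a DataFrame, the records could be obtained with `df.to_dict("records")`.

```python
from typing import Any, Optional, Sequence


def first_mismatch(original: Sequence[Any], written: Sequence[Any]) -> Optional[int]:
    """Return the position of the first differing record, or None if all match."""
    for i, (a, b) in enumerate(zip(original, written)):
        if a != b:
            return i
    # All shared positions match; a length difference is still a mismatch.
    if len(original) != len(written):
        return min(len(original), len(written))
    return None


# Toy stand-in data (an assumption for illustration only).
original = [{"id": i, "payload": {"key": str(i)}} for i in range(2048)]
written = list(original)
# Simulate two rows being shuffled after index 1024.
written[1025], written[1030] = written[1030], written[1025]

print(first_mismatch(original, written))  # prints 1025
```

Records containing NaN or array values would need a different comparison, since `!=` does not behave as a simple equality check for them.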
Original records
Written records
Attachments
Issue Links
- is duplicated by
- ARROW-10493 [C++][Parquet] Writing nullable nested strings results in wrong data in file (Resolved)