Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.0.0
Environment: macOS Catalina, Python 3.7.3, Pyarrow 2.0.0
Description
When writing tables that contain List<Struct> columns, the data is sometimes written incorrectly. The code sample below reproduces the problem: no exception is raised, but an equality check via equals() returns False for the second test case.
import pyarrow as pa
import pyarrow.parquet as pq

# Write small amount of data to parquet file, and read it back.
# In this case, both tables are equal.
data1 = [[{'x':'abc','y':'abc'}]]*100 + [[{'x':'abc','y':'gcb'}]]*100
array1 = pa.array(data1)
table1 = pa.table([array1],names=['column'])
pq.write_table(table1,'temp1.parquet')
table1_1 = pq.read_table('temp1.parquet')
print(table1_1.equals(table1))

# Write larger amount of data to parquet file, and read it back.
# In this case, the tables are not equal.
data2 = data1*100
array2 = pa.array(data2)
table2 = pa.table([array2],names=['column'])
pq.write_table(table2,'temp2.parquet')
table2_1 = pq.read_table('temp2.parquet')
print(table2_1.equals(table2))
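Since equals() only reports that the tables differ, a small helper can show which rows differ. The following is a minimal sketch, not part of the original report: the name first_mismatches and its limit parameter are hypothetical, and it works on plain Python lists, such as those obtained from table2.column('column').to_pylist() and table2_1.column('column').to_pylist(). The sample data is a stand-in with a simulated corruption, not actual output from the affected version.

```python
from itertools import zip_longest

# Sentinel marking a row that is absent from the shorter list.
_MISSING = object()


def first_mismatches(expected, actual, limit=5):
    """Return up to `limit` (index, expected_row, actual_row) tuples where rows differ.

    Hypothetical helper for illustration; `expected` and `actual` are plain
    Python lists of rows.
    """
    diffs = []
    pairs = zip_longest(expected, actual, fillvalue=_MISSING)
    for index, (exp_row, act_row) in enumerate(pairs):
        if exp_row != act_row:
            diffs.append((index, exp_row, act_row))
            if len(diffs) >= limit:
                break
    return diffs


# Stand-in data shaped like the rows in the report.
expected = [[{'x': 'abc', 'y': 'abc'}]] * 3 + [[{'x': 'abc', 'y': 'gcb'}]] * 3
actual = list(expected)
actual[4] = [{'x': 'abc', 'y': 'abc'}]  # simulated corruption, not real output

for index, exp_row, act_row in first_mismatches(expected, actual):
    print(index, exp_row, act_row)
```

Comparing Python lists is slower than equals() on large tables, so a helper like this is best used only after equals() has already returned False.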
Issue Links
- duplicates ARROW-10493 [C++][Parquet] Writing nullable nested strings results in wrong data in file (Resolved)