Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version: 7.0.0
Fix Version: None
Environment:
macOS 12.2.1 aarch64
Python 3.10.1
arrow 7.0.0
Description
I have a Parquet file with two columns (int64 and double) and 9 million rows. The Parquet command-line tools (parquet, parquet-reader, parquet-schema, ...) read it without problems. I actually have many such files, and they all exhibit the same behavior.
The following code crashes the interpreter with "zsh: bus error  ipython":
import pyarrow.parquet as pq
pq.read_table("parquet_file")
The following snippets work properly:
pq.read_table("parquet_file", use_legacy_dataset=True)
f = pq.ParquetFile("parquet_file")
f.read()
for batch in f.iter_batches():
    print(len(batch))
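Because a bus error kills the whole IPython session, it may help to run the failing call in a child process and record which signal terminated it. The following is a minimal sketch using only the standard library. The helper name `run_isolated` and the constant `FAILING_SNIPPET` are hypothetical and not part of pyarrow, and "parquet_file" is the placeholder path used in the report.

```python
import signal
import subprocess
import sys

# The failing call from the report, as a one-line program for a child interpreter.
# "parquet_file" is a placeholder path.
FAILING_SNIPPET = 'import pyarrow.parquet as pq; pq.read_table("parquet_file")'


def run_isolated(code):
    """Run `code` in a fresh Python interpreter.

    Returns (returncode, signal_name); signal_name is None unless the
    child was terminated by a signal.
    """
    proc = subprocess.run([sys.executable, "-c", code])
    signal_name = None
    if proc.returncode < 0:
        # On POSIX, a negative return code -N means the child was killed by signal N.
        signal_name = signal.Signals(-proc.returncode).name
    return proc.returncode, signal_name
```

Calling `run_isolated(FAILING_SNIPPET)` from the parent session should leave that session alive. If the child dies from a bus error, the second element of the result would be expected to read "SIGBUS", which could be attached to the report together with any crash log the operating system produces.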