Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Affects Version: 3.0.0
Description
If I read a parquet file (see attachment) with timestamps generated in Spark and apply a filter on a date column, I get a segmentation fault:
import datetime
import pyarrow.parquet as pq

now = datetime.datetime.now()
table = pq.read_table("timestamp.parquet", filters=[("date", "<=", now)])
The attached parquet file was generated with this code in Spark:
import datetime
import pandas as pd
from pyspark.sql.types import StructType

now = datetime.datetime.now()
data = {"date": [now - datetime.timedelta(days=i) for i in range(100)]}
schema = {
    "type": "struct",
    "fields": [
        {"name": "date", "type": "timestamp", "nullable": True, "metadata": {}},
    ],
}
spf = spark.createDataFrame(pd.DataFrame(data), schema=StructType.fromJson(schema))
spf.write.format("parquet").mode("overwrite").save("timestamp.parquet")
If I downgrade pyarrow to 2.0.0, it works fine.
Python version: 3.7.7
pyarrow version: 3.0.0
Attachments
Issue Links
- duplicates: ARROW-11538 [Python] Segfault reading Parquet dataset with Timestamp filter (Resolved)
- links to