Details
Type: Bug
Status: Resolved
Priority: Minor
Resolution: Fixed
0.14.1
Description
When an array is initialised with only NaT values, the row group statistics are corrupt: reading them either returns random values or raises an integer out-of-bounds exception.
import io

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

df = pd.DataFrame({"t": pd.Series([pd.NaT], dtype="datetime64[ns]")})
buf = pa.BufferOutputStream()
pq.write_table(pa.Table.from_pandas(df), buf, version="2.0")
buf = io.BytesIO(buf.getvalue().to_pybytes())
parquet_file = pq.ParquetFile(buf)
# Asserting behaviour is difficult since it is random and the state is ill defined.
# After a few iterations an exception is raised.
while True:
    parquet_file.metadata.row_group(0).column(0).statistics.max
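Until a fixed version is in use, a reader of such files might guard access to the statistics. The sketch below assumes the statistics object exposes a `has_min_max` flag, as pyarrow's `Statistics` does; `safe_max` is a hypothetical helper and not part of pyarrow. This report does not confirm whether the flag is set correctly in affected versions, so treat it as an illustration of the pattern rather than a verified workaround. Stand-in objects are used so the snippet runs without a Parquet file.

```python
from types import SimpleNamespace


def safe_max(stats):
    """Return stats.max only when the statistics claim to carry min/max."""
    # Hypothetical helper, not part of pyarrow.
    # Missing statistics or an unset has_min_max flag both yield None.
    if stats is None or not getattr(stats, "has_min_max", False):
        return None
    return stats.max


# Stand-ins for a column chunk with only nulls and one with real values.
all_null = SimpleNamespace(has_min_max=False, max=None)
with_values = SimpleNamespace(has_min_max=True, max=42)

print(safe_max(all_null))
print(safe_max(with_values))
```

With a real file, the same helper could be called as `safe_max(parquet_file.metadata.row_group(0).column(0).statistics)`.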