Details
Bug
Status: Closed
Minor
Resolution: Won't Fix
0.12.1
None
PyArrow: 0.12.1
Python: 2.7.15, 3.7.2
Pandas: 0.24.2
Description
Run the following code:
import datetime as dt
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

dataframe = pd.DataFrame({'foo': [dt.datetime.now()]})
table = pa.Table.from_pandas(dataframe, preserve_index=False)
pq.write_table(table, 'int64.parq')
pq.write_table(table, 'int96.parq', use_deprecated_int96_timestamps=True)
Examining the int64.parq file, we see that the column metadata includes an original type (the O: field) of TIMESTAMP_MICROS and also gives some stats. All is well.
file schema: schema
--------------------------------------------------------------------------------
foo: OPTIONAL INT64 O:TIMESTAMP_MICROS R:0 D:1

row group 1: RC:1 TS:76 OFFSET:4
--------------------------------------------------------------------------------
foo: INT64 SNAPPY ... ST:[min: 2019-12-31T23:59:59.999000, max: 2019-12-31T23:59:59.999000, num_nulls: 0]
However, if we look at int96.parq, it appears that this metadata is lost. No original type (O:), and no column stats.
file schema: schema
--------------------------------------------------------------------------------
foo: OPTIONAL INT96 R:0 D:1

row group 1: RC:1 TS:58 OFFSET:4
--------------------------------------------------------------------------------
foo: INT96 SNAPPY ... ST:[no stats for this column]
This is a bit confusing, since the metadata for the exact same data can look different depending on whether a seemingly unrelated flag is set or cleared.