Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 0.14.0
Description
Timestamps without a timezone that are written by pyarrow 0.14.0 can no longer be read as timestamps by earlier versions. When the file is read with pyarrow 0.13.0, the timestamp column comes back as an integer.
Looking at the parquet schemas, it seems that the older versions do not understand the logical type annotation written by 0.14.0 (see below).
File generation with pyarrow 0.14.0
import datetime

import pandas as pd
import pyarrow as pa  # needed for pa.Table below
import pyarrow.parquet as pq

df = pd.DataFrame(
    {
        # timezone-naive timestamp column
        "datetime64": pd.Series(["2018-01-01"], dtype="datetime64[ns]"),
        # timezone-aware timestamp column
        "datetime64_ts": pd.Series(
            [pd.Timestamp(datetime.datetime(2018, 1, 1), tz="Europe/Berlin")],
            dtype="datetime64[ns]",
        ),
    }
)
pq.write_table(pa.Table.from_pandas(df), "timezones_pyarrow_14.paquet")
Reading with pyarrow 0.13.0
In [1]: import pyarrow.parquet as pq

In [2]: import pyarrow as pa

In [3]: with open("timezones_pyarrow_14.paquet", "rb") as fd:
   ...:     table = pq.read_pandas(fd)
   ...:

In [4]: table.to_pandas()
Out[4]:
         datetime64             datetime64_ts
0  1514764800000000 2018-01-01 00:00:00+01:00

In [5]: table.to_pandas().dtypes
Out[5]:
datetime64                               int64
datetime64_ts    datetime64[ns, Europe/Berlin]
dtype: object
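The integer in Out[4] appears to be the raw stored value, i.e. microseconds since the Unix epoch, which matches the microsecond time unit shown in the schemas. The sketch below only checks that reading; the helper name `micros_to_datetime` is made up for illustration and is not part of pyarrow or pandas.

```python
import datetime

# Unix epoch as a naive datetime (the affected column has no timezone).
EPOCH = datetime.datetime(1970, 1, 1)


def micros_to_datetime(micros: int) -> datetime.datetime:
    """Interpret an integer as microseconds since the Unix epoch (hypothetical helper)."""
    return EPOCH + datetime.timedelta(microseconds=micros)


# Value shown for the "datetime64" column when read with pyarrow 0.13.0.
raw_value = 1514764800000000
print(micros_to_datetime(raw_value))  # prints 2018-01-01 00:00:00
```

On a whole column, `pd.to_datetime(df["datetime64"], unit="us")` should perform the same conversion in pandas, assuming the values really are microseconds.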
Parquet schema as seen by pyarrow versions:
pyarrow 0.13.0 parquet schema
datetime64: INT64
datetime64_ts: INT64 TIMESTAMP_MICROS
pyarrow 0.14.0 parquet schema
datetime64: INT64 Timestamp(isAdjustedToUTC=false, timeUnit=microseconds)
datetime64_ts: INT64 Timestamp(isAdjustedToUTC=true, timeUnit=microseconds)