[ARROW-5878] [Python][C++] Parquet reader not forward compatible for timestamps without timezone - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 0.14.0
Fix Version/s: 0.14.1, 0.15.0
Component/s: C++, Python
Labels:
- parquet
- pull-request-available

External issue URL:
https://github.com/apache/arrow/issues/22293

Description

Timestamps without a timezone that are written by pyarrow 0.14.0 can no longer be read as timestamps by earlier versions. When the file is read with pyarrow 0.13.0, the column comes back as an integer.

The Parquet schemas below suggest that older versions do not understand the new logical type.

File generation with pyarrow 0.14.0

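import datetime

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

df = pd.DataFrame(
    {
        # Timezone-naive timestamp: the column that pyarrow 0.13.0 reads back as int64.
        "datetime64": pd.Series(["2018-01-01"], dtype="datetime64[ns]"),
        # Timezone-aware timestamp: still read as a timestamp by pyarrow 0.13.0.
        "datetime64_ts": pd.Series(
            [pd.Timestamp(datetime.datetime(2018, 1, 1), tz="Europe/Berlin")],
            dtype="datetime64[ns, Europe/Berlin]",
        ),
    }
)
# The ".paquet" spelling matches the name of the attached file.
pq.write_table(pa.Table.from_pandas(df), "timezones_pyarrow_14.paquet")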

Reading with pyarrow 0.13.0

In [1]: import pyarrow.parquet as pq

In [2]: import pyarrow as pa

In [3]: with open("timezones_pyarrow_14.paquet", "rb") as fd:
   ...:     table = pq.read_pandas(fd)
   ...:

In [4]: table.to_pandas()
Out[4]:
         datetime64             datetime64_ts
0  1514764800000000 2018-01-01 00:00:00+01:00

In [5]: table.to_pandas().dtypes
Out[5]:
datetime64                               int64
datetime64_ts    datetime64[ns, Europe/Berlin]
dtype: object
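
The integer in the datetime64 column appears to be the raw timestamp in microseconds since the Unix epoch, which matches the timeUnit=microseconds in the schema. As a minimal sketch of how an affected reader could recover the value by hand, using only the standard library (the helper name micros_to_datetime is hypothetical and not part of pyarrow):

```python
from datetime import datetime, timedelta

# Assumption: the int64 value counts microseconds since 1970-01-01,
# as suggested by timeUnit=microseconds in the Parquet schema.
EPOCH = datetime(1970, 1, 1)


def micros_to_datetime(value: int) -> datetime:
    """Convert microseconds since the epoch to a timezone-naive datetime."""
    return EPOCH + timedelta(microseconds=value)


# Value taken from the pyarrow 0.13.0 output above.
print(micros_to_datetime(1514764800000000))  # 2018-01-01 00:00:00
```

This only illustrates what the integer represents. It does not restore the column type on read.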

Parquet schema of the same file as seen by each pyarrow version:

pyarrow 0.13.0 parquet schema

datetime64: INT64
datetime64_ts: INT64 TIMESTAMP_MICROS

pyarrow 0.14.0 parquet schema

datetime64: INT64 Timestamp(isAdjustedToUTC=false, timeUnit=microseconds)
datetime64_ts: INT64 Timestamp(isAdjustedToUTC=true, timeUnit=microseconds)
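
Comparing the two listings, the older reader seems to keep an annotation only for the timestamp with isAdjustedToUTC=true and to fall back to a bare INT64 otherwise. The following toy model restates that observation in code. It is inferred from the listings above, not taken from the pyarrow source, and the function name and unit strings are hypothetical.

```python
from typing import Optional

# Legacy converted types that a pre-0.14.0 reader is assumed to understand.
LEGACY_BY_UNIT = {
    "milliseconds": "TIMESTAMP_MILLIS",
    "microseconds": "TIMESTAMP_MICROS",
}


def legacy_annotation(is_adjusted_to_utc: bool, time_unit: str) -> Optional[str]:
    """Return the annotation an older reader would show, or None for bare INT64."""
    if not is_adjusted_to_utc:
        # Matches the "datetime64: INT64" line seen by pyarrow 0.13.0.
        return None
    return LEGACY_BY_UNIT.get(time_unit)


print(legacy_annotation(False, "microseconds"))  # None
print(legacy_annotation(True, "microseconds"))   # TIMESTAMP_MICROS
```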

Attachments

timezones_pyarrow_14.paquet (1 kB), 08/Jul/19 15:08, Florian Jetter

Issue Links

links to GitHub Pull Request #4825

People

Assignee: Ben Kietzman
Reporter: Florian Jetter
Votes: 0
Watchers: 4

Dates

Created: 08/Jul/19 15:17
Updated: 11/Jan/23 07:43
Resolved: 12/Jul/19 17:47

Time Tracking

Estimated: Not Specified
Remaining:
Logged: 5h 20m