Details
Type: Bug
Status: Closed
Priority: Minor
Resolution: Not A Problem
Affects Version/s: 5.0.0
Description
In Arrow 4.0.0 it is possible to round-trip the TimeZone property of List<Timestamp> columns to and from parquet files:
>>> import pyarrow as pa
>>> import pyarrow.parquet as pq
>>> import datetime
>>> column = pa.array([[datetime.datetime(2023, 9, 23, 11)]], pa.list_(pa.timestamp('us', 'America/New_York')))
>>> t = pa.Table.from_arrays([column], names=['Dates'])
>>> pq.write_table(t, "example.parq")
>>> t2 = pq.read_table("example.parq")
>>> t2
pyarrow.Table
Dates: list<item: timestamp[us, tz=America/New_York]>
child 0, item: timestamp[us, tz=America/New_York]
However, if you read the same parquet file in pyarrow 5.0.0, the TimeZone is set to UTC:
>>> t3 = pq.read_table("example.parq")
>>> t3
pyarrow.Table
Dates: list<item: timestamp[us, tz=UTC]>
child 0, item: timestamp[us, tz=UTC]
I noticed that the TimeZone is preserved in Arrow 5.0 when reading non-nested timestamp columns.