Description
Date and timestamp types are not yet supported in DataFrame.toPandas() when using ArrowConverters. Both are common types for data analysis in Spark and Pandas and should be supported.
PySpark and Arrow differ in how they internally store timestamps that have no timezone specified: PySpark takes a UTC timestamp and adjusts it to local time, while Arrow keeps the value in UTC. Hopefully there is a clean way to resolve this.
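To make the discrepancy concrete, here is a minimal sketch using only the Python standard library. It does not call Spark or Arrow; the fixed UTC-8 offset named LOCAL and the sample value are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical local zone at a fixed UTC-8 offset (assumption for illustration).
LOCAL = timezone(timedelta(hours=-8))

# Sample instant expressed as microseconds since the Unix epoch (UTC).
micros = 1_500_000_000_000_000
instant = datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(microseconds=micros)

# Naive wall-clock value after adjusting the UTC instant to local time.
local_naive = instant.astimezone(LOCAL).replace(tzinfo=None)

# Naive value when the same instant is kept in UTC.
utc_naive = instant.replace(tzinfo=None)

# The two naive values differ by exactly the local UTC offset.
print(local_naive, utc_naive, utc_naive - local_naive)
```

Once the timezone information is dropped, nothing in either value says which interpretation was used, which is why the conversion has to pick one convention explicitly.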
Spark internal storage spec:
- DateType stored as days since the Unix epoch (1970-01-01)
- TimestampType stored as microseconds since the Unix epoch (UTC)
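As a rough sketch of what these storage units mean, the helpers below decode raw values with the Python standard library. The function names days_to_date and micros_to_utc_datetime are hypothetical and are not part of Spark or Arrow; the sketch assumes both counts are relative to the Unix epoch.

```python
from datetime import date, datetime, timedelta, timezone

EPOCH_DATE = date(1970, 1, 1)
EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)


def days_to_date(days: int) -> date:
    """Decode a day count since the Unix epoch into a calendar date."""
    return EPOCH_DATE + timedelta(days=days)


def micros_to_utc_datetime(micros: int) -> datetime:
    """Decode a microsecond count since the Unix epoch into a UTC datetime."""
    return EPOCH_UTC + timedelta(microseconds=micros)


print(days_to_date(17167))
print(micros_to_utc_datetime(1_500_000_000_000_000))
```

Using timedelta arithmetic instead of datetime.fromtimestamp avoids any dependence on the machine's local timezone and keeps full microsecond precision.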
Issue Links
- is related to SPARK-21722 Enable timezone-aware timestamp type when creating Pandas DataFrame. (Resolved)