Description
At the moment, PySpark doesn't support interval types at all. For example, calling the following:
spark.sql("SELECT current_date() - current_date()")
or
from pyspark.sql.functions import current_timestamp

spark.range(1).select(current_timestamp() - current_timestamp())
results in
Traceback (most recent call last):
  ...
ValueError: Could not parse datatype: interval
At minimum, we should support CalendarIntervalType in the schema, so queries using it don't fail on conversion.
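As an illustration only, here is a minimal sketch of what the schema-level support could look like. It assumes PySpark's private _all_atomic_types registry in pyspark.sql.types (an internal detail that may differ between releases); real support would of course live in types.py itself rather than being patched in from outside:

from pyspark.sql import types as t

class CalendarIntervalType(t.DataType):
    """Python-side stand-in for Catalyst's CalendarIntervalType."""

    @classmethod
    def typeName(cls):
        # Catalyst serializes this type as "interval" in schema JSON.
        return "interval"

# Hypothetical registration so that parsing a schema containing
# "interval" returns a placeholder type instead of raising ValueError.
t._all_atomic_types["interval"] = CalendarIntervalType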
Optionally, we could provide conversions between the internal and external types. That, however, might be tricky, as CalendarInterval seems to have different semantics than datetime.timedelta.
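To make the semantic mismatch concrete: CalendarInterval keeps months, days and microseconds as separate components because a month has no fixed length, while timedelta normalizes everything to days, seconds and microseconds, so any conversion must pick an arbitrary month length. The sketch below illustrates this; interval_to_timedelta and its days_per_month parameter are hypothetical, not Spark API:

from datetime import timedelta

def interval_to_timedelta(months, days, microseconds, days_per_month=30):
    """Approximate a (months, days, microseconds) interval as a timedelta.

    days_per_month is an arbitrary assumption; calendar months have
    28-31 days, so any fixed choice changes the interval's meaning.
    """
    return timedelta(days=months * days_per_month + days,
                     microseconds=microseconds)

# "1 month" and "30 days" collapse to the same timedelta, although Spark
# evaluates them differently (e.g. '2020-01-31' plus 1 month vs plus 30 days).
assert interval_to_timedelta(1, 0, 0) == interval_to_timedelta(0, 30, 0)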
Issue Links
- is related to:
  - SPARK-30546 Make interval type more future-proof (Resolved)
  - SPARK-28494 Expose CalendarIntervalType and CalendarInterval in Spark (Open)
Sub-Tasks
1. Add support for CalendarIntervalType in PySpark schema (In Progress, Unassigned)
2. Add mapping between external Python types and CalendarIntervalType (Open, Unassigned)