SPARK-10162: PySpark filters with datetimes mess up when datetimes have timezones.


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.6.0
    • Component/s: PySpark
    • Labels: None

    Description

      PySpark appears to ignore timezone information when filtering on (and, more generally, working with) datetimes.

      Please see the example below. The generated filter in the query plan is five hours off (my machine is in EST, UTC-5).

      In [1]: from datetime import datetime

      In [2]: from pytz import UTC

      In [3]: from pyspark.sql.types import StructType, StructField, TimestampType

      In [4]: df = sqlContext.createDataFrame([], StructType([StructField("dt", TimestampType())]))

      In [5]: df.filter(df.dt > datetime(2000, 1, 1, tzinfo=UTC)).explain()
      Filter (dt#9 > 946702800000000)
       Scan PhysicalRDD[dt#9]

      Note that 946702800000000 == Sat 1 Jan 2000 05:00:00 UTC; the correct value for this filter would be 946684800000000 (Sat 1 Jan 2000 00:00:00 UTC).
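
      For reference, here is a minimal sketch of the arithmetic (not part of the original report; it assumes pytz and an EST machine, as in the description). Converting the timestamp while respecting tzinfo gives 946684800 seconds since the epoch, while dropping tzinfo and interpreting the wall-clock fields as local time, which is what the conversion appears to do, gives 946702800 on an EST machine:

      import calendar
      import time
      from datetime import datetime

      from pytz import UTC

      dt = datetime(2000, 1, 1, tzinfo=UTC)

      # Respecting tzinfo: 2000-01-01 00:00:00 UTC -> 946684800.
      print(calendar.timegm(dt.utctimetuple()))  # 946684800

      # Dropping tzinfo and treating the wall-clock fields as local time,
      # which is what the buggy conversion appears to do. On a machine in
      # EST (UTC-5) this gives 946702800, i.e. 2000-01-01 05:00:00 UTC.
      print(int(time.mktime(dt.timetuple())))    # 946702800 on an EST machine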


          People

            0x0fff Alexey Grishchenko
            kevincox Kevin Cox
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue
