Details
- Type: Bug
- Status: Resolved
- Priority: Minor
- Resolution: Incomplete
- Affects Version: 2.2.0
- Fix Version: None
Description
Filter behavior appears to ignore the time component of an ISO-8601 string. Code to reproduce:

import datetime

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, TimestampType

spark = SparkSession.builder.getOrCreate()

data = [{"dates": datetime.datetime(2017, 1, 1, 12)}]
schema = StructType([StructField("dates", TimestampType())])
df = spark.createDataFrame(data, schema=schema)

# df.head() returns (correctly):
# Row(dates=datetime.datetime(2017, 1, 1, 12, 0))

df.filter(df["dates"] > datetime.datetime(2017, 1, 1, 11).isoformat()).count()
# should return 1, instead returns 0
# datetime.datetime(2017, 1, 1, 11).isoformat() returns '2017-01-01T11:00:00'

df.filter(df["dates"] > datetime.datetime(2016, 12, 31, 11).isoformat()).count()
# this one works
Of course, the simple workaround is to use the datetime objects themselves in the filter expression, but in practice this means using dateutil to parse the incoming data, which is not ideal.
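One way to sidestep the dateutil dependency mentioned above: for strings in the exact shape produced by datetime.isoformat(), the standard library alone can parse them back into datetime objects, which then compare correctly when passed to the filter. A minimal sketch (the format string here assumes no fractional seconds or timezone offset):

```python
import datetime

# Parse an ISO-8601 string produced by datetime.isoformat() back into a
# datetime object using only the standard library (no dateutil needed).
iso_string = "2017-01-01T11:00:00"
parsed = datetime.datetime.strptime(iso_string, "%Y-%m-%dT%H:%M:%S")

# The parsed object retains its time component, so comparisons behave
# as expected rather than truncating to midnight.
assert parsed == datetime.datetime(2017, 1, 1, 11)
assert datetime.datetime(2017, 1, 1, 12) > parsed
```

With the parsed object in hand, a filter such as df.filter(df["dates"] > parsed) hands Spark a timestamp literal rather than a string, avoiding the behavior reported here.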
Issue Links
- relates to SPARK-22108 Logical Inconsistency in Timestamp Cast (Resolved)