SPARK-22070: Spark SQL filter comparisons failing with timestamps and ISO-8601 strings

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Incomplete
    • Affects Version/s: 2.2.0
    • Fix Version/s: None
    • Component/s: PySpark

    Description

      Filter behavior seems to be ignoring the time portion of the ISO-8601 string. See below for code to reproduce:

      import datetime
      
      from pyspark.sql import SparkSession
      from pyspark.sql.types import StructType, StructField, TimestampType
      
      spark = SparkSession.builder.getOrCreate()
      
      data = [{"dates": datetime.datetime(2017, 1, 1, 12)}]
      schema = StructType([StructField("dates", TimestampType())])
      df = spark.createDataFrame(data, schema=schema)
      # df.head() returns (correctly):
      #   Row(dates=datetime.datetime(2017, 1, 1, 12, 0))
      
      df.filter(df["dates"] > datetime.datetime(2017, 1, 1, 11).isoformat()).count()
      # should return 1, instead returns 0
      # datetime.datetime(2017, 1, 1, 11).isoformat() returns '2017-01-01T11:00:00'
      df.filter(df["dates"] > datetime.datetime(2016, 12, 31, 11).isoformat()).count()
      # this one works (returns 1, as expected)
      

      Of course, the simple workaround is to use datetime objects themselves in the query expression, but in practice this means using dateutil to parse the incoming data first, which is not ideal.
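
      As a sketch of that workaround (assuming the df built in the reproduction above), the string can be parsed into a datetime object with dateutil before filtering. Casting the string literal to a timestamp on the Spark side with to_timestamp (available since 2.2.0) is an alternative not mentioned in the report, but it should keep both sides of the comparison as timestamps:

      from dateutil import parser  # third-party: python-dateutil
      from pyspark.sql import functions as F

      # Workaround from the description: parse the ISO-8601 string into a
      # datetime object and compare against that instead of the raw string.
      cutoff = parser.parse("2017-01-01T11:00:00")
      df.filter(df["dates"] > cutoff).count()
      # returns 1, as expected

      # Alternative sketch: convert the string literal to a TimestampType
      # column so the comparison happens between two timestamps.
      df.filter(
          df["dates"] > F.to_timestamp(F.lit("2017-01-01T11:00:00"), "yyyy-MM-dd'T'HH:mm:ss")
      ).count()
      # should also return 1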


    People

      • Assignee: Unassigned
      • Reporter: Vishal Doshi (superdosh)
      • Votes: 0
      • Watchers: 2
