Spark / SPARK-34292

NOW is interpreted as the NOW SQL function


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 3.0.0
    • Fix Version/s: None
    • Component/s: PySpark, Spark Core
    • Labels: None

    Description

      I think we ran into a bug in Spark. When reading a DataFrame from Parquet files partitioned by a column, any partition value of “NOW” in that column is interpreted as the SQL NOW function, so the column comes back with the current timestamp instead of the literal string “NOW”.
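For context, partition values are not stored inside the Parquet files themselves. They are encoded in directory names such as `Col1=NOW` and parsed back from those names on read, which is where a string can be reinterpreted. The sketch below is a hypothetical helper (not part of Spark) that lists the raw values found on disk, so you can check that the written data still holds the plain string. It assumes a Hive-style `<column>=<value>` layout and does not undo any escaping of special characters in directory names.

```python
import os


def raw_partition_values(root, column):
    """Collect the raw string values of `column` from directory names under `root`.

    Hypothetical helper for illustration only. Assumes Hive-style
    `<column>=<value>` directories and leaves escaped characters as they are.
    """
    prefix = column + "="
    values = set()
    # Walk the whole tree so nested partition levels are covered as well.
    for _dirpath, dirnames, _filenames in os.walk(root):
        for name in dirnames:
            if name.startswith(prefix):
                values.add(name[len(prefix):])
    return sorted(values)
```

If the helper reports `NOW` for `Col1` while the DataFrame read back shows a timestamp, the data on disk is intact and only the interpretation at read time differs.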

       

      Steps to reproduce:

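      from pyspark.sql.session import SparkSession

      spark = SparkSession.builder.getOrCreate()

      # Two rows; Col1 holds the plain strings 'NOW' and 'THEN'.
      df = spark.createDataFrame([['NOW', 1], ['THEN', 2]], schema=['Col1', 'Col2'])

      # Write as Parquet, partitioned by Col1.
      df.write.parquet('/tmp/my_partitioned_data', mode='overwrite', partitionBy=['Col1'])

      # Read the partitioned data back; Col1 is rebuilt from the partition directories.
      df_read_back = spark.read.parquet('/tmp/my_partitioned_data')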

      """
      In [1]: df.show()
      +----+----+
      |Col1|Col2|
      +----+----+
      | NOW|   1|
      |THEN|   2|
      +----+----+

      In [2]: df_read_back.show()
      +----+--------------------+
      |Col2|                Col1|
      +----+--------------------+
      |   1|2021-01-22 10:46:...|
      |   2|                THEN|
      +----+--------------------+
      """
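A possible workaround, sketched under assumptions: Spark has a session setting, `spark.sql.sources.partitionColumnTypeInference.enabled`, that controls whether partition column types are inferred from the directory names. With inference disabled, partition columns are read as strings. Whether this avoids the misread on the affected version was not verified in this report, and the helper name `read_partitions_as_strings` is hypothetical.

```python
# Hypothetical helper, not part of Spark: reads a partitioned Parquet
# directory with partition column type inference switched off, so that
# partition values are kept as plain strings.
INFERENCE_KEY = "spark.sql.sources.partitionColumnTypeInference.enabled"


def read_partitions_as_strings(spark, path):
    """Return a DataFrame whose partition columns are read as strings.

    `spark` is an existing SparkSession. The setting is changed for the
    whole session and is not restored afterwards.
    """
    spark.conf.set(INFERENCE_KEY, "false")
    return spark.read.parquet(path)
```

With the session from the steps above, `read_partitions_as_strings(spark, '/tmp/my_partitioned_data')` would be expected to return `Col1` as a string column. Because the setting applies to the whole session, other partitioned reads in the same session will also stop inferring numeric or date partition types.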

People

    • Assignee: Unassigned
    • Reporter: Gaelan Mines (gmines)
    • Votes: 0
    • Watchers: 1
