Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Duplicate
- Affects Version/s: 3.0.0
- Fix Version/s: None
- Component/s: None
Description
I think we ran into a bug in the Spark framework. When a DataFrame is written in Parquet format partitioned by a column and then read back, any partition value equal to "NOW" is interpreted as the SQL NOW() function rather than as a literal string, so the read-back column contains the current timestamp instead of the original value.
Steps to reproduce:
from pyspark.sql.session import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([['NOW', 1], ['THEN', 2]], schema=['Col1', 'Col2'])
# Write partitioned by Col1, then read the dataset back;
# the 'NOW' partition value comes back as a timestamp.
df.write.parquet('/tmp/my_partitioned_data', mode='overwrite', partitionBy=['Col1'])
df_read_back = spark.read.parquet('/tmp/my_partitioned_data')
"""
In [1]: df.show()
------+
Col1 | Col2 |
------+
NOW | 1 |
THEN | 2 |
------+
In [2]: df_read_back.show()
----------------------+
Col2 | Col1 |
----------------------+
1 | 2021-01-22 10:46:... |
2 | THEN |
----------------------+
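For context, a plausible explanation is Spark's partition-directory type inference: when reading back, Spark tries to cast each partition value to narrower types (integer, then timestamp) before falling back to string, and Spark 3.x's timestamp cast accepts special strings such as "now" and "today". The sketch below is a loose, hypothetical plain-Python imitation of that inference order (the function name and the exact set of special values are assumptions for illustration, not Spark's actual code):

```python
from datetime import datetime

# Assumed set of special datetime strings accepted by Spark 3.x timestamp casts.
SPECIAL_TIMESTAMPS = {"epoch", "now", "today", "yesterday", "tomorrow"}

def infer_partition_value(raw: str):
    """Loosely mimics partition-value inference: int -> timestamp -> string."""
    try:
        return int(raw)
    except ValueError:
        pass
    if raw.strip().lower() in SPECIAL_TIMESTAMPS:
        # Spark would substitute the current timestamp here,
        # losing the literal partition value.
        return datetime.now()
    return raw

print(infer_partition_value("NOW"))   # a timestamp, not the string "NOW"
print(infer_partition_value("THEN"))  # the literal string "THEN"
```

Under this model, "THEN" survives the round trip because it matches neither an integer nor a special timestamp string, which is exactly the asymmetry shown in the `show()` output above.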
Issue Links
- duplicates SPARK-34259: "Reading a partitioned dataset with a partition value of NOW causes the value to be parsed as a timestamp." (In Progress)