Description
In Spark 3.2.0, you can no longer reuse the name of a temporary view when the replacement is derived from the existing view of that name. This change is backwards incompatible, and it means a common way of running pipelines of SQL queries no longer works. The following is a simple reproducible example that works in Spark 2.x and 3.1.2, but not in 3.2.0:
from pyspark.context import SparkContext
from pyspark.sql import SparkSession

sc = SparkContext.getOrCreate()
spark = SparkSession(sc)

sql = """ SELECT id as col_1, rand() AS col_2 FROM RANGE(10); """
df = spark.sql(sql)
df.createOrReplaceTempView("df")

sql = """ SELECT * FROM df """
df = spark.sql(sql)
df.createOrReplaceTempView("df")

sql = """ SELECT * FROM df """
df = spark.sql(sql)
The following error is now produced:
AnalysisException: Recursive view `df` detected (cycle: `df` -> `df`)
I'm reasonably sure this change is unintentional in 3.2.0, since it breaks a lot of legacy code, and the `createOrReplaceTempView` method is named explicitly such that replacing an existing view should be allowed. An internet search suggests other users have run into similar problems, e.g. here
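As a possible stopgap while the regression is open, the pipeline can register each step under a fresh view name so that no view's query refers back to its own name. The sketch below is only an illustration: `run_sql_pipeline`, the `{prev}` placeholder, and the `base_name` parameter are hypothetical names introduced here, not part of Spark. It relies only on `spark.sql` and `DataFrame.createOrReplaceTempView`, and it has not been checked against every affected version.

```python
def run_sql_pipeline(spark, queries, base_name="step"):
    """Run SQL queries in order, giving each result its own temp view name.

    Hypothetical helper: a query refers to the previous step's view through
    the `{prev}` placeholder instead of a fixed, reused name such as `df`.
    """
    prev = None
    df = None
    for i, query in enumerate(queries):
        # Substitute the previous step's view name (unused by the first query).
        df = spark.sql(query.format(prev=prev))
        # A distinct name per step means the new view never replaces the
        # view its own query reads from.
        prev = f"{base_name}_{i}"
        df.createOrReplaceTempView(prev)
    return df


# Intended usage with an existing SparkSession named `spark`:
#
# result = run_sql_pipeline(spark, [
#     "SELECT id AS col_1, rand() AS col_2 FROM RANGE(10)",
#     "SELECT * FROM {prev}",
#     "SELECT * FROM {prev}",
# ])
```

The trade-off is that intermediate views (`step_0`, `step_1`, ...) stay registered in the session until they are dropped, and queries containing literal braces would need them escaped for `str.format`.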
Issue Links
- duplicates SPARK-38318 regression when replacing a dataset view (Resolved)