[SPARK-37690] Recursive view `df` detected (cycle: `df` -> `df`) - ASF JIRA


Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 3.2.0
Fix Version/s: 3.3.0, 3.2.2
Component/s: PySpark
Labels: None

Description

In Spark 3.2.0, you can no longer replace a temporary view with a query that reads from the view of the same name. This change is backward incompatible, and it breaks a common way of running pipelines of SQL queries. The following simple, reproducible example works in Spark 2.x and 3.1.2, but not in 3.2.0:

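from pyspark.sql import SparkSession

# Reuse the active session if there is one, otherwise create a new one
spark = SparkSession.builder.getOrCreate()

# Step 1: build a DataFrame and register it as the temp view "df"
sql = """SELECT id AS col_1, rand() AS col_2 FROM RANGE(10)"""
df = spark.sql(sql)
df.createOrReplaceTempView("df")

# Step 2: query the view "df", then replace "df" with the result
sql = """SELECT * FROM df"""
df = spark.sql(sql)
df.createOrReplaceTempView("df")

# Step 3: query "df" again; this sequence fails on 3.2.0 with the error below
sql = """SELECT * FROM df"""
df = spark.sql(sql)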

The following error is now produced:

AnalysisException: Recursive view `df` detected (cycle: `df` -> `df`)
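Read literally, the message says that the new definition of `df` depends on a view named `df`. The toy sketch below is not Spark's implementation. It assumes only that view dependencies are compared by name, and it shows why a check of that kind flags `SELECT * FROM df` registered as `df`, even though the query was meant to read the previous `df`. The names `find_cycle` and `format_cycle` are hypothetical.

```python
def find_cycle(deps, start):
    """Depth-first search over name-based view dependencies (toy model, not Spark).

    deps maps a view name to the set of view names its definition reads from.
    Returns the path from `start` back to `start` if one exists, else None.
    """
    stack = [(start, [start])]
    visited = set()
    while stack:
        name, path = stack.pop()
        for ref in sorted(deps.get(name, ())):
            if ref == start:
                # The definition of `start` (indirectly) reads a view with the same name
                return path + [ref]
            if ref not in visited:
                visited.add(ref)
                stack.append((ref, path + [ref]))
    return None


def format_cycle(path):
    """Render a path in the same style as the error message above."""
    return " -> ".join(f"`{name}`" for name in path)


# Replacing "df" with "SELECT * FROM df": by name, df now depends on df
print(format_cycle(find_cycle({"df": {"df"}}, "df")))
# With distinct names (df_1 reads df_0), the same check finds no cycle
print(find_cycle({"df_1": {"df_0"}, "df_0": set()}, "df_1"))
```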

I'm reasonably sure this change in 3.2.0 is unintentional, since it breaks a lot of legacy code, and `createOrReplaceTempView` is explicitly named so that replacing an existing view should be allowed. An internet search suggests that other users have run into similar problems, e.g. here.
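Until upgrading to a release listed under Fix Version/s (3.3.0, 3.2.2), one possible workaround is to give each pipeline step its own view name, so that no view's definition refers to a view with its own name. The sketch below is only one way to do that. The `{prev}` placeholder convention and the `versioned_pipeline` helper are hypothetical, and the helper only generates (view name, SQL) pairs, so it runs without Spark.

```python
def versioned_pipeline(base, queries):
    """Assign each query a fresh view name: base_0, base_1, ...

    Each query after the first may reference the previous step's view
    through the hypothetical placeholder "{prev}".
    Returns a list of (view_name, sql) pairs.
    """
    plan = []
    prev = None
    for i, query in enumerate(queries):
        if prev is None and "{prev}" in query:
            raise ValueError("the first query has no previous view to reference")
        if prev is not None:
            # str.replace avoids str.format tripping over other braces in SQL
            query = query.replace("{prev}", prev)
        name = f"{base}_{i}"
        plan.append((name, query))
        prev = name
    return plan


steps = [
    "SELECT id AS col_1, rand() AS col_2 FROM RANGE(10)",
    "SELECT * FROM {prev}",
    "SELECT * FROM {prev}",
]
for name, query in versioned_pipeline("df", steps):
    print(name, "<-", query)

# Hypothetical usage with an existing SparkSession `spark`:
# for name, query in versioned_pipeline("df", steps):
#     spark.sql(query).createOrReplaceTempView(name)
```

Each view reads only its predecessor (`df_2` reads `df_1`, which reads `df_0`). The trade-off is that later code must refer to the last generated name rather than a fixed `df`.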


Issue Links

duplicates: SPARK-38318 regression when replacing a dataset view (Resolved)


People

Assignee: Unassigned

Reporter: Robin

Votes: 4

Watchers: 8

Dates

Created: 20/Dec/21 07:49

Updated: 24/May/23 18:19

Resolved: 24/May/23 18:19