  1. Spark
  2. SPARK-37690

Recursive view `df` detected (cycle: `df` -> `df`)

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 3.2.0
    • Fix Version/s: 3.3.0, 3.2.2
    • Component/s: PySpark
    • Labels: None

    Description

      In Spark 3.2.0, a temporary view can no longer be replaced by a DataFrame that was itself derived from a view of the same name. This change is backwards incompatible and breaks a common way of running pipelines of SQL queries. The following minimal example works in Spark 2.x and 3.1.2, but fails in 3.2.0:

      from pyspark.context import SparkContext
      from pyspark.sql import SparkSession

      sc = SparkContext.getOrCreate()
      spark = SparkSession(sc)

      # Step 1: build a DataFrame and register it as the temp view "df".
      sql = """ SELECT id as col_1, rand() AS col_2 FROM RANGE(10); """
      df = spark.sql(sql)
      df.createOrReplaceTempView("df")

      # Step 2: read from the view "df" and re-register the result under the same name.
      sql = """ SELECT * FROM df """
      df = spark.sql(sql)
      df.createOrReplaceTempView("df")

      # Step 3: query the replaced view again.
      sql = """ SELECT * FROM df """
      df = spark.sql(sql)

      The following error is now produced:

      AnalysisException: Recursive view `df` detected (cycle: `df` -> `df`) 
      

      I'm reasonably sure this change is unintentional in 3.2.0, since it breaks a lot of legacy code and the name `createOrReplaceTempView` explicitly implies that replacing an existing view is allowed. An internet search suggests other users have run into similar problems, e.g. here
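
      Until a fixed release is available, one possible way to sidestep the error is to give each pipeline step its own view name, so that no view is ever replaced by a DataFrame that reads from it. The sketch below is only an illustration of that idea, not something taken from the Spark project: `step_view_name` and `run_pipeline` are hypothetical helpers, the `{prev}` placeholder is an invented convention, and `spark` is assumed to be an existing SparkSession.

```python
from typing import List


def step_view_name(base: str, step: int) -> str:
    """Return a distinct temp-view name per pipeline step, e.g. df_0, df_1."""
    return f"{base}_{step}"


def run_pipeline(spark, first_sql: str, templates: List[str], base: str = "df"):
    """Run a chain of queries, registering each result under its own view name.

    `spark` is assumed to be an existing SparkSession. Each entry in
    `templates` is a SQL string with a `{prev}` placeholder (a hypothetical
    convention) that is filled with the previous step's view name.
    """
    df = spark.sql(first_sql)
    name = step_view_name(base, 0)
    df.createOrReplaceTempView(name)

    for step, template in enumerate(templates, start=1):
        # Read from the previous step's view, never from the view being created.
        df = spark.sql(template.format(prev=name))
        name = step_view_name(base, step)
        df.createOrReplaceTempView(name)

    return df, name
```

      With the example above, this would be called as `run_pipeline(spark, "SELECT id as col_1, rand() AS col_2 FROM RANGE(10)", ["SELECT * FROM {prev}", "SELECT * FROM {prev}"])`, which registers the views `df_0`, `df_1` and `df_2` instead of reusing `df`. The cost is that existing SQL strings which hard-code the view name need to be adjusted.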

            People

              Assignee: Unassigned
              Reporter: Robin (RobinLinacre)
              Votes: 4
              Watchers: 8
