SPARK-28411: insertInto with overwrite inconsistent behaviour Python/Scala


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.2.1, 2.4.0
    • Fix Version/s: 3.0.0
    • Component/s: PySpark, SQL
    • Labels:

      Description

      df.write.mode("overwrite").insertInto("table") behaves inconsistently between Scala and Python. In Python, insertInto ignores the "mode" setting and appends by default. Only when the call is changed to df.write.insertInto("table", overwrite=True) do we get the expected behaviour.

      This is native Spark syntax and should behave the same in both languages. In other write methods, such as saveAsTable or write.parquet, "mode" does seem to be respected.
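
      For comparison, a minimal sketch of the saveAsTable case mentioned above (the table name spark_overwrite_issue_sat is only illustrative; df is the two-row DataFrame from the reproduction below):

      # "mode" is honoured by saveAsTable, even when the call is repeated
      df.write.mode("overwrite").saveAsTable("spark_overwrite_issue_sat")
      df.write.mode("overwrite").saveAsTable("spark_overwrite_issue_sat")

      spark.table("spark_overwrite_issue_sat").count()
      # result - 2 rows, the second call replaced the data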

      Reproduce in Python ("overwrite" is ignored):

      df = spark.createDataFrame(sc.parallelize([(1, 2),(3,4)]),['i','j'])
      
      # create the table and load data
      df.write.saveAsTable("spark_overwrite_issue")
      
      # insert overwrite, expected result - 2 rows
      df.write.mode("overwrite").insertInto("spark_overwrite_issue")
      
      spark.sql("select * from spark_overwrite_issue").count()
      # result - 4 rows, insert appended data instead of overwrite
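
      As noted above, passing the flag explicitly does overwrite on these versions. Continuing from the state left by the reproduction:

      # workaround on the affected versions: pass overwrite directly to insertInto
      df.write.insertInto("spark_overwrite_issue", overwrite=True)

      spark.sql("select * from spark_overwrite_issue").count()
      # result - 2 rows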

      Reproduce in Scala (works as expected):

      // needed outside spark-shell to bring toDF into scope
      import spark.implicits._

      val df = Seq((1, 2),(3,4)).toDF("i","j")

      // assumes spark_overwrite_issue already exists (created as in the Python steps above)
      df.write.mode("overwrite").insertInto("spark_overwrite_issue")

      spark.sql("select * from spark_overwrite_issue").count()
      // result - 2 rows

      Tested on Spark 2.2.1 (EMR) and 2.4.0 (Databricks)
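
      A version-independent alternative, if the DataFrameWriter behaviour cannot be relied on, is to issue the overwrite through SQL (sketch; the temp view name overwrite_src is only illustrative):

      # register the source data and overwrite the target table via SQL
      df.createOrReplaceTempView("overwrite_src")
      spark.sql("INSERT OVERWRITE TABLE spark_overwrite_issue SELECT * FROM overwrite_src")

      spark.sql("select * from spark_overwrite_issue").count()
      # result - 2 rows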


    People

    • Assignee: Huaxin Gao (huaxingao)
    • Reporter: Maria Rebelka (vapira)
    • Votes: 1
    • Watchers: 4
