Spark / SPARK-32168

DSv2 SQL overwrite incorrectly uses static plan with hidden partitions



    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.1, 3.1.0
    • Component/s: SQL


      The v2 analyzer rule ResolveInsertInto tries to detect when a static overwrite and a dynamic overwrite would produce the same result, and chooses static overwrite in that case. It uses dynamic overwrite only if there is a partition column without a static value and the SQL partition overwrite mode is set to dynamic:

      val dynamicPartitionOverwrite = partCols.size > staticPartitions.size &&
        conf.partitionOverwriteMode == PartitionOverwriteMode.DYNAMIC

      The problem is that partCols contains the names of only the partitions that appear in the column list (identity partitions) and does not include hidden partitions, such as days(ts). As a result, this check fails to detect hidden partitions and does not choose dynamic overwrite; static overwrite is used instead. When a table has only hidden partitions, the static overwrite filter drops all of the table's data.
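The failure mode above can be illustrated with a minimal sketch. The `Transform`, `Identity`, and `Days` types and both check functions below are hypothetical simplified stand-ins, not Spark's actual classes; they only model how counting identity partition columns alone misses hidden partitions like days(ts):

```scala
// Hypothetical simplified model of partition transforms (not Spark's API).
sealed trait Transform { def ref: String }
case class Identity(ref: String) extends Transform // identity partition, e.g. PARTITIONED BY (region)
case class Days(ref: String) extends Transform     // hidden partition, e.g. days(ts)

object HiddenPartitionCheck {
  // Buggy logic: only identity transforms contribute to partCols, so a table
  // partitioned solely by days(ts) yields partCols.size == 0 and the
  // dynamic-overwrite condition can never hold.
  def buggyIsDynamic(
      partitioning: Seq[Transform],
      staticPartitions: Set[String],
      dynamicMode: Boolean): Boolean = {
    val partCols = partitioning.collect { case Identity(r) => r }
    partCols.size > staticPartitions.size && dynamicMode
  }

  // Corrected logic: count every partition transform, hidden or not.
  def fixedIsDynamic(
      partitioning: Seq[Transform],
      staticPartitions: Set[String],
      dynamicMode: Boolean): Boolean =
    partitioning.size > staticPartitions.size && dynamicMode

  def main(args: Array[String]): Unit = {
    // Table partitioned only by the hidden transform days(ts).
    val hiddenOnly: Seq[Transform] = Seq(Days("ts"))
    // Buggy check picks static overwrite even in dynamic mode.
    println(buggyIsDynamic(hiddenOnly, Set.empty, dynamicMode = true)) // false
    // Counting all transforms correctly selects dynamic overwrite.
    println(fixedIsDynamic(hiddenOnly, Set.empty, dynamicMode = true)) // true
  }
}
```

With only the hidden days(ts) transform present, the buggy check sees zero partition columns and falls through to static overwrite, which matches the data-loss behavior described above.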

      This is a correctness bug because Spark will overwrite more data than just the set of partitions being written to in dynamic mode. The impact is limited because this rule applies only to SQL queries (not to plans from DataFrameWriter) and only affects tables with hidden partitions.


              • Assignee:
                rdblue Ryan Blue
