  1. Spark
  2. SPARK-32168

DSv2 SQL overwrite incorrectly uses static plan with hidden partitions

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.1, 3.1.0
    • Component/s: SQL

    Description

      The v2 analyzer rule ResolveInsertInto tries to detect when a static overwrite and a dynamic overwrite would produce the same result, and chooses static overwrite in that case. It uses a dynamic overwrite only if there is a partition column without a static value and the partition overwrite mode is set to dynamic.

      val dynamicPartitionOverwrite = partCols.size > staticPartitions.size &&
        conf.partitionOverwriteMode == PartitionOverwriteMode.DYNAMIC

      The problem is that partCols contains only the names of partitions that appear in the column list (identity partitions) and does not include hidden partitions, such as days(ts). As a result, the rule does not detect hidden partitions and does not choose dynamic overwrite for them. Static overwrite is used instead; when a table has only hidden partitions, the static filter drops all table data.
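
      The condition can be reproduced outside Spark with a minimal sketch. The Transform, Identity, and Days types and the usesDynamicOverwrite helper below are hypothetical stand-ins rather than Spark classes, and the conf lookup is reduced to a Boolean; only the comparison itself is taken from the snippet above.

```scala
// Simplified stand-ins for DSv2 partition transforms (hypothetical, not Spark's classes).
sealed trait Transform
final case class Identity(column: String) extends Transform
final case class Days(column: String) extends Transform

object OverwriteModeSketch {
  // Assumption: models partCols as described in this issue, where only
  // identity partitions contribute a name and hidden partitions are skipped.
  def partCols(partitioning: Seq[Transform]): Seq[String] =
    partitioning.collect { case Identity(name) => name }

  // The comparison quoted from ResolveInsertInto, with the conf check
  // replaced by a plain Boolean for illustration.
  def usesDynamicOverwrite(
      partitioning: Seq[Transform],
      staticPartitions: Map[String, String],
      modeIsDynamic: Boolean): Boolean =
    partCols(partitioning).size > staticPartitions.size && modeIsDynamic

  def main(args: Array[String]): Unit = {
    val identityTable = Seq[Transform](Identity("region"))
    val hiddenTable = Seq[Transform](Days("ts"))

    // Identity partition with no static value: 1 > 0, so dynamic overwrite is chosen.
    println(usesDynamicOverwrite(identityTable, Map.empty, modeIsDynamic = true))
    // Only a hidden partition: 0 > 0 is false, so static overwrite is chosen
    // even though the mode is dynamic.
    println(usesDynamicOverwrite(hiddenTable, Map.empty, modeIsDynamic = true))
  }
}
```

      In this model the table partitioned only by days(ts) never reaches the dynamic branch, regardless of the configured mode.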

      This is a correctness bug because, in dynamic mode, Spark overwrites more data than just the set of partitions being written to. The impact is limited because this rule is only used for SQL queries (not plans from DataFrameWriters) and only affects tables with hidden partitions.
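
      The difference in outcome can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not Spark's implementation: the Row type, the sample values, and both helper functions are hypothetical, and the static delete filter is modeled as "always true" because no static partition values are given.

```scala
object OverwriteSemanticsSketch {
  // Hypothetical row: `day` stands for the value produced by the hidden partition days(ts).
  final case class Row(day: String, value: Int)

  // Static overwrite (modeled): delete every row matching the filter, then append.
  def staticOverwrite(
      existing: Seq[Row],
      incoming: Seq[Row],
      deleteFilter: Row => Boolean): Seq[Row] =
    existing.filterNot(deleteFilter) ++ incoming

  // Dynamic overwrite (modeled): replace only partitions present in the incoming data.
  def dynamicOverwrite(existing: Seq[Row], incoming: Seq[Row]): Seq[Row] = {
    val touchedDays = incoming.map(_.day).toSet
    existing.filterNot(row => touchedDays.contains(row.day)) ++ incoming
  }

  def main(args: Array[String]): Unit = {
    // Illustrative data only.
    val existing = Seq(Row("2020-07-01", 1), Row("2020-07-02", 2))
    val incoming = Seq(Row("2020-07-02", 20))

    // No static partition values: the filter matches everything, so the
    // untouched 2020-07-01 row is lost as well.
    println(staticOverwrite(existing, incoming, _ => true))
    // Only the 2020-07-02 partition is replaced; 2020-07-01 survives.
    println(dynamicOverwrite(existing, incoming))
  }
}
```

      The first call keeps only the incoming row, while the second keeps the row for the day that was not written to, which is the result a user in dynamic mode would expect.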

            People

              Assignee: rdblue Ryan Blue
              Reporter: rdblue Ryan Blue
              Votes: 0
              Watchers: 4
