Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-32168

DSv2 SQL overwrite incorrectly uses static plan with hidden partitions

Attach filesAttach ScreenshotVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Blocker
    • Resolution: Fixed
    • 3.0.0
    • 3.0.1, 3.1.0
    • SQL

    Description

      The v2 analyzer rule ResolveInsertInto tries to detect when a static overwrite and a dynamic overwrite would produce the same result and will choose to use static overwrite in that case. It will only use a dynamic overwrite if there is a partition column without a static value and the SQL mode is set to dynamic.

      val dynamicPartitionOverwrite = partCols.size > staticPartitions.size &&
                conf.partitionOverwriteMode == PartitionOverwriteMode.DYNAMIC
      

      The problem is that partCols are the names of only partitions that are in the column list (identity partitions) and does not include hidden partitions, like days(ts). As a result, this doesn't detect hidden partitions and use dynamic overwrite. Static overwrite is used instead; when a table has only hidden partitions, the static filter drops all table data.

      This is a correctness bug because Spark will overwrite more data than just the set of partitions being written to in dynamic mode. The impact is limited because this rule is only used for SQL queries (not plans from DataFrameWriters) and only affects tables with hidden partitions.

      Attachments

        Issue Links

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            rdblue Ryan Blue
            rdblue Ryan Blue
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment