Spark / SPARK-43327

Trigger `committer.setupJob` before plan execute in `FileFormatWriter`


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version: 3.2.3
    • Fix Version: 3.3.4
    • Component: SQL
    • Labels: None

    Description

      In SPARK-40588, the case where `outputOrdering` might not work when AQE is enabled was resolved:

      https://issues.apache.org/jira/browse/SPARK-40588

      However, because that fix materializes the AQE plan in advance (it triggers `getFinalPhysicalPlan`), `committer.setupJob(job)` may never execute when `AdaptiveSparkPlanExec#getFinalPhysicalPlan()` fails with an error.

      Normally, plan materialization should happen after `committer.setupJob(job)`.

      This may eventually result in the INSERT OVERWRITE directory being deleted.
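      The ordering problem can be sketched without Spark internals. In the sketch below, `Committer`, `writeBuggy`, `writeFixed`, and `materializePlan` are hypothetical stand-ins (modeling `FileCommitProtocol.setupJob` and `AdaptiveSparkPlanExec#getFinalPhysicalPlan`), not Spark APIs: when materialization runs first and throws, `setupJob` is skipped, so the subsequent cleanup path operates on a job that was never set up.

```scala
// Toy model of the ordering bug (hypothetical names, not Spark APIs):
// if plan materialization fails before committer.setupJob runs, cleanup
// sees a job that was never set up, which is how the INSERT OVERWRITE
// directory can end up deleted.
object SetupOrderingSketch {
  final class Committer {
    var setupDone = false
    def setupJob(): Unit = setupDone = true
  }

  // Order after SPARK-40588: materialize first (may fail).
  def writeBuggy(c: Committer, materializePlan: () => Unit): Unit = {
    materializePlan()   // models getFinalPhysicalPlan(), which can throw
    c.setupJob()        // skipped when materialization throws
  }

  // Order after this fix: setupJob first, then materialize.
  def writeFixed(c: Committer, materializePlan: () => Unit): Unit = {
    c.setupJob()
    materializePlan()
  }

  def main(args: Array[String]): Unit = {
    val failing = () => throw new RuntimeException("ANSI cast overflow")

    val buggy = new Committer
    try writeBuggy(buggy, failing) catch { case _: RuntimeException => }
    println(s"buggy order, setupJob ran: ${buggy.setupDone}")   // false

    val fixed = new Committer
    try writeFixed(fixed, failing) catch { case _: RuntimeException => }
    println(s"fixed order, setupJob ran: ${fixed.setupDone}")   // true
  }
}
```

      The fix is simply to call `committer.setupJob(job)` before any code path that can trigger plan materialization.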

       

      import org.apache.hadoop.fs.{FileSystem, Path}

      import org.apache.spark.sql.catalyst.TableIdentifier

      // Prepare a source table whose value overflows Int, so the cast
      // below fails at runtime once ANSI mode is enabled.
      sql("CREATE TABLE IF NOT EXISTS spark32_overwrite(amt1 int) STORED AS ORC")
      sql("CREATE TABLE IF NOT EXISTS spark32_overwrite2(amt1 long) STORED AS ORC")
      sql("INSERT OVERWRITE TABLE spark32_overwrite2 SELECT 6000044164")
      sql("SET spark.sql.ansi.enabled=true")

      val loc =
        spark.sessionState.catalog.getTableMetadata(TableIdentifier("spark32_overwrite")).location

      val fs = FileSystem.get(loc, spark.sparkContext.hadoopConfiguration)
      println("Location exists: " + fs.exists(new Path(loc)))

      // DISTRIBUTE BY forces a shuffle, so AQE materializes the plan; the
      // ANSI cast overflow makes getFinalPhysicalPlan() fail before
      // committer.setupJob(job) runs, and the table location is deleted.
      try {
        sql("INSERT OVERWRITE TABLE spark32_overwrite SELECT amt1 FROM " +
          "(SELECT CAST(amt1 AS int) AS amt1 FROM spark32_overwrite2 DISTRIBUTE BY amt1)")
      } finally {
        println("Location exists: " + fs.exists(new Path(loc)))
      }


          People

            Assignee: Zing (zzzzming95)
            Reporter: Zing (zzzzming95)
