Description
In Spark 3.4.0 RC4, the Spark History Server's SQL tab incorrectly groups SQL executions when replaying event logs generated by older Spark versions.
Reproduction:
In ./bin/spark-shell --conf spark.eventLog.enabled=true --conf spark.eventLog.dir=eventlogs, run three non-nested SQL queries:
sql("select * from range(10)").collect()
sql("select * from range(20)").collect()
sql("select * from range(30)").collect()
Exit the shell and use the Spark History Server to replay the application's UI.
In the SQL tab I expect to see three separate queries, but Spark 3.4's History Server incorrectly groups the second and third queries as nested queries of the first (see attached screenshot).
Root cause:
https://github.com/apache/spark/pull/39268 / SPARK-41752 added a new non-optional rootExecutionId: Long field to the SparkListenerSQLExecutionStart case class.
When JsonProtocol deserializes this event it uses the "ignore missing properties" Jackson deserialization option, causing the missing rootExecutionId field to be initialized with a default value of 0.
The value 0 is also a legitimate execution ID, so from the deserialized event we have no way to distinguish between the absence of a value and the case where every query has the first query (execution 0) as its root.
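The ambiguity can be illustrated without Jackson itself: once a missing numeric field defaults to 0, a replayed pre-3.4 event is structurally identical to a real 3.4 event whose root is execution 0. A minimal sketch (the case class here is illustrative, not Spark's actual SparkListenerSQLExecutionStart definition):

```scala
// Simulates JsonProtocol's behavior: a rootExecutionId absent from the
// JSON is filled in with the Long default of 0 during deserialization.
case class SQLExecutionStart(executionId: Long, rootExecutionId: Long)

// Event replayed from a Spark 3.3 log: no rootExecutionId in the JSON,
// so deserialization produces 0L.
val replayedLegacy = SQLExecutionStart(executionId = 2, rootExecutionId = 0L)

// Event written by Spark 3.4 where query 2 genuinely is nested under
// execution 0.
val genuinelyNested = SQLExecutionStart(executionId = 2, rootExecutionId = 0L)

// The two cases are indistinguishable, so the History Server wrongly
// treats legacy top-level queries as children of execution 0.
assert(replayedLegacy == genuinelyNested)
```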
Proposed fix:
I think we should change this field to be of type Option[Long]. I believe this is a release blocker for Spark 3.4.0 because we cannot change the type of this new field in a future release without breaking binary compatibility.
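A sketch of how Option[Long] would remove the ambiguity (field names mirror SparkListenerSQLExecutionStart, but this is an illustrative case class, not the proposed patch):

```scala
// With Option[Long], "absent from the event log" and "root is execution 0"
// deserialize to distinct values.
case class SQLExecutionStart(executionId: Long, rootExecutionId: Option[Long])

// Replayed from a pre-3.4 event log: the field is simply missing.
val legacy = SQLExecutionStart(executionId = 2, rootExecutionId = None)

// Written by Spark 3.4: this query really is nested under execution 0.
val nested = SQLExecutionStart(executionId = 2, rootExecutionId = Some(0L))

// The History Server could then treat a missing root as "self-rooted",
// so legacy queries stay top-level in the SQL tab.
val effectiveRoot = legacy.rootExecutionId.getOrElse(legacy.executionId)
assert(effectiveRoot == 2L)
assert(nested.rootExecutionId.contains(0L))
```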