In Spark 3.4.0 RC4, the Spark History Server's SQL tab incorrectly groups SQL executions when replaying event logs generated by older Spark versions.
Start ./bin/spark-shell with --conf spark.eventLog.enabled=true --conf spark.eventLog.dir=eventlogs, then run three non-nested SQL queries:
Exit the shell and use the Spark History Server to replay this application's UI.
In the SQL tab I expect to see three separate queries, but Spark 3.4's history server incorrectly groups the second and third queries as nested queries of the first (see attached screenshot).
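Event logs written by Spark versions earlier than 3.4 do not contain a rootExecutionId field in their SparkListenerSQLExecutionStart events. When JsonProtocol deserializes such an event, it uses the "ignore missing properties" Jackson deserialization option, so the missing rootExecutionId field is initialized with the default value 0.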
The value 0 is a legitimate execution ID, so the deserialized event cannot distinguish between a missing value and a log in which every query really does have the first query (execution 0) as its root.
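To make the ambiguity concrete, here is a minimal Scala sketch. It does not use Jackson or Spark's real SparkListenerSQLExecutionStart class: `StartEventWithLong` and `parseWithLong` are hypothetical stand-ins, and a JSON object is modeled as a `Map` whose missing `Long` properties fall back to 0, mimicking the behavior described above.

```scala
// Hypothetical, simplified stand-in for the start event; not Spark's real class.
final case class StartEventWithLong(executionId: Long, rootExecutionId: Long)

// Hypothetical parser: a JSON object is modeled as a Map of property names to values.
def parseWithLong(json: Map[String, Long]): StartEventWithLong =
  StartEventWithLong(
    executionId = json("executionId"),
    // A missing property silently becomes 0, which is also a valid execution ID.
    rootExecutionId = json.getOrElse("rootExecutionId", 0L)
  )

// Execution 2 from an older Spark version: the log has no rootExecutionId.
val fromOldLog = parseWithLong(Map("executionId" -> 2L))

// Execution 2 from a newer Spark version, genuinely nested under execution 0.
val fromNewLog = parseWithLong(Map("executionId" -> 2L, "rootExecutionId" -> 0L))

// Both parse to the same value, so a reader cannot tell the two cases apart.
println(fromOldLog == fromNewLog) // true
```

Because both events compare equal, any UI logic that groups executions by `rootExecutionId` will nest execution 2 under execution 0 in either case.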
I think we should change this field to type Option[Long]. I believe this is a release blocker for Spark 3.4.0 because we cannot change the type of this new field in a future release without breaking binary compatibility.
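A rough sketch of how an Option[Long] field removes the ambiguity follows. `StartEventWithOption`, `parseWithOption`, and `effectiveRoot` are hypothetical names, not Spark APIs, and the fallback of treating an execution with no recorded root as its own root is an assumption about how a consumer such as the SQL tab might handle old logs.

```scala
// Hypothetical stand-in for the start event with an optional root ID.
final case class StartEventWithOption(executionId: Long, rootExecutionId: Option[Long])

// Hypothetical parser over a Map-modeled JSON object.
def parseWithOption(json: Map[String, Long]): StartEventWithOption =
  StartEventWithOption(
    executionId = json("executionId"),
    // A missing property becomes None instead of a valid-looking ID.
    rootExecutionId = json.get("rootExecutionId")
  )

// Assumed fallback: an execution without a recorded root is its own root.
def effectiveRoot(event: StartEventWithOption): Long =
  event.rootExecutionId.getOrElse(event.executionId)

val oldEvent = parseWithOption(Map("executionId" -> 2L))
val newEvent = parseWithOption(Map("executionId" -> 2L, "rootExecutionId" -> 0L))

println(oldEvent.rootExecutionId) // None
println(newEvent.rootExecutionId) // Some(0)
println(effectiveRoot(oldEvent))  // 2
println(effectiveRoot(newEvent))  // 0
```

With this representation, execution 2 from an old log is shown as a top-level query, while execution 2 from a new log that was really nested under execution 0 is still grouped correctly.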