SPARK-42754

Spark 3.4 history server's SQL tab incorrectly groups SQL executions when replaying event logs from Spark 3.3 and earlier

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.4.1
    • Component/s: SQL
    • Labels: None

    Description

      In Spark 3.4.0 RC4, the Spark History Server's SQL tab incorrectly groups SQL executions when replaying event logs generated by older Spark versions.

      Reproduction:

      Using Spark 3.3 or earlier, start ./bin/spark-shell --conf spark.eventLog.enabled=true --conf spark.eventLog.dir=eventlogs and run three non-nested SQL queries:

      sql("select * from range(10)").collect()
      sql("select * from range(20)").collect()
      sql("select * from range(30)").collect()

      Exit the shell and use the Spark History Server to replay this application's UI.

      In the SQL tab I expect to see three separate queries, but Spark 3.4's history server incorrectly groups the second and third queries as nested queries of the first (see attached screenshot).

      Root cause

      https://github.com/apache/spark/pull/39268 / SPARK-41752 added a new non-optional rootExecutionId: Long field to the SparkListenerSQLExecutionStart case class.

      When JsonProtocol deserializes this event it uses the "ignore missing properties" Jackson deserialization option, so when the field is absent from an older event log, rootExecutionId is initialized with a default value of 0.

      The value 0 is a legitimate execution ID, so in the deserialized event we have no ability to distinguish between the absence of a value and a case where all queries have the first query as the root.
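
The ambiguity can be illustrated with a small, dependency-free sketch. It does not use Spark's JsonProtocol or Jackson; StartEventLong and decode are hypothetical stand-ins that assume only the behaviour described above, namely that a missing rootExecutionId becomes 0.

```scala
// Simplified, hypothetical stand-in for SparkListenerSQLExecutionStart
// (the real class has more fields).
final case class StartEventLong(executionId: Long, rootExecutionId: Long)

object RootIdAmbiguity {
  // Hypothetical lenient decoder: a missing key falls back to the primitive
  // default 0L, mimicking the replay behaviour described in this issue.
  def decode(fields: Map[String, Long]): StartEventLong =
    StartEventLong(
      executionId = fields("executionId"),
      rootExecutionId = fields.getOrElse("rootExecutionId", 0L)
    )

  def main(args: Array[String]): Unit = {
    // Event written by an older Spark: no rootExecutionId key at all.
    val fromOldLog = decode(Map("executionId" -> 2L))
    // Event written by a newer Spark for a query nested under execution 0.
    val nestedUnderZero = decode(Map("executionId" -> 2L, "rootExecutionId" -> 0L))
    // After decoding, the two cases cannot be told apart.
    println(fromOldLog == nestedUnderZero)
  }
}
```

Under these assumptions the two decoded events compare equal, which is why the UI cannot tell "no root recorded" from "root is execution 0".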

      Proposed fix:

      I think we should change this field to be of type Option[Long]. I believe this is a release blocker for Spark 3.4.0 because we cannot change the type of this new field in a future release without breaking binary compatibility.
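
A rough sketch of how an Option[Long] field removes the ambiguity follows. StartEventOpt, decode and effectiveRoot are hypothetical names used for illustration only, and the fallback of treating an execution without a recorded root as its own root is an assumption, not necessarily what the merged fix does.

```scala
// Simplified, hypothetical stand-in for the event with an optional root ID.
final case class StartEventOpt(executionId: Long, rootExecutionId: Option[Long])

object OptionalRootId {
  // A missing key now decodes to None instead of a misleading 0.
  def decode(fields: Map[String, Long]): StartEventOpt =
    StartEventOpt(fields("executionId"), fields.get("rootExecutionId"))

  // Assumed fallback: with no recorded root, the execution is its own root.
  def effectiveRoot(event: StartEventOpt): Long =
    event.rootExecutionId.getOrElse(event.executionId)

  def main(args: Array[String]): Unit = {
    // Three events as an older Spark would log them (no rootExecutionId).
    val oldLog = Seq(0L, 1L, 2L).map(id => decode(Map("executionId" -> id)))
    // Each execution resolves to itself, so nothing is grouped under query 0.
    println(oldLog.map(effectiveRoot))
  }
}
```

With this shape, events from older logs stay as three separate top-level queries, while an event that explicitly records a root of 0 is still recognised as nested.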

          People

            Assignee: Linhong Liu (linhongliu-db)
            Reporter: Josh Rosen (joshrosen)