Details
Type: Wish
Status: Open
Priority: Critical
Resolution: Unresolved
Affects Version/s: 4.0.0, 3.5.3
Fix Version/s: None
Description
As Spark SQL becomes more powerful for both analytics and ELT (with a big T), we see more and more tools generating and executing SQL to transform data.
A session is a very important mechanism for lineage and usage/cost tracking, especially for multi-statement ELT cases. Tagging a series of query statements with higher-level business context (such as project, flow_name, job_name, batch_id, start_data_dt, end_data_dt, owner, cost_group, ...) can provide a tremendous observability improvement with very little overhead. Collecting the scattered query UUIDs and trying to group them back into a session is inefficient; it is far easier to let the SQL client set the tags once, when the session is established (a workaround sketch using today's APIs follows this paragraph).
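As a rough illustration only: the sketch below approximates session tags with existing public APIs. The spark.myapp.tag.* keys are invented for this example (Spark's runtime conf accepts arbitrary entries), and the QueryExecutionListener just prints what a lineage collector might record; this is a workaround sketch, not the proposed QUERY_TAG feature.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.QueryExecution
import org.apache.spark.sql.util.QueryExecutionListener

object SessionTagSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("session-tag-sketch")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical tag keys: the client attaches business context once,
    // right after the session is established.
    spark.conf.set("spark.myapp.tag.project", "revenue_reporting")
    spark.conf.set("spark.myapp.tag.flow_name", "daily_rollup")
    spark.conf.set("spark.myapp.tag.batch_id", "20240601")

    // Every statement in the session can then be joined back to those tags,
    // e.g. by a listener that a lineage/cost collector registers.
    spark.listenerManager.register(new QueryExecutionListener {
      override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit = {
        val project = qe.sparkSession.conf.get("spark.myapp.tag.project", "unset")
        val flow = qe.sparkSession.conf.get("spark.myapp.tag.flow_name", "unset")
        println(s"[lineage] project=$project flow=$flow durationNs=$durationNs")
      }
      override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit = ()
    })

    spark.sql("SELECT 1 AS id").collect()
    Thread.sleep(1000) // the listener bus is async; give it time to deliver
    spark.stop()
  }
}
```

The obvious gap is that nothing standardizes the key names across tools, which is exactly what a paved path would fix.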
- Presto has Session Properties.
- Trino has X-Trino-Session, X-Trino-Client-Info, and X-Trino-Client-Tags headers to carry a list of key/value pairs (see the JDBC sketch after this list).
- Snowflake has QUERY_TAG, which makes observability much easier and more efficient.
- Redshift supports query tagging as well.
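For comparison, here is a minimal sketch of how a Trino JDBC client attaches tags when the session is established. clientTags and clientInfo are the documented Trino JDBC connection properties that map onto the X-Trino-Client-Tags and X-Trino-Client-Info headers; the endpoint, catalog, and tag values here are placeholders.

```scala
import java.sql.DriverManager
import java.util.Properties

object TrinoClientTagsSketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.setProperty("user", "etl_user")
    // Comma-separated tags are sent as the X-Trino-Client-Tags header and
    // surface in system.runtime.queries for usage/cost attribution.
    props.setProperty("clientTags", "project=revenue_reporting,batch_id=20240601")
    // Free-form client info, sent as X-Trino-Client-Info.
    props.setProperty("clientInfo", "daily_rollup_flow")

    val conn = DriverManager.getConnection(
      "jdbc:trino://trino.example.com:8080/hive/default", props)
    val stmt = conn.createStatement()
    val rs = stmt.executeQuery("SELECT 1")
    while (rs.next()) println(rs.getInt(1))
    rs.close(); stmt.close(); conn.close()
  }
}
```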
It would be great if Spark SQL set a paved path/recipe for workload/cost analysis and observability based on a session-level QUERY_TAG, so that the whole community can follow it instead of reinventing the wheel.