Details
- Type: Umbrella
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 4.0.0
- Fix Version/s: None
Description
This umbrella Jira ticket tracks adding support for issuing SQL queries using pipe syntax.
The objective is to make queries easy to compose by writing a sequence of SQL clauses separated by the pipe token |>, where each operator represents a fully-defined transformation of the relation produced by the preceding step. Each pipe operator may refer only to the names and rows generated by the immediately preceding operator; otherwise, each step is stateless.
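As a rough illustration of this style (a minimal sketch following the ZetaSQL documentation linked below; the orders table and its columns come from the TPC-H schema used in the larger example further down, and the exact operator set Spark will support is to be settled by the sub-tasks under this umbrella):
FROM orders
|> WHERE o_orderstatus = 'O'           -- filter rows of the preceding relation
|> AGGREGATE COUNT(*) AS open_orders   -- aggregate over the filtered rows
   GROUP BY o_custkey
|> ORDER BY open_orders DESC;          -- order the aggregated result
Each step consumes only the relation produced by the step before it, so the query reads top to bottom in the order it is evaluated.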
- Research paper: https://research.google/pubs/sql-has-problems-we-can-fix-them-pipe-syntax-in-sql/
- Open-source ZetaSQL implementation: https://github.com/google/zetasql/blob/master/docs/pipe-syntax.md
- Spark prototype: https://github.com/apache/spark/pull/47837
For example, here's query 13 from TPC-H:
SELECT c_count, COUNT(*) AS custdist
FROM
  (SELECT c_custkey, COUNT(o_orderkey) c_count
   FROM customer
   LEFT OUTER JOIN orders ON c_custkey = o_custkey
     AND o_comment NOT LIKE '%unusual%packages%'
   GROUP BY c_custkey) AS c_orders
GROUP BY c_count
ORDER BY custdist DESC, c_count DESC;
With the new syntax, it becomes:
FROM customer
|> LEFT OUTER JOIN orders ON c_custkey = o_custkey
   AND o_comment NOT LIKE '%unusual%packages%'
|> AGGREGATE COUNT(o_orderkey) c_count
   GROUP BY c_custkey
|> AGGREGATE COUNT(*) AS custdist
   GROUP BY c_count
|> ORDER BY custdist DESC, c_count DESC;
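One practical property of this form, emphasized in the linked paper and ZetaSQL documentation, is that any prefix of the pipeline ending at an operator boundary is itself a complete query, which makes it easy to inspect intermediate results while composing or debugging. For instance, truncating the query above after the first aggregation returns the per-customer order counts that the full query then histograms (illustrative only; operator support in Spark is tracked by the sub-tasks under this umbrella):
FROM customer
|> LEFT OUTER JOIN orders ON c_custkey = o_custkey
   AND o_comment NOT LIKE '%unusual%packages%'
|> AGGREGATE COUNT(o_orderkey) c_count
   GROUP BY c_custkey;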
Issue Links
- is related to: SPARK-44111 Prepare Apache Spark 4.0.0 (Open)