[SPARK-49555] SQL Pipe Syntax - ASF JIRA

Details

Type: Umbrella
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 4.0.0
Fix Version/s: None
Component/s: SQL
Labels:
- releasenotes

Description

This umbrella Jira ticket tracks the work to add support for SQL queries written in pipe syntax.

The objective is to make queries easy to compose as a sequence of SQL clauses separated by the pipe token |>, where each pipe operator is a fully defined transformation of the preceding relation. Each pipe operator may refer only to the names and rows generated by the operator immediately before it; otherwise, each step is stateless.

Research paper: https://research.google/pubs/sql-has-problems-we-can-fix-them-pipe-syntax-in-sql/
Open-source ZetaSQL implementation: https://github.com/google/zetasql/blob/master/docs/pipe-syntax.md
Spark prototype: https://github.com/apache/spark/pull/47837

For example, here's query 13 from TPC-H:

SELECT c_count, COUNT(*) AS custdist
FROM
  (SELECT c_custkey, COUNT(o_orderkey) c_count
   FROM customer
   LEFT OUTER JOIN orders ON c_custkey = o_custkey
     AND o_comment NOT LIKE '%unusual%packages%'
   GROUP BY c_custkey) AS c_orders
GROUP BY c_count
ORDER BY custdist DESC, c_count DESC;

With the new syntax, it becomes:

FROM customer
|> LEFT OUTER JOIN orders ON c_custkey = o_custkey
   AND o_comment NOT LIKE '%unusual%packages%'
|> AGGREGATE COUNT(o_orderkey) c_count
   GROUP BY c_custkey
|> AGGREGATE COUNT(*) AS custdist
   GROUP BY c_count
|> ORDER BY custdist DESC, c_count DESC;
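
To illustrate the "each step transforms the preceding relation" semantics, here is a minimal Python sketch that models every pipe operator of the query above as a plain function over a list of rows and cross-checks the result against the classic query run in SQLite. This is not Spark code and says nothing about how Spark plans or executes pipe queries. The toy rows, the function names, and the simplified LIKE emulation are all assumptions made up for illustration.

```python
import sqlite3
from collections import Counter
from functools import reduce

# Hypothetical toy data, not real TPC-H rows.
CUSTOMERS = [1, 2, 3, 4]                       # c_custkey
ORDERS = [                                     # (o_orderkey, o_custkey, o_comment)
    (10, 1, "regular"),
    (11, 1, "regular"),
    (12, 2, "unusual bulk packages"),          # excluded by the NOT LIKE condition
    (13, 4, "regular"),
]

def matches_unusual_packages(comment):
    """Emulate LIKE '%unusual%packages%' for lowercase ASCII text only."""
    start = comment.find("unusual")
    return start != -1 and comment.find("packages", start + len("unusual")) != -1

def left_join_orders(rel):
    """|> LEFT OUTER JOIN orders ON c_custkey = o_custkey AND o_comment NOT LIKE ..."""
    out = []
    for row in rel:
        keys = [key for key, cust, comment in ORDERS
                if cust == row["c_custkey"] and not matches_unusual_packages(comment)]
        for key in keys or [None]:             # unmatched customers keep a NULL order key
            out.append({**row, "o_orderkey": key})
    return out

def count_orders_per_customer(rel):
    """|> AGGREGATE COUNT(o_orderkey) c_count GROUP BY c_custkey"""
    counts = {}
    for row in rel:
        counts.setdefault(row["c_custkey"], 0)
        if row["o_orderkey"] is not None:      # COUNT(column) skips NULLs
            counts[row["c_custkey"]] += 1
    return [{"c_custkey": k, "c_count": v} for k, v in counts.items()]

def count_customers_per_count(rel):
    """|> AGGREGATE COUNT(*) AS custdist GROUP BY c_count"""
    dist = Counter(row["c_count"] for row in rel)
    return [{"c_count": k, "custdist": v} for k, v in dist.items()]

def order_desc(rel):
    """|> ORDER BY custdist DESC, c_count DESC"""
    return sorted(rel, key=lambda r: (r["custdist"], r["c_count"]), reverse=True)

def run_pipeline():
    """Thread the relation through the steps; each step sees only its input."""
    source = [{"c_custkey": k} for k in CUSTOMERS]          # FROM customer
    steps = [left_join_orders, count_orders_per_customer,
             count_customers_per_count, order_desc]
    rel = reduce(lambda acc, step: step(acc), steps, source)
    return [(r["c_count"], r["custdist"]) for r in rel]

def run_classic_sql():
    """Run the nested (non-pipe) form of the query in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (c_custkey INTEGER)")
    conn.execute("CREATE TABLE orders "
                 "(o_orderkey INTEGER, o_custkey INTEGER, o_comment TEXT)")
    conn.executemany("INSERT INTO customer VALUES (?)", [(k,) for k in CUSTOMERS])
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ORDERS)
    rows = conn.execute("""
        SELECT c_count, COUNT(*) AS custdist FROM
          (SELECT c_custkey, COUNT(o_orderkey) c_count FROM customer
           LEFT OUTER JOIN orders ON c_custkey = o_custkey
             AND o_comment NOT LIKE '%unusual%packages%'
           GROUP BY c_custkey) AS c_orders
        GROUP BY c_count
        ORDER BY custdist DESC, c_count DESC
    """).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    print(run_pipeline())
    print(run_classic_sql())
```

On this toy data both forms should return the same (c_count, custdist) rows, which reflects the point of the syntax: the pipe form reorders how the query is written, not what it computes.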

Issue Links

is related to: SPARK-44111 Prepare Apache Spark 4.0.0 (Open)

Sub-Tasks

1.	SELECT operator	Resolved	Daniel
2.	WHERE operator	Resolved	Daniel
3.	ORDER BY + LIMIT + OFFSET operators	Resolved	Daniel
4.	Set operations	Resolved	Daniel
5.	TABLESAMPLE operator	Resolved	Daniel
6.	PIVOT + UNPIVOT operators	Resolved	Daniel
7.	AGGREGATE operator	Resolved	Daniel
8.	WINDOW operator	Resolved	Daniel
9.	JOIN operator	Resolved	Daniel
10.	EXTEND operator	Resolved	Daniel
11.	GROUP BY ALL support	Closed	Unassigned
12.	Add documentation for SQL pipe syntax	Resolved	Daniel
13.	SET operator	Resolved	Unassigned
14.	DROP operator	Resolved	Daniel
15.	AS operator	Resolved	Daniel
16.	Add .sql file testing to check equality of SQL pipe queries and many regular SQL queries	Closed	Daniel
17.	FROM operator	Resolved	Jiashen Cao
18.	Fix `CREATE TABLE` syntax in `sql-pipe-syntax.md`	Resolved	Dongjoon Hyun
19.	Enable SQL pipe syntax by default	Open	Daniel

People

Assignee: Daniel

Reporter: Daniel

Votes: 0

Watchers: 4

Dates

Created: 09/Sep/24 20:36

Updated: Yesterday 04:07