Details
- Type: Umbrella
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 4.0.0
- Fix Version/s: None
Description
This umbrella Jira ticket tracks adding support for issuing SQL queries using pipe syntax.
The objective is to make queries easy to compose by writing a sequence of SQL clauses separated by the pipe token |>, where each operator represents a fully-defined transformation of the relation produced by the preceding step. Each pipe operator may refer only to the names and rows generated by the immediately preceding operator; otherwise, each step is stateless.
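As a rough illustration of this style (a minimal sketch following the ZetaSQL documentation linked below; the orders table and its columns come from the TPC-H schema used in the larger example further down, and the exact operator set Spark will support is to be settled by the sub-tasks under this umbrella):
FROM orders
|> WHERE o_orderstatus = 'O'           -- filter rows of the preceding relation
|> AGGREGATE COUNT(*) AS open_orders   -- aggregate over the filtered rows
   GROUP BY o_custkey
|> ORDER BY open_orders DESC;          -- order the aggregated result
Each step consumes only the relation produced by the step before it, so the query reads top to bottom in the order it is evaluated.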
- Research paper: https://research.google/pubs/sql-has-problems-we-can-fix-them-pipe-syntax-in-sql/
- Open-source ZetaSQL implementation: https://github.com/google/zetasql/blob/master/docs/pipe-syntax.md
- Spark prototype: https://github.com/apache/spark/pull/47837
For example, here's query 13 from TPC-H:
SELECT c_count, COUNT(*) AS custdist
FROM
  (SELECT c_custkey, COUNT(o_orderkey) c_count
   FROM customer
   LEFT OUTER JOIN orders ON c_custkey = o_custkey
     AND o_comment NOT LIKE '%unusual%packages%'
   GROUP BY c_custkey) AS c_orders
GROUP BY c_count
ORDER BY custdist DESC, c_count DESC;
With the new syntax, it becomes:
FROM customer
|> LEFT OUTER JOIN orders ON c_custkey = o_custkey
   AND o_comment NOT LIKE '%unusual%packages%'
|> AGGREGATE COUNT(o_orderkey) c_count
   GROUP BY c_custkey
|> AGGREGATE COUNT(*) AS custdist
   GROUP BY c_count
|> ORDER BY custdist DESC, c_count DESC;
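One practical property of this form, emphasized in the linked paper and ZetaSQL documentation, is that any prefix of the pipeline ending at an operator boundary is itself a complete query, which makes it easy to inspect intermediate results while composing or debugging. For instance, truncating the query above after the first aggregation returns the per-customer order counts that the full query then histograms (illustrative only; operator support in Spark is tracked by the sub-tasks under this umbrella):
FROM customer
|> LEFT OUTER JOIN orders ON c_custkey = o_custkey
   AND o_comment NOT LIKE '%unusual%packages%'
|> AGGREGATE COUNT(o_orderkey) c_count
   GROUP BY c_custkey;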
Issue Links
- is related to: SPARK-44111 Prepare Apache Spark 4.0.0 (Open)