Details
Type: Umbrella
Status: Open
Priority: Major
Resolution: Unresolved
Version: 4.0.0
Description
This umbrella Jira ticket tracks the implementation of support for SQL queries written in pipe syntax.
The objective is to make queries easy to compose as a sequence of SQL clauses separated by the pipe token |>, where each pipe operator is a fully defined transformation of the preceding relation. Each pipe operator may refer only to the names and rows produced by the preceding pipe operator; otherwise, each step is stateless.
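The "each step transforms the preceding relation" model can be sketched outside SQL. The Python below is only an illustration, not Spark's implementation: a relation is assumed to be a list of dict rows, and the helper names (`where`, `extend`, `select`, `pipe`) and the sample `orders` rows are hypothetical.

```python
from functools import reduce

# Assumption for illustration: a relation is a list of dict rows.

def where(pred):
    """Pipe step: keep only the rows for which pred(row) is true."""
    return lambda rows: [r for r in rows if pred(r)]

def extend(name, fn):
    """Pipe step: add a computed column, keeping the existing columns."""
    return lambda rows: [{**r, name: fn(r)} for r in rows]

def select(*names):
    """Pipe step: keep only the named columns."""
    return lambda rows: [{n: r[n] for n in names} for r in rows]

def pipe(relation, *steps):
    """Apply steps left to right; each step sees only the previous output."""
    return reduce(lambda rel, step: step(rel), steps, relation)

# Hypothetical sample data.
orders = [
    {"o_orderkey": 1, "o_totalprice": 100.0},
    {"o_orderkey": 2, "o_totalprice": 250.0},
    {"o_orderkey": 3, "o_totalprice": 40.0},
]

result = pipe(
    orders,
    where(lambda r: r["o_totalprice"] >= 100.0),
    extend("price_with_tax", lambda r: r["o_totalprice"] * 1.1),
    select("o_orderkey", "price_with_tax"),
)
print(result)
```

Because every step is a function of the previous relation alone, steps can be added, removed, or reordered without rewriting the rest of the chain, which is the property the pipe syntax aims for.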
- Research paper: https://research.google/pubs/sql-has-problems-we-can-fix-them-pipe-syntax-in-sql/
- Open-source ZetaSQL implementation: https://github.com/google/zetasql/blob/master/docs/pipe-syntax.md
- Spark prototype: https://github.com/apache/spark/pull/47837
For example, here's query 13 from TPC-H:
SELECT c_count, COUNT(*) AS custdist
FROM (
  SELECT c_custkey, COUNT(o_orderkey) c_count
  FROM customer
  LEFT OUTER JOIN orders ON c_custkey = o_custkey
    AND o_comment NOT LIKE '%unusual%packages%'
  GROUP BY c_custkey
) AS c_orders
GROUP BY c_count
ORDER BY custdist DESC, c_count DESC;
With the new syntax, it becomes:
FROM customer
|> LEFT OUTER JOIN orders ON c_custkey = o_custkey
   AND o_comment NOT LIKE '%unusual%packages%'
|> AGGREGATE COUNT(o_orderkey) c_count
   GROUP BY c_custkey
|> AGGREGATE COUNT(*) AS custdist
   GROUP BY c_count
|> ORDER BY custdist DESC, c_count DESC;
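To see that the two forms describe the same result, the sketch below runs the conventional query on a few made-up rows with Python's built-in `sqlite3` module and compares it against a step-by-step emulation of the pipe version. SQLite does not parse `|>`, so each pipe step is written as a plain Python function; the sample data and all function names are assumptions for illustration only.

```python
import re
import sqlite3
from collections import Counter

# Hypothetical sample data: customer 4 has no orders, and customer 2's only
# order is excluded by the comment filter.
customers = [1, 2, 3, 4]
orders = [  # (o_orderkey, o_custkey, o_comment)
    (10, 1, "ok"),
    (11, 1, "fine"),
    (12, 2, "unusual special packages"),
    (13, 3, "ok"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (c_custkey INTEGER)")
conn.execute(
    "CREATE TABLE orders (o_orderkey INTEGER, o_custkey INTEGER, o_comment TEXT)"
)
conn.executemany("INSERT INTO customer VALUES (?)", [(c,) for c in customers])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)

# Conventional form of TPC-H query 13.
sql_rows = conn.execute("""
    SELECT c_count, COUNT(*) AS custdist FROM
      (SELECT c_custkey, COUNT(o_orderkey) c_count FROM customer
       LEFT OUTER JOIN orders ON c_custkey = o_custkey
       AND o_comment NOT LIKE '%unusual%packages%' GROUP BY c_custkey) AS c_orders
    GROUP BY c_count
    ORDER BY custdist DESC, c_count DESC
""").fetchall()

def left_join_orders(rows):
    """|> LEFT OUTER JOIN orders ON ... AND o_comment NOT LIKE ..."""
    out = []
    for r in rows:
        matches = [o for o in orders
                   if o[1] == r["c_custkey"]
                   and not re.search(r"unusual.*packages", o[2], re.S)]
        if matches:
            out.extend({**r, "o_orderkey": o[0]} for o in matches)
        else:
            out.append({**r, "o_orderkey": None})  # unmatched customer row
    return out

def count_orders_per_customer(rows):
    """|> AGGREGATE COUNT(o_orderkey) c_count GROUP BY c_custkey"""
    counts = {}
    for r in rows:
        counts.setdefault(r["c_custkey"], 0)
        if r["o_orderkey"] is not None:  # COUNT(col) skips NULLs
            counts[r["c_custkey"]] += 1
    return [{"c_custkey": k, "c_count": v} for k, v in counts.items()]

def count_customers_per_count(rows):
    """|> AGGREGATE COUNT(*) AS custdist GROUP BY c_count"""
    dist = Counter(r["c_count"] for r in rows)
    return [{"c_count": k, "custdist": v} for k, v in dist.items()]

def order_desc(rows):
    """|> ORDER BY custdist DESC, c_count DESC"""
    return sorted(rows, key=lambda r: (r["custdist"], r["c_count"]), reverse=True)

# FROM customer, then each pipe step applied to the previous relation.
rel = [{"c_custkey": c} for c in customers]
for step in (left_join_orders, count_orders_per_customer,
             count_customers_per_count, order_desc):
    rel = step(rel)
pipe_rows = [(r["c_count"], r["custdist"]) for r in rel]

print(sql_rows)
print(pipe_rows)
```

On this sample both lists come out identical. Note that the pipe version reads in the order the data flows (join, count per customer, count per count, sort), whereas the conventional query has to be read from the inner subquery outward.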
Attachments
Sub-Tasks
1. FROM operator | Open | Unassigned
2. EXTEND + SET + DROP operators | Open | Unassigned
3. Add .sql file testing to check equality of SQL pipe queries and many regular SQL queries | Open | Unassigned
4. Add documentation for SQL pipe syntax | Open | Unassigned