SPARK-49555

SQL Pipe Syntax

Details

    • Type: Umbrella
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 4.0.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      This umbrella Jira ticket tracks the work to support SQL queries written in pipe syntax.

      The objective is to make queries easy to compose as a sequence of SQL clauses separated by the pipe token |>, where each pipe operator is a fully defined transformation of the preceding relation. A pipe operator may refer only to the column names and rows produced by the operator immediately before it; apart from that, each step is stateless.
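
The composition model described above can be sketched in Python as a left fold over a list of functions. This only illustrates the semantics and is not how Spark implements pipe syntax. The helper names `pipe`, `where`, and `select` are hypothetical, and a relation is modelled as a plain list of dicts.

```python
from functools import reduce

def pipe(relation, *operators):
    """Apply each operator to the output of the previous one, left to right."""
    return reduce(lambda rel, op: op(rel), operators, relation)

def where(predicate):
    # Hypothetical operator: keep the rows for which predicate(row) is true.
    return lambda rel: [row for row in rel if predicate(row)]

def select(*columns):
    # Hypothetical operator: keep only the named columns of each row.
    return lambda rel: [{c: row[c] for c in columns} for row in rel]

# Toy data, invented for this illustration.
orders = [
    {"o_orderkey": 1, "o_custkey": 10, "o_comment": "fast"},
    {"o_orderkey": 2, "o_custkey": 11, "o_comment": "unusual packages"},
]

result = pipe(
    orders,
    where(lambda r: "unusual" not in r["o_comment"]),
    select("o_orderkey", "o_custkey"),
)
print(result)
```

Because each operator receives only the previous operator's output, a step placed after `select("o_orderkey", "o_custkey")` can no longer see `o_comment`. This mirrors the rule that a pipe operator may refer only to the names produced by the step before it.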

       
      For example, here's query 13 from TPC-H:
       
      SELECT c_count, COUNT( * ) AS custdist FROM
        (SELECT c_custkey, COUNT(o_orderkey) c_count FROM customer
        LEFT OUTER JOIN orders ON c_custkey = o_custkey
        AND o_comment NOT LIKE '%unusual%packages%' GROUP BY c_custkey) AS c_orders
      GROUP BY c_count
      ORDER BY custdist DESC, c_count DESC;
       
      With the new syntax, it becomes:
       
      FROM customer
       |> LEFT OUTER JOIN orders ON c_custkey = o_custkey
          AND o_comment NOT LIKE '%unusual%packages%'
       |> AGGREGATE COUNT(o_orderkey) c_count
          GROUP BY c_custkey
       |> AGGREGATE COUNT( * ) AS custdist
          GROUP BY c_count
       |> ORDER BY custdist DESC, c_count DESC;
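
As a rough check that the piped form reads as the same computation, the sketch below runs the original nested query in SQLite on a few invented rows and then reproduces the five pipe steps by hand in Python. It does not run Spark or the pipe syntax itself. The toy tables, the regular expression standing in for `LIKE`, and the tuple order (chosen to mirror the nested query's SELECT list) are assumptions made for this illustration.

```python
import re
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (c_custkey INTEGER);
    CREATE TABLE orders (o_orderkey INTEGER, o_custkey INTEGER, o_comment TEXT);
    INSERT INTO customer VALUES (1), (2), (3), (4);
    INSERT INTO orders VALUES
        (100, 1, 'ok'),
        (101, 1, 'ok'),
        (102, 2, 'some unusual small packages'),
        (103, 2, 'ok');
""")

# The nested form of the query, run as written on the toy data.
nested = conn.execute("""
    SELECT c_count, COUNT(*) AS custdist FROM
      (SELECT c_custkey, COUNT(o_orderkey) c_count FROM customer
       LEFT OUTER JOIN orders ON c_custkey = o_custkey
       AND o_comment NOT LIKE '%unusual%packages%' GROUP BY c_custkey) AS c_orders
    GROUP BY c_count
    ORDER BY custdist DESC, c_count DESC
""").fetchall()

# Step 1: FROM customer
customers = [row[0] for row in conn.execute("SELECT c_custkey FROM customer")]
orders = conn.execute("SELECT o_orderkey, o_custkey, o_comment FROM orders").fetchall()

# Step 2: LEFT OUTER JOIN orders ON ... AND o_comment NOT LIKE '%unusual%packages%'
joined = []
for c_custkey in customers:
    matches = [o for o in orders
               if o[1] == c_custkey and not re.search("unusual.*packages", o[2])]
    if matches:
        joined.extend((c_custkey, o[0]) for o in matches)
    else:
        joined.append((c_custkey, None))  # unmatched customer keeps a NULL order key

# Step 3: AGGREGATE COUNT(o_orderkey) c_count GROUP BY c_custkey
c_count_by_customer = {}
for c_custkey, o_orderkey in joined:
    seen = c_count_by_customer.get(c_custkey, 0)
    c_count_by_customer[c_custkey] = seen + (1 if o_orderkey is not None else 0)

# Step 4: AGGREGATE COUNT(*) AS custdist GROUP BY c_count
custdist_by_c_count = Counter(c_count_by_customer.values())

# Step 5: ORDER BY custdist DESC, c_count DESC
piped = sorted(custdist_by_c_count.items(), key=lambda kv: (-kv[1], -kv[0]))

print(nested)
print(piped)
```

Each Python step consumes only the variable produced by the step before it, which is the property the pipe syntax makes explicit in the query text.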

          People

            Assignee: Unassigned
            Reporter: Daniel (dtenedor)
            Votes: 0
            Watchers: 3
