Description
Mentor email: ruwang@google.com. Feel free to email any questions.
Project Information
---------------------
BeamSQL has a long list of aggregation and analytic functions to support.
To begin with, you will need to support this syntax:
analytic_function_name ( [ argument_list ] ) OVER ( [ PARTITION BY partition_expression_list ] [ ORDER BY expression [{ ASC | DESC }] [, ...] ] [ window_frame_clause ] )
Because there are many analytic functions, a good starting point is to support rank() first.
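To make the target semantics concrete, here is a minimal single-machine sketch of what RANK() returns for the rows of one partition. It is plain Java, not BeamSQL code, and the class and method names are hypothetical. It assumes the ORDER BY values are already sorted ascending: tied rows share a rank, and the next distinct value skips ahead by the number of tied rows, leaving a gap.

```java
import java.util.Arrays;

/**
 * Hypothetical single-machine sketch of RANK() semantics (not BeamSQL code).
 * Input: ORDER BY values of one partition, already sorted ascending.
 * Output: the RANK() value of each row; ties share a rank and the next
 * distinct value leaves a gap (e.g. 1, 2, 2, 4).
 */
public class RankSketch {

    public static int[] rank(int[] sortedValues) {
        int[] ranks = new int[sortedValues.length];
        for (int i = 0; i < sortedValues.length; i++) {
            if (i > 0 && sortedValues[i] == sortedValues[i - 1]) {
                // Peer row (same ORDER BY value): reuse the previous row's rank.
                ranks[i] = ranks[i - 1];
            } else {
                // New value: rank is 1 + the number of rows ordered before it.
                ranks[i] = i + 1;
            }
        }
        return ranks;
    }

    public static void main(String[] args) {
        int[] scores = {10, 20, 20, 30};
        System.out.println(Arrays.toString(rank(scores)));
    }
}
```

Running `main` prints `[1, 2, 2, 4]`. DENSE_RANK() would instead produce `1, 2, 2, 3`, which is why the two functions need separate handling even though they share the same partitioning and ordering logic.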
This requires changes to several core components of BeamSQL:
1. The SQL parser, to support the syntax above.
2. The SQL core, to implement the physical relational operator.
3. Distributed algorithms, to implement these functions in a distributed manner.
4. Enabling the feature in the ZetaSQL dialect.
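For item 3, one common way to evaluate `RANK() OVER (PARTITION BY key ORDER BY value)` in parallel is to group rows by partition key, then sort and rank each partition independently. The sketch below is plain Java, not the Beam API: a Java parallel stream stands in for a distributed runner, and the grouping step plays the role a shuffle such as Beam's GroupByKey would. The `Row`/`RankedRow` types and method names are hypothetical, and this is not necessarily how BeamSQL will implement the operator.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch (plain Java, not the Beam API) of evaluating
 * RANK() OVER (PARTITION BY key ORDER BY value): group by key, then
 * sort and rank each partition independently so partitions can run in parallel.
 */
public class PartitionedRankSketch {

    /** Hypothetical input row: a partition key and an ORDER BY value. */
    public record Row(String key, int value) {}

    /** An input row paired with its computed rank. */
    public record RankedRow(String key, int value, int rank) {}

    public static Map<String, List<RankedRow>> rankByPartition(List<Row> rows) {
        // Step 1 (PARTITION BY): group rows by key, analogous to a shuffle.
        Map<String, List<Row>> partitions = rows.stream()
                .collect(Collectors.groupingBy(Row::key, TreeMap::new, Collectors.toList()));

        // Step 2 (ORDER BY + RANK): partitions are independent, so a parallel
        // stream may rank them concurrently.
        return partitions.entrySet().parallelStream()
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        e -> rankOnePartition(e.getValue()),
                        (a, b) -> a,
                        TreeMap::new));
    }

    private static List<RankedRow> rankOnePartition(List<Row> partition) {
        List<Row> sorted = new ArrayList<>(partition);
        sorted.sort(Comparator.comparingInt(Row::value));
        List<RankedRow> out = new ArrayList<>(sorted.size());
        for (int i = 0; i < sorted.size(); i++) {
            Row row = sorted.get(i);
            // Tied rows share the previous rank; otherwise rank = position + 1.
            boolean tie = i > 0 && row.value() == sorted.get(i - 1).value();
            int rank = tie ? out.get(i - 1).rank() : i + 1;
            out.add(new RankedRow(row.key(), row.value(), rank));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row("a", 5), new Row("b", 7), new Row("a", 3),
                new Row("a", 5), new Row("b", 1));
        rankByPartition(rows).forEach((key, ranked) -> System.out.println(key + " -> " + ranked));
    }
}
```

This design assumes every partition fits in one worker's memory. A very large or skewed partition would need a different strategy, which is part of what the distributed-algorithm work would have to address.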
To understand what SQL analytic functions are, see this explanation: https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts
To learn about Beam's programming model, see: https://beam.apache.org/documentation/programming-guide/#overview