[FLINK-11943] Support TopN feature for SQL - ASF JIRA

XML

Word

Printable

JSON

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Duplicate
Affects Version/s: None
Fix Version/s: 1.9.0
Component/s: Table SQL / Runtime
Labels:
None

Description

TopN is a frequently used feature in data analysis. We can use ORDER BY + LIMIT to easily express a TopN query, e.g. SELECT * FROM T ORDER BY amount DESC LIMIT 10.

But this is a global TopN, there is a great requirement for per-group TopN. For example, top 10 shops for each category. In order to avoid introducing new syntax for this, we would like to use traditional syntax to express it by using ROW_NUMBER over window + FILTER to limit the numbers.

For example:

SELECT *
FROM (
SELECT category, shopId, sales,
[ROW_NUMBER()|RANK()|DENSE_RANK()] OVER
(PARTITION BY category ORDER BY sales ASC) as rownum
FROM shop_sales
)
WHERE rownum <= 10

This issue is aiming to optimize this query to an Rank node instead of Over plus Calc. And translate the Rank node into physical operators.

There are some optimization for rank operator based on the different input of the Rank. We would like to implement the basic and one-fit-all implementation. And do the performance improvement later.

Here is a brief design doc: https://docs.google.com/document/d/14JCV6X6hcpoA51loprgntZNxQ2NmnDLucxgGY8xVDuI/edit#

Attachments

Issue Links

is a child of

FLINK-11489 Add an initial Blink SQL streaming runtime

Closed

is duplicated by

FLINK-12017 Introduce Rank and Deduplicate operators for blink streaming runtime

Closed

Activity

People

Assignee:: Unassigned

Reporter:: Jark Wu

Votes:: 0 Vote for this issue

Watchers:: 3 Start watching this issue

Dates

Created:: 18/Mar/19 02:26

Updated:: 20/May/19 01:26

Resolved:: 20/May/19 01:26