Flink / FLINK-11944

Support FirstRow and LastRow for SQL

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: 1.9.0
    • Component/s: Table SQL / Runtime
    • Labels: None

    Description

      Source data often contains duplicate records. To get a correct result, we need to deduplicate them. FirstRow and LastRow are two different deduplication strategies. Their syntax is similar to TopN, except that the ORDER BY column must be a time attribute. For example:

      SELECT *
      FROM (
        SELECT *,
          ROW_NUMBER() OVER (PARTITION BY id ORDER BY proctime DESC) AS rownum
        FROM T
      )
      WHERE rownum = 1

      A query is recognized as FirstRow or LastRow under the following rules:
      1. The PARTITION BY key is the deduplication key.
      2. The ORDER BY column must be a time attribute (either proctime or rowtime).
      3. The rownum filter must be rownum = 1 or rownum <= 1.
      4. The query is FirstRow when the order direction is ASC, and LastRow when it is DESC.
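
      The rules above can be illustrated with a small reference model. This is a minimal sketch in plain Python, not Flink code: the function name `deduplicate` and the row layout (dicts with `id` and `rowtime` fields) are assumptions made for illustration only.

```python
from typing import Any, Dict, Hashable, Iterable, List


def deduplicate(rows: Iterable[Dict[str, Any]], key: str, time_attr: str,
                descending: bool) -> List[Dict[str, Any]]:
    """Keep the row that would get rownum = 1 per key.

    descending=False models FirstRow (ORDER BY time ASC),
    descending=True models LastRow (ORDER BY time DESC).
    Hypothetical helper, not part of any Flink API.
    """
    kept: Dict[Hashable, Dict[str, Any]] = {}
    for row in rows:
        current = kept.get(row[key])
        if current is None:
            kept[row[key]] = row  # first row seen for this key
        elif descending and row[time_attr] > current[time_attr]:
            kept[row[key]] = row  # LastRow: a later row replaces the kept one
        elif not descending and row[time_attr] < current[time_attr]:
            kept[row[key]] = row  # FirstRow: an earlier row replaces the kept one
    return list(kept.values())


rows = [
    {"id": 1, "rowtime": 10, "v": "a"},
    {"id": 1, "rowtime": 20, "v": "b"},
    {"id": 2, "rowtime": 15, "v": "c"},
]
first_rows = deduplicate(rows, "id", "rowtime", descending=False)
last_rows = deduplicate(rows, "id", "rowtime", descending=True)
```

      Here `first_rows` keeps the `rowtime = 10` row for `id = 1`, while `last_rows` keeps the `rowtime = 20` row. The single row for `id = 2` appears in both results.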

      This issue aims to optimize this query pattern into a single FirstLastRow node instead of an Over node plus a Calc node, and to translate that node into physical operators.
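
      One way to picture a dedicated operator for this pattern is a keyed function that stores at most one row per deduplication key. The sketch below is an assumption about the general shape, not the operator Flink implements: the class name `ProcTimeDeduplicator` and the `"+"`/`"-"` change markers are hypothetical. It assumes ordering by processing time, so arrival order is the time order.

```python
from typing import Any, Dict, Hashable, List, Tuple

# ("+", row) adds a result row, ("-", row) withdraws a previously emitted one.
Change = Tuple[str, Any]


class ProcTimeDeduplicator:
    """Hypothetical per-key deduplication under processing-time order."""

    def __init__(self, keep_last: bool) -> None:
        self.keep_last = keep_last
        self.state: Dict[Hashable, Any] = {}  # one stored row per key

    def process(self, key: Hashable, row: Any) -> List[Change]:
        if key not in self.state:
            self.state[key] = row
            return [("+", row)]  # first row for this key is always emitted
        if not self.keep_last:
            return []  # FirstRow: later duplicates are dropped
        previous = self.state[key]
        self.state[key] = row
        return [("-", previous), ("+", row)]  # LastRow: replace the old result
```

      In this model FirstRow produces an append-only result, whereas LastRow produces updates, because each new row for a key replaces the one emitted before it.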

            People

              Assignee: Unassigned
              Reporter: Jark Wu