Flink / FLINK-22894

Window Top-N should allow n=1


Details

    Description

      I tried to reimplement the Hourly Tips exercise from the DataStream training using Flink SQL. The objective of this exercise is to find the one taxi driver who earned the most in tips during each hour, and report that driver's driverId and the sum of their tips. 

      This can be expressed as a window Top-N query with n=1:

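      SELECT *
      FROM (
        -- rank drivers within each hourly window by their total tips
        SELECT *, ROW_NUMBER() OVER (PARTITION BY window_start, window_end ORDER BY sumOfTips DESC) as rownum
        FROM (
          -- total tips per driver per 1-hour tumbling window
          SELECT driverId, window_start, window_end, sum(tip) as sumOfTips
          FROM TABLE(
            TUMBLE(TABLE fares, DESCRIPTOR(startTime), INTERVAL '1' HOUR))
          GROUP BY driverId, window_start, window_end
        )
      ) WHERE rownum = 1;  -- keep only the top driver in each window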
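
To make the intended result concrete, here is a small in-memory Python model of what the query computes. It is a hypothetical sketch, not Flink code: each fare is assumed to be a (driver_id, start_time, tip) tuple standing in for the driverId, startTime and tip columns, the fare data is made up, and windows are taken to be aligned to whole hours.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical in-memory model of the window Top-N query; not Flink code.
# Each fare is a (driver_id, start_time, tip) tuple, standing in for the
# driverId, startTime and tip columns of the "fares" table.
WINDOW_SIZE = timedelta(hours=1)


def tumble_start(ts: datetime) -> datetime:
    """Start of the 1-hour tumbling window containing ts (assumed hour-aligned)."""
    return ts.replace(minute=0, second=0, microsecond=0)


def hourly_top_n(fares, n=1):
    # Innermost query: SUM(tip) ... GROUP BY driverId, window_start, window_end
    sums = defaultdict(float)
    for driver_id, start_time, tip in fares:
        sums[(tumble_start(start_time), driver_id)] += tip

    # Middle query: ROW_NUMBER() OVER (PARTITION BY window ORDER BY sumOfTips DESC)
    per_window = defaultdict(list)
    for (w_start, driver_id), total in sums.items():
        per_window[w_start].append((driver_id, total))

    rows = []
    for w_start in sorted(per_window):
        # Ties keep insertion order here; ROW_NUMBER() makes no such promise.
        ranked = sorted(per_window[w_start], key=lambda r: r[1], reverse=True)
        for rownum, (driver_id, total) in enumerate(ranked, start=1):
            # Outer query: WHERE rownum <= n (n = 1 gives WHERE rownum = 1)
            if rownum <= n:
                rows.append((driver_id, w_start, w_start + WINDOW_SIZE, total, rownum))
    return rows


# Made-up sample fares spanning two hourly windows
fares = [
    (1, datetime(2021, 6, 1, 0, 5), 2.0),
    (2, datetime(2021, 6, 1, 0, 10), 5.0),
    (1, datetime(2021, 6, 1, 0, 40), 4.0),
    (3, datetime(2021, 6, 1, 1, 15), 1.5),
    (2, datetime(2021, 6, 1, 1, 20), 0.5),
]

for row in hourly_top_n(fares, n=1):
    print(row)
```

With n=1 this keeps driver 1 (6.0 in tips) for the first hour and driver 3 (1.5 in tips) for the second: exactly one row per window, which is what the query above asks for.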

       

      This fails because WindowRankOperatorBuilder requires rankEnd > 1. In other words, while it is possible to report the top 2 drivers, or only the driver in 2nd place, it is not possible to report only the top driver.

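      This appears to be an off-by-one error in the range check: row numbers start at 1, so rankEnd = 1 is a valid upper bound, and the condition should presumably be rankEnd >= 1.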
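
The off-by-one can be illustrated with a hypothetical Python model of the rank-range validation. It assumes that WHERE rownum = 1 reaches the builder as the rank range (1, 1), rownum <= 2 as (1, 2), and rownum = 2 as (2, 2). The function names and error messages are made up; only the rankEnd > 1 condition comes from the report, and it is compared with the rankEnd >= 1 bound suggested above.

```python
def validate_rank_range(rank_start: int, rank_end: int, *, fixed: bool) -> None:
    """Hypothetical model of the rank-range check (not Flink's actual source)."""
    if rank_start < 1:
        raise ValueError(f"Illegal rank start {rank_start}: row numbers start at 1")
    # As reported, the builder requires rank_end > 1, which rejects rownum = 1.
    # The proposed bound, rank_end >= 1, admits the smallest valid rank.
    end_ok = rank_end >= 1 if fixed else rank_end > 1
    if not end_ok:
        raise ValueError(f"Illegal rank end {rank_end}")
    if rank_end < rank_start:
        raise ValueError(f"Rank end {rank_end} is smaller than rank start {rank_start}")


def is_accepted(rank_start: int, rank_end: int, *, fixed: bool) -> bool:
    """True if the (hypothetical) validation lets this rank range through."""
    try:
        validate_rank_range(rank_start, rank_end, fixed=fixed)
    except ValueError:
        return False
    return True


# Assumed (rank_start, rank_end) pairs for: rownum <= 2, rownum = 2, rownum = 1
for rank_range in [(1, 2), (2, 2), (1, 1)]:
    print(rank_range, is_accepted(*rank_range, fixed=False), is_accepted(*rank_range, fixed=True))
# (1, 2) True True
# (2, 2) True True
# (1, 1) False True
```

Under these assumptions, only the top-1 range (1, 1) is rejected by the reported condition, matching the behavior described in this issue.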


            People

              Assignee: Jing Zhang (jingzhang)
              Reporter: David Anderson (alpinegizmo)