Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-12936

Support intersect all and minus all to blink planner

    XMLWordPrintableJSON

Details

    Description

      Now, we just support intersect and minus, See ReplaceIntersectWithSemiJoinRule and ReplaceMinusWithAntiJoinRule, replace intersect with null aware semi-join and distinct aggregate.

      We need support intersect all and minus all too.

      Presto and Spark already support them:

      https://github.com/prestodb/presto/issues/4918

      https://issues.apache.org/jira/browse/SPARK-21274

      I think them have a good rewrite design and we can follow them:

      1.For intersect all

      Input Query

      SELECT c1 FROM ut1 INTERSECT ALL SELECT c1 FROM ut2
      

      Rewritten Query

        SELECT c1
          FROM (
               SELECT replicate_row(min_count, c1)
               FROM (
                    SELECT c1,
                           IF (vcol1_cnt > vcol2_cnt, vcol2_cnt, vcol1_cnt) AS min_count
                    FROM (
                         SELECT   c1, count(vcol1) as vcol1_cnt, count(vcol2) as vcol2_cnt
                         FROM (
                              SELECT c1, true as vcol1, null as vcol2 FROM ut1
                              UNION ALL
                              SELECT c1, null as vcol1, true as vcol2 FROM ut2
                              ) AS union_all
                         GROUP BY c1
                         HAVING vcol1_cnt >= 1 AND vcol2_cnt >= 1
                        )
                    )
                )
      

      2.For minus all:

      Input Query

      SELECT c1 FROM ut1 EXCEPT ALL SELECT c1 FROM ut2
      

      Rewritten Query

       SELECT c1
          FROM (
           SELECT replicate_rows(sum_val, c1)
             FROM (
               SELECT c1, sum_val
                 FROM (
                   SELECT c1, sum(vcol) AS sum_val
                     FROM (
                       SELECT 1L as vcol, c1 FROM ut1
                       UNION ALL
                       SELECT -1L as vcol, c1 FROM ut2
                    ) AS union_all
                  GROUP BY union_all.c1
                )
              WHERE sum_val > 0
             )
         )
      

      Attachments

        Issue Links

          Activity

            People

              lzljs3620320 Jingsong Lee
              lzljs3620320 Jingsong Lee
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 20m
                  20m