
Invalid query planning for distinct group-by

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.11.0
    • Component/s: Planner/Optimizer
    • Labels:
      None

      Description

      For a query involving the DISTINCT operator, the group-by is always executed as the last step of the query, even after the sort required by ORDER BY. Consider the following example query.

      default> select distinct a.col3 from test as a left outer join lineitem b on a.col1 = b.l_orderkey order by a.col3;
      

      The plan for this query is as follows. Note that GROUP_BY(5) is placed above SORT(3):

      GROUP_BY(5)(col3)
        => target list: default.a.col3 (TEXT)
        => out schema:{(1) default.a.col3 (TEXT)}
        => in schema:{(1) default.a.col3 (TEXT)}
         SORT(3)
           => Sort Keys: default.a.col3 (TEXT) (asc)
            JOIN(7)(LEFT_OUTER)
              => Join Cond: default.a.col1 (INT4) = default.b.l_orderkey (INT4)
              => target list: default.a.col3 (TEXT)
              => out schema: {(1) default.a.col3 (TEXT)}
              => in schema: {(3) default.a.col3 (TEXT), default.a.col1 (INT4), default.b.l_orderkey (INT4)}
               SCAN(1) on default.lineitem_large as b
                 => target list: default.b.l_orderkey (INT4)
                 => out schema: {(1) default.b.l_orderkey (INT4)}
                 => in schema: {(16) default.b.l_orderkey (INT4), default.b.l_partkey (INT4), default.b.l_suppkey (INT4), default.b.l_linenumber (INT4), default.b.l_quantity (FLOAT8), default.b.l_extendedprice (FLOAT8), default.b.l_discount (FLOAT8), default.b.l_tax (FLOAT8), default.b.l_returnflag (TEXT), default.b.l_linestatus (TEXT), default.b.l_shipdate (TEXT), default.b.l_commitdate (TEXT), default.b.l_receiptdate (TEXT), default.b.l_shipinstruct (TEXT), default.b.l_shipmode (TEXT), default.b.l_comment (TEXT)}
               PARTITIONS_SCAN(8) on default.testbroadcastmulticolumnpartitiontable as a
                 => target list: default.a.col3 (TEXT), default.a.col1 (INT4)
                 => num of filtered paths: 3
                 => out schema: {(2) default.a.col3 (TEXT), default.a.col1 (INT4)}
                 => in schema: {(2) default.a.col1 (INT4), default.a.col2 (FLOAT4)}
                 => 0: hdfs://localhost:52705/tajo/warehouse/default/testbroadcastmulticolumnpartitiontable/col3=01/col4=1996
                 => 1: hdfs://localhost:52705/tajo/warehouse/default/testbroadcastmulticolumnpartitiontable/col3=10/col4=1993
                 => 2: hdfs://localhost:52705/tajo/warehouse/default/testbroadcastmulticolumnpartitiontable/col3=12/col4=1996
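
To illustrate why the operator order in this plan matters, here is a minimal sketch, not Tajo code. It assumes a hash-partitioned group-by whose output is the concatenation of its partitions; `toy_hash`, `NUM_PARTITIONS`, and the sample rows are hypothetical and chosen only to make the effect visible.

```python
# Toy model only: this is NOT how Tajo executes a plan. It shows one way a
# group-by placed above a sort can lose the ordering requested by ORDER BY.

# Hypothetical col3 values coming out of the join (made up for illustration).
joined_col3 = ["12", "01", "10", "01", "12", "10"]

NUM_PARTITIONS = 2  # assumed number of hash partitions


def toy_hash(value):
    # Deterministic stand-in for a real hash function.
    return int(value) % NUM_PARTITIONS


def hash_group_by(rows):
    """Remove duplicates by hash partitioning, then concatenate the partitions."""
    partitions = [[] for _ in range(NUM_PARTITIONS)]
    for value in rows:
        bucket = partitions[toy_hash(value)]
        if value not in bucket:
            bucket.append(value)
    return [value for bucket in partitions for value in bucket]


# Plan shape from the report: GROUP_BY on top of SORT.
group_by_over_sort = hash_group_by(sorted(joined_col3))

# Alternative shape: SORT on top of GROUP_BY.
sort_over_group_by = sorted(hash_group_by(joined_col3))

print(group_by_over_sort)
print(sort_over_group_by)
```

In this model only the second shape returns the distinct values in ascending order, because the sort is the final operator.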
      

        Activity

        atris Atri Sharma added a comment -

        Do we implement DISTINCT as Sort + Group By + Filter? In that case, it should be simple to add a Filter node on top of the current plan.
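
For readers unfamiliar with the sort-based approach mentioned in this comment, the following is a generic sketch of it, assuming nothing about Tajo's actual operators. The function name `sort_based_distinct` is hypothetical.

```python
from itertools import groupby


def sort_based_distinct(values):
    """Sort the input, group runs of equal values, and keep one value per run."""
    # groupby only merges adjacent equal items, so the sort must come first.
    return [key for key, _run in groupby(sorted(values))]


distinct_values = sort_based_distinct(["12", "01", "10", "01"])
print(distinct_values)
```

Because the grouping step consumes sorted input and emits one value per run, the result of this approach is both duplicate-free and ordered.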

        hyunsik Hyunsik Choi added a comment -

        What is your plan for this issue? Will you fix it in the 0.11.0 release?

        githubbot ASF GitHub Bot added a comment -

        GitHub user hyunsik opened a pull request:

        https://github.com/apache/tajo/pull/750

        TAJO-1600: Invalid query planning for distinct group-by.

        You can merge this pull request into a Git repository by running:

        $ git pull https://github.com/hyunsik/tajo TAJO-1600

        Alternatively you can review and apply these changes as the patch at:

        https://github.com/apache/tajo/pull/750.patch

        To close this pull request, make a commit to your master/trunk branch
        with (at least) the following in the commit message:

        This closes #750


        commit a778e4a31d295c357af0454c8ef40b7cc7ecb3fe
        Author: Hyunsik Choi <hyunsik@apache.org>
        Date: 2015-09-09T08:43:04Z

        TAJO-1600: Invalid query planning for distinct group-by.


        githubbot ASF GitHub Bot added a comment -

        Github user jihoonson commented on the pull request:

        https://github.com/apache/tajo/pull/750#issuecomment-139130051

        +1 ship it!

        githubbot ASF GitHub Bot added a comment -

        Github user asfgit closed the pull request at:

        https://github.com/apache/tajo/pull/750

        hudson Hudson added a comment -

        FAILURE: Integrated in Tajo-master-CODEGEN-build #497 (See https://builds.apache.org/job/Tajo-master-CODEGEN-build/497/)
        TAJO-1600: Invalid query planning for distinct group-by. (hyunsik: rev fe99e0fff2fcdd0d0341d2f3d22b5d03fa4bea56)

        • tajo-plan/src/main/java/org/apache/tajo/plan/rewrite/rules/ProjectionPushDownRule.java
        • tajo-core-tests/src/test/resources/queries/TestCaseByCases/testTAJO_1600.sql
        • tajo-core-tests/src/test/resources/results/TestCaseByCases/testTAJO_1600.result
        • tajo-plan/src/main/java/org/apache/tajo/plan/logical/GroupbyNode.java
        • tajo-core-tests/src/test/resources/results/TestCaseByCases/testTAJO_1600.plan
        • CHANGES
        • tajo-plan/src/main/java/org/apache/tajo/plan/LogicalPlanner.java
        • tajo-core-tests/src/test/java/org/apache/tajo/engine/query/TestCaseByCases.java
        hyunsik Hyunsik Choi added a comment -

        Committed.

        hudson Hudson added a comment -

        SUCCESS: Integrated in Tajo-master-build #855 (See https://builds.apache.org/job/Tajo-master-build/855/)
        TAJO-1600: Invalid query planning for distinct group-by. (hyunsik: rev fe99e0fff2fcdd0d0341d2f3d22b5d03fa4bea56)

        • tajo-plan/src/main/java/org/apache/tajo/plan/logical/GroupbyNode.java
        • tajo-plan/src/main/java/org/apache/tajo/plan/LogicalPlanner.java
        • tajo-core-tests/src/test/resources/results/TestCaseByCases/testTAJO_1600.plan
        • CHANGES
        • tajo-plan/src/main/java/org/apache/tajo/plan/rewrite/rules/ProjectionPushDownRule.java
        • tajo-core-tests/src/test/resources/queries/TestCaseByCases/testTAJO_1600.sql
        • tajo-core-tests/src/test/resources/results/TestCaseByCases/testTAJO_1600.result
        • tajo-core-tests/src/test/java/org/apache/tajo/engine/query/TestCaseByCases.java
        hudson Hudson added a comment -

        SUCCESS: Integrated in Tajo-0.11.0-build #32 (See https://builds.apache.org/job/Tajo-0.11.0-build/32/)
        TAJO-1600: Invalid query planning for distinct group-by. (hyunsik: rev 7a1a8ce3c9648b67027b3714ad0eba3641d6ecc2)

        • tajo-core-tests/src/test/resources/queries/TestCaseByCases/testTAJO_1600.sql
        • tajo-plan/src/main/java/org/apache/tajo/plan/rewrite/rules/ProjectionPushDownRule.java
        • tajo-plan/src/main/java/org/apache/tajo/plan/logical/GroupbyNode.java
        • tajo-core-tests/src/test/resources/results/TestCaseByCases/testTAJO_1600.result
        • tajo-core-tests/src/test/java/org/apache/tajo/engine/query/TestCaseByCases.java
        • tajo-plan/src/main/java/org/apache/tajo/plan/LogicalPlanner.java
        • tajo-core-tests/src/test/resources/results/TestCaseByCases/testTAJO_1600.plan
        • CHANGES

          People

          • Assignee:
            hyunsik Hyunsik Choi
          • Reporter:
            jihoonson Jihoon Son
          • Votes:
            0
          • Watchers:
            4
