Uploaded image for project: 'Phoenix'
  1. Phoenix
  2. PHOENIX-2758

Ordered GROUP BY not occurring with leading PK equality expression

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 4.8.0
    • None
    • None

    Description

      The following query:

      SELECT SUM(DUP_COUNT) FROM (
          SELECT COUNT(1) As DUP_COUNT
          FROM DATACLOUD.DATA_ASSESSMENT_RECORD 
         WHERE JOURNEY_ID='07ixx000000004J' AND 
                       DATASOURCE=0 AND MATCH_STATUS <= 1 and 
                       ORGANIZATION_ID='07ixx000000004J' 
          GROUP BY MATCH_STATUS, EXTERNAL_DATASOURCE_KEY 
          HAVING COUNT(1) > 1);
      

      Should use an ORDERED DISTINCT, but doesn't.

      This is the DDL:

      CREATE TABLE DATACLOUD.DATA_ASSESSMENT_RECORD (ORGANIZATION_ID char(15) not null, 
      JOURNEY_ID char(15) not null, 
      DATASOURCE SMALLINT not null, 
      MATCH_STATUS TINYINT not null, 
      EXTERNAL_DATASOURCE_KEY varchar(30), 
      ENTITY_ID char(15) not null, 
      CONSTRAINT PK PRIMARY KEY (
          ORGANIZATION_ID, 
          JOURNEY_ID, 
          DATASOURCE, 
          MATCH_STATUS,
          EXTERNAL_DATASOURCE_KEY,
          ENTITY_ID))
      

      The optimization does occur if you include the PK columns in the GROUP BY like this:

      SELECT SUM(DUP_COUNT) FROM (
          SELECT COUNT(1) As DUP_COUNT
          FROM DATACLOUD.DATA_ASSESSMENT_RECORD 
         WHERE JOURNEY_ID='07ixx000000004J' AND 
                       DATASOURCE=0 AND MATCH_STATUS <= 1 and 
                       ORGANIZATION_ID='07ixx000000004J' 
          GROUP BY ORGANIZATION_ID, JOURNEY_ID, DATASOURCE, MATCH_STATUS, EXTERNAL_DATASOURCE_KEY 
          HAVING COUNT(1) > 1);
      

      Attachments

        1. PHOENIX-2758_wip.patch
          6 kB
          James R. Taylor
        2. PHOENIX-2758.patch
          20 kB
          James R. Taylor

        Activity

          Would you mind reviewing, samarthjain? The problem was that we were determining whether or not a GROUP BY was in row key order before compiling the WHERE clause. We instead need to do this in two steps: first compile the GROUP BY without regard to ordering (so we can also optimize the HAVING). Then recompile the GROUP BY to take into account ordering.

          jamestaylor James R. Taylor added a comment - Would you mind reviewing, samarthjain ? The problem was that we were determining whether or not a GROUP BY was in row key order before compiling the WHERE clause. We instead need to do this in two steps: first compile the GROUP BY without regard to ordering (so we can also optimize the HAVING). Then recompile the GROUP BY to take into account ordering.
          samarthjain Samarth Jain added a comment -

          +1, looks good.

          samarthjain Samarth Jain added a comment - +1, looks good.
          hudson Hudson added a comment -

          FAILURE: Integrated in Phoenix-master #1181 (See https://builds.apache.org/job/Phoenix-master/1181/)
          PHOENIX-2758 Ordered GROUP BY not occurring with leading PK equality (jtaylor: rev 838a60b9abfb9b65363db9e51cdb6bc32d2088c9)

          • phoenix-core/src/main/java/org/apache/phoenix/compile/QueryCompiler.java
          • phoenix-core/src/main/java/org/apache/phoenix/compile/GroupByCompiler.java
          • phoenix-core/src/test/java/org/apache/phoenix/compile/QueryCompilerTest.java
          • phoenix-core/src/it/java/org/apache/phoenix/end2end/GroupByCaseIT.java
          hudson Hudson added a comment - FAILURE: Integrated in Phoenix-master #1181 (See https://builds.apache.org/job/Phoenix-master/1181/ ) PHOENIX-2758 Ordered GROUP BY not occurring with leading PK equality (jtaylor: rev 838a60b9abfb9b65363db9e51cdb6bc32d2088c9) phoenix-core/src/main/java/org/apache/phoenix/compile/QueryCompiler.java phoenix-core/src/main/java/org/apache/phoenix/compile/GroupByCompiler.java phoenix-core/src/test/java/org/apache/phoenix/compile/QueryCompilerTest.java phoenix-core/src/it/java/org/apache/phoenix/end2end/GroupByCaseIT.java
          ankit@apache.org Ankit Singhal added a comment -

          Bulk close of all issues that has been resolved in a released version.

          ankit@apache.org Ankit Singhal added a comment - Bulk close of all issues that has been resolved in a released version.

          People

            jamestaylor James R. Taylor
            jamestaylor James R. Taylor
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: