Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
-
None
-
None
Description
The following query:
SELECT SUM(DUP_COUNT) FROM ( SELECT COUNT(1) As DUP_COUNT FROM DATACLOUD.DATA_ASSESSMENT_RECORD WHERE JOURNEY_ID='07ixx000000004J' AND DATASOURCE=0 AND MATCH_STATUS <= 1 and ORGANIZATION_ID='07ixx000000004J' GROUP BY MATCH_STATUS, EXTERNAL_DATASOURCE_KEY HAVING COUNT(1) > 1);
Should use an ORDERED DISTINCT, but doesn't.
This is the DDL:
CREATE TABLE DATACLOUD.DATA_ASSESSMENT_RECORD (ORGANIZATION_ID char(15) not null, JOURNEY_ID char(15) not null, DATASOURCE SMALLINT not null, MATCH_STATUS TINYINT not null, EXTERNAL_DATASOURCE_KEY varchar(30), ENTITY_ID char(15) not null, CONSTRAINT PK PRIMARY KEY ( ORGANIZATION_ID, JOURNEY_ID, DATASOURCE, MATCH_STATUS, EXTERNAL_DATASOURCE_KEY, ENTITY_ID))
The optimization does occur if you include the PK columns in the GROUP BY like this:
SELECT SUM(DUP_COUNT) FROM ( SELECT COUNT(1) As DUP_COUNT FROM DATACLOUD.DATA_ASSESSMENT_RECORD WHERE JOURNEY_ID='07ixx000000004J' AND DATASOURCE=0 AND MATCH_STATUS <= 1 and ORGANIZATION_ID='07ixx000000004J' GROUP BY ORGANIZATION_ID, JOURNEY_ID, DATASOURCE, MATCH_STATUS, EXTERNAL_DATASOURCE_KEY HAVING COUNT(1) > 1);