Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Description
While investigating HIVE-24870, we found that during a long incremental replication, an index on SDS.CD_ID can improve performance.
This was tested on Postgres as follows:
CREATE INDEX IF NOT EXISTS "SDS_N50" ON "SDS" USING btree ("CD_ID");
EXPLAIN (ANALYZE,BUFFERS,TIMING) select count(*) from "SDS" where "CD_ID"=THE_MOST_FREQUENTLY_USED_CD_ID_HERE;
DROP INDEX IF EXISTS "SDS_N50";
EXPLAIN (ANALYZE,BUFFERS,TIMING) select count(*) from "SDS" where "CD_ID"=THE_MOST_FREQUENTLY_USED_CD_ID_HERE;
Further results can be found in: command-output.txt
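For readers without a metastore database at hand, the same before/after comparison can be sketched with Python's built-in sqlite3 module. This is only an illustration under loud assumptions: SQLite stands in for Postgres, the "SDS" table is reduced to two columns, the data is synthetic, and the helper names plan_for_cd_id_lookup and compare_plans are hypothetical. It shows how the planner's choice changes once "SDS_N50" exists; it says nothing about the actual timings, which are in command-output.txt.

```python
import sqlite3


def plan_for_cd_id_lookup(conn, cd_id):
    """Return SQLite's query plan text for the count-by-CD_ID lookup."""
    rows = conn.execute(
        'EXPLAIN QUERY PLAN SELECT count(*) FROM "SDS" WHERE "CD_ID" = ?',
        (cd_id,),
    ).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " | ".join(row[-1] for row in rows)


def compare_plans():
    """Capture the plan without the index, then with "SDS_N50" in place."""
    conn = sqlite3.connect(":memory:")
    # Assumption: a two-column stand-in for the real metastore "SDS" table.
    conn.execute(
        'CREATE TABLE "SDS" ("SD_ID" INTEGER PRIMARY KEY, "CD_ID" INTEGER)'
    )
    # Synthetic data: nine out of ten rows share CD_ID 1, mimicking one
    # very frequently used column descriptor.
    conn.executemany(
        'INSERT INTO "SDS" ("SD_ID", "CD_ID") VALUES (?, ?)',
        [(i, 1 if i % 10 else i) for i in range(1, 10001)],
    )
    without_index = plan_for_cd_id_lookup(conn, 1)
    conn.execute('CREATE INDEX IF NOT EXISTS "SDS_N50" ON "SDS" ("CD_ID")')
    with_index = plan_for_cd_id_lookup(conn, 1)
    conn.close()
    return without_index, with_index


if __name__ == "__main__":
    before, after = compare_plans()
    print("without index:", before)
    print("with index:   ", after)
```

Without the index the plan should report a scan of the whole table, and with it a search that uses "SDS_N50". The exact wording of the plan text varies between SQLite versions.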
After some investigation, I found that this index has been part of the schemas for a very long time:
oracle: HIVE-2928
mysql: HIVE-2246
mssql: HIVE-6862 (or earlier)
...except Postgres.