Description
When running SELECT DISTINCT with LIMIT it does full scan and aggregation (no pageFilter/limit used on server side),
this severely affects performance (query returns in 20sec vs 300ms without DISTINCT)
: jdbc:phoenix:localhost> explain select DISTINCT ROWKEY from TEST_1M LIMIT 100;
------------
PLAN |
------------
CLIENT PARALLEL 30-WAY FULL SCAN OVER TEST_1M |
SERVER FILTER BY FIRST KEY ONLY |
SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY [ROWKEY] |
CLIENT MERGE SORT |
CLIENT 100 ROW LIMIT |
------------
-------------------------------------------------
for comparison SELECT without DISTINCT uses a limit PageFilter=100 on server side and doesn't do full scan (query returns in 300ms)
explain select ROWKEY from TEST_1M LIMIT 100;
------------
PLAN |
------------
CLIENT PARALLEL 30-WAY FULL SCAN OVER TEST_1M |
SERVER FILTER BY FIRST KEY ONLY AND PageFilter 100 |
CLIENT MERGE SORT |
CLIENT 100 ROW LIMIT |
------------