- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 2.1.1
- Fix Version/s: None
- Component/s: CBO, Logical Optimizer
- Labels: None
Distinct count query performance is significantly improved due to HIVE-10568.
select count(distinct elevenst_id) from 11st.log_table where part_dt between '20180101' and '20180131'
However, queries with several distinct counts are still slow. Such a query starts with multiple mappers, but then gets stuck in a single final reducer.
select count(distinct elevenst_id) , count(distinct member_id) , count(distinct user_id) , count(distinct action_id) , count(distinct other_id) from 11st.log_table where part_dt between '20180101' and '20180131'
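As a possible manual workaround until the optimizer handles this case, the query can be split so that each distinct count runs in its own single-column subquery, which is the shape HIVE-10568 already improves, and the one-row results are then combined. This is only a sketch: the aliases (cnt, e, m, u, a, o) are made up for illustration, the partition range is scanned once per subquery, and whether it ends up faster depends on the data and the cluster.

```sql
-- Hypothetical rewrite: one count(distinct ...) per subquery, then combine.
-- Each subquery returns exactly one row, so the cross joins yield one row.
select e.cnt as elevenst_id_cnt,
       m.cnt as member_id_cnt,
       u.cnt as user_id_cnt,
       a.cnt as action_id_cnt,
       o.cnt as other_id_cnt
from (select count(distinct elevenst_id) as cnt
        from 11st.log_table
       where part_dt between '20180101' and '20180131') e
cross join
     (select count(distinct member_id) as cnt
        from 11st.log_table
       where part_dt between '20180101' and '20180131') m
cross join
     (select count(distinct user_id) as cnt
        from 11st.log_table
       where part_dt between '20180101' and '20180131') u
cross join
     (select count(distinct action_id) as cnt
        from 11st.log_table
       where part_dt between '20180101' and '20180131') a
cross join
     (select count(distinct other_id) as cnt
        from 11st.log_table
       where part_dt between '20180101' and '20180131') o
```

The result should match the original query, since each count is computed independently over the same filtered rows. The trade-off is extra scans of the table in exchange for avoiding the single-reducer bottleneck.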
- relates to
  - HIVE-10568 Select count(distinct()) can have more optimal execution plan (Closed)