Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 2.1.1
- Fix Version/s: None
- Component/s: None
Description
Distinct count query performance is significantly improved due to HIVE-10568.
select count(distinct elevenst_id) from 11st.log_table where part_dt between '20180101' and '20180131'
However, queries with several distinct counts are still slow. Such a query starts with multiple mappers but then gets stuck in the final stage, which runs on a single reducer.
select count(distinct elevenst_id), count(distinct member_id), count(distinct user_id), count(distinct action_id), count(distinct other_id) from 11st.log_table where part_dt between '20180101' and '20180131'
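One possible manual workaround, shown only as a sketch, is to compute each distinct count in its own grouped subquery and cross join the one-row results, so that deduplication happens in a GROUP BY rather than in one final aggregation. The snippet below checks only that this rewrite returns the same numbers as the original form. It runs against an in-memory SQLite table with made-up rows (the table contents, the reduced column set, and the use of SQLite instead of Hive are all assumptions for illustration). It does not measure Hive performance, and the rewrite scans the table once per column, so any gain would have to be confirmed on a real cluster.

```python
import sqlite3

# Toy stand-in for 11st.log_table (hypothetical data, SQLite instead of Hive).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE log_table ("
    "elevenst_id TEXT, member_id TEXT, user_id TEXT, part_dt TEXT)"
)
conn.executemany(
    "INSERT INTO log_table VALUES (?, ?, ?, ?)",
    [
        ("a", "m1", "u1", "20180101"),
        ("a", "m1", None, "20180102"),   # NULL user_id must not be counted
        ("b", "m2", "u1", "20180115"),
        ("c", "m2", "u2", "20180131"),
        ("d", "m4", "u3", "20180201"),   # outside the date filter
    ],
)

# Original form: several distinct counts in one aggregation.
original_sql = """
SELECT count(DISTINCT elevenst_id),
       count(DISTINCT member_id),
       count(DISTINCT user_id)
FROM log_table
WHERE part_dt BETWEEN '20180101' AND '20180131'
"""

# Rewritten form: one grouped subquery per column, then a cross join of the
# single-row results. count(col) skips the NULL group, matching
# count(DISTINCT col) semantics.
rewritten_sql = """
SELECT a.cnt, b.cnt, c.cnt
FROM (SELECT count(elevenst_id) AS cnt
      FROM (SELECT elevenst_id FROM log_table
            WHERE part_dt BETWEEN '20180101' AND '20180131'
            GROUP BY elevenst_id) t1) a
CROSS JOIN
     (SELECT count(member_id) AS cnt
      FROM (SELECT member_id FROM log_table
            WHERE part_dt BETWEEN '20180101' AND '20180131'
            GROUP BY member_id) t2) b
CROSS JOIN
     (SELECT count(user_id) AS cnt
      FROM (SELECT user_id FROM log_table
            WHERE part_dt BETWEEN '20180101' AND '20180131'
            GROUP BY user_id) t3) c
"""

original_result = conn.execute(original_sql).fetchone()
rewritten_result = conn.execute(rewritten_sql).fetchone()
print(original_result, rewritten_result)
```

The same pattern would extend to action_id and other_id by adding one more subquery per column.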
Issue Links
- relates to: HIVE-10568 Select count(distinct()) can have more optimal execution plan (Closed)