Hive / HIVE-19283

Select with several count(distinct ...) expressions gets stuck in the last reducer


    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.1.1
    • Fix Version/s: None
    • Component/s: CBO, Logical Optimizer
    • Labels:
      None

      Description

       The performance of distinct-count queries improved significantly with HIVE-10568. For example:

      select count(distinct elevenst_id)
      from 11st.log_table
      where part_dt between '20180101' and '20180131'
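
For illustration, the kind of rewrite that a single distinct count allows can be written by hand. This is a sketch only, assuming the optimization deduplicates with a GROUP BY before counting; the alias `t` is hypothetical and the exact plan Hive generates may differ.

```sql
-- Hand-written equivalent of a single count(distinct ...), as a sketch.
-- The inner GROUP BY deduplicates elevenst_id and can be spread over many
-- reducers; the outer query then only counts the already-distinct rows.
select count(t.elevenst_id)   -- count(col) skips the NULL group, matching count(distinct col)
from (
  select elevenst_id
  from 11st.log_table
  where part_dt between '20180101' and '20180131'
  group by elevenst_id
) t
```

Counting the column rather than `count(*)` matters here: GROUP BY produces one group for NULL, while `count(distinct ...)` ignores NULLs.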

       

      However, some queries with several distinct counts are still slow. Such a query starts with multiple mappers but then gets stuck in a single final reducer:

      select 
        count(distinct elevenst_id)
      , count(distinct member_id)
      , count(distinct user_id)
      , count(distinct action_id)
      , count(distinct other_id)
      from 11st.log_table
      where part_dt between '20180101' and '20180131'
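
One possible workaround, not proposed in the issue itself, is to compute each distinct count in its own subquery and combine the single-row results. The sketch below shows two of the five columns; the aliases (`a`, `b`, `cnt`, `elevenst_cnt`, `member_cnt`) are hypothetical, and whether this is actually faster depends on the data, since the table is scanned once per subquery.

```sql
-- Workaround sketch: one single-distinct subquery per column.
-- Each subquery has the same shape as the single count(distinct ...) query
-- above, so it may benefit from the same optimization.
select a.cnt as elevenst_cnt,
       b.cnt as member_cnt
from (
  select count(distinct elevenst_id) as cnt
  from 11st.log_table
  where part_dt between '20180101' and '20180131'
) a
cross join (
  select count(distinct member_id) as cnt
  from 11st.log_table
  where part_dt between '20180101' and '20180131'
) b
```

Each subquery returns exactly one row, so the cross join also yields a single row. The remaining columns (`user_id`, `action_id`, `other_id`) would follow the same pattern.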

       

              People

              • Assignee:
                ashutoshc Ashutosh Chauhan
              • Reporter:
                goun Goun Na
              • Votes:
                0
              • Watchers:
                4
