Hive / HIVE-17390

Select count(distinct) returns incorrect results using tez


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.2.1
    • Fix Version/s: None
    • Component/s: Query Planning
    • Labels: None

    Description

      With the following combination of settings, SELECT COUNT(DISTINCT col) returns the value of SUM(DISTINCT col) instead of the number of distinct values:
      hive.execution.engine=tez
      hive.optimize.reducededuplication=true
      hive.optimize.reducededuplication.min.reducer=1
      hive.optimize.distinct.rewrite=true
      hive.groupby.skewindata=false
      hive.vectorized.execution.reduce.enabled=true

      STEPS TO REPRODUCE:

      CREATE TABLE `simple_data`(ppmonth int, sale double);
      INSERT INTO simple_data VALUES (501,25000.0),(502,60000.0),(501,40000.0),(502,70000.0),(501,35000.0),(502,60000.0);
      set hive.execution.engine=tez;
      set hive.optimize.reducededuplication=true;
      set hive.optimize.reducededuplication.min.reducer=1;
      set hive.optimize.distinct.rewrite=true;
      set hive.groupby.skewindata=false;
      set hive.vectorized.execution.reduce.enabled=true;
      select count(distinct ppmonth) from simple_data;

      The query returns 1003 (501 + 502, the sum of the distinct ppmonth values) rather than the expected 2.
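      As a cross-check of the expected values, here is a minimal sketch, assuming Python 3 with its built-in sqlite3 module. SQLite is used only as a stand-in reference engine; it is not Hive and does not exercise Tez, so it cannot reproduce the bug itself. It loads the same six rows and computes both aggregates, which should give 2 for count(distinct ppmonth) and 1003 for sum(distinct ppmonth), consistent with the report that the wrong result equals sum(distinct). The function name expected_results is hypothetical.

```python
import sqlite3

# Reference check only: SQLite stands in for a correct SQL engine here.
# It does not run Hive or Tez, so it cannot reproduce the bug itself.
ROWS = [(501, 25000.0), (502, 60000.0), (501, 40000.0),
        (502, 70000.0), (501, 35000.0), (502, 60000.0)]


def expected_results():
    """Return (count(distinct ppmonth), sum(distinct ppmonth)) for the repro rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE simple_data (ppmonth INT, sale DOUBLE)")
        conn.executemany("INSERT INTO simple_data VALUES (?, ?)", ROWS)
        count_distinct, sum_distinct = conn.execute(
            "SELECT count(DISTINCT ppmonth), sum(DISTINCT ppmonth) FROM simple_data"
        ).fetchone()
        return count_distinct, sum_distinct
    finally:
        conn.close()


if __name__ == "__main__":
    count_distinct, sum_distinct = expected_results()
    print(count_distinct, sum_distinct)  # prints: 2 1003
```

      The two distinct ppmonth values are 501 and 502, so a correct count(distinct) is 2, while 1003 is exactly their sum.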


          People

            Assignee: Unassigned
            Reporter: Brian Goerlitz (bgoerlitz)
            Votes: 1
            Watchers: 6
