Uploaded image for project: 'Kylin'
  1. Kylin
  2. KYLIN-4582

Query with count distinct and grouping sets get wrong result

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Critical
    • Resolution: Unresolved
    • v2.5.2
    • v3.1.4
    • Query Engine
    • None
    • hdp 2.6

    Description

      We find that query with count distinct and grouping sets get wrong result when number of  grouping sets combination > 2;

      if we change count distinct to count or sum the result is correct.

      Query result is correct when grouping have two sets as below:

      group by grouping sets(
      (a.period_id,a.company_name,a.province,a.new_flag),
      (a.period_id,a.company_name,a.new_flag))
      

      full sql:

      select period_id,
             case grouping(a.company_name) when 1 then 'ALL' else a.company_name end company_name,
             case grouping(a.province) when 1 then 'ALL' else a.province end province,
             case grouping(a.new_flag) when 1 then 'ALL' else a.new_flag end new_flag,
             count(distinct user_id) as index2  --'users'
      from (
      select monthid as period_id,
             case company_id when '1' then 'xx1' when '2' then 'xx2'
             end as company_name,
             province_name as province,
             case new_user_flag when '1' then '新增' when '2' then '存量' when '3' then '存量' end as new_flag,
             user_id
      from rpt_culture_order_m
      where monthid between '202005' and '{202005}' and order_flag=2 and company_id=4
      ) a 
      group by grouping sets(
      (a.period_id,a.company_name,a.province,a.new_flag),
      (a.period_id,a.company_name,a.new_flag))
      

      But if we add one or more grouping the query result (count distinct(user_id)) is wrong.

      group by grouping sets(
      (a.period_id,a.company_name,a.province,a.new_flag),
      (a.period_id,a.company_name,a.new_flag),
      (a.period_id,a.company_name))
      

      full sql:

      select period_id,
             case grouping(a.company_name) when 1 then 'ALL' else a.company_name end company_name,
             case grouping(a.province) when 1 then 'ALL' else a.province end province,
             case grouping(a.new_flag) when 1 then 'ALL' else a.new_flag end new_flag,
             count(distinct user_id) as index2  --'users'
      from (
      select monthid as period_id,
             case company_id when '1' then 'xx1' when '2' then 'xx2'
             end as company_name,
             province_name as province,
             case new_user_flag when '1' then '新增' when '2' then '存量' when '3' then '存量' end as new_flag,
             user_id
      from rpt_culture_order_m
      where monthid between '202005' and '{202005}' and order_flag=2 and company_id=4
      ) a 
      group by grouping sets(
      (a.period_id,a.company_name,a.province,a.new_flag),
      (a.period_id,a.company_name,a.new_flag),
      (a.period_id,a.company_name)
      )
      

       

      The result of '新增‘ (column :INDEX2) should be 3170752 instead of 34808840 which is equal with the result of 'all'.

      count distinct result of  "All" dimension =  roll up dimension.

      Further, we have test that when change the aggregation from count distinct to count or sum the result is right.

      Attachments

        1. count_distinct_grouping_sets_wrong_result.png
          77 kB
          YaoChen
        2. right_result.png
          74 kB
          YaoChen

        Issue Links

          Activity

            People

              Unassigned Unassigned
              flyao YaoChen
              Votes:
              1 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated: