Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-9347

Bug with max() together with rank() and grouping sets

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 0.13.1, 0.14.0
    • 1.1.0
    • Query Processor
    • None
    • Amazon Elastic Map Reduce, AMI 3.3.1, Hadoop Amazon 2.4.0, Hive 0.13.1

    Description

      It looks like the query below returns incorrect results on Hive 0.13.1, but it was working fine on Hive 0.11.

      I have the following table:
      CREATE TABLE `t`(
      `category` int,
      `live` int,
      `comments` int)

      with the following data:
      hive> select * from t;
      OK
      3 0 2
      2 0 2
      8 0 2

      The query:
      hive> select category, max(live) live, max(comments) comments, rank() OVER (PARTITION BY category ORDER BY comments) rank1
      FROM t
      GROUP BY category
      GROUPING SETS ((), (category))
      HAVING max(comments) > 0;

      return the following results:

      NULL 1 48 1
      2 1 49 1
      3 1 49 1
      8 1 49 1

      When using grouping sets with the rank() function the max() function return incorrect results. Everything works fine if I remove grouping sets clause and split the query into two independent queries or remove the rank() function.

      This looks like a bug to me but please review. That said, I'm not sure if it's just Amazon issue or general Hive issue.

      Attachments

        1. HIVE-9347.1.patch.txt
          45 kB
          Navis Ryu
        2. HIVE-9347.2.patch.txt
          75 kB
          Navis Ryu
        3. HIVE-9347.3.patch.txt
          78 kB
          Navis Ryu

        Issue Links

          Activity

            People

              navis Navis Ryu
              amk Michal Krawczyk
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: