Hive / HIVE-5083

Group by ignored when group by column is a partition column

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not a Problem
    • Affects Version/s: 0.11.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels:
      None
    • Environment:

      linux

      Description

      I have an external table X with partition date (a string YYYYMMDD):

      select X.date, count(*) from X group by X.date

      Rather than getting a count breakdown by date, I get a single row with the count for the entire table. The "date" value in that row appears to be that of the last partition in the table.

      Note that the results are as expected if I select an arbitrary "real" (non-partition) column from my table:

      select X.foo, count(*) from X group by X.foo

      correctly gives me a single row per value of X.foo.

      Also, my query works fine when I use the date column in the "where" clause, so the partition does seem to be working.

      select X.date, count(*) from X where X.date = "20130101" group by X.date

      correctly gives me a single row with the count for the date 20130101.

        Activity

        Micah Gutman added a comment -

        The reported problem is just a symptom of a different known bug.

        Micah Gutman added a comment -

        Finally found the bug by using "show table extended like <table> partition (<partition spec>)" to figure out that all partitions were pointing to a single file. My selects only looked like they were working; in fact they were just reading the same data over and over.
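
        The partition check described above can be sketched roughly as follows. This is only an illustration: the table name X and the partition column come from the report, the partition values are examples, and the exact output layout varies by Hive version.

        ```sql
        -- List the partitions the metastore knows about for table X.
        SHOW PARTITIONS X;

        -- Inspect individual partitions; the "location" field in the output
        -- is the thing to compare. If two different partitions report the
        -- same location, they are reading the same data.
        -- (Later Hive versions treat date as a reserved word, so the column
        -- name may need backtick quoting there.)
        SHOW TABLE EXTENDED LIKE 'X' PARTITION (date='20130101');
        SHOW TABLE EXTENDED LIKE 'X' PARTITION (date='20130102');

        -- DESCRIBE FORMATTED gives the same location information in a
        -- more readable layout.
        DESCRIBE FORMATTED X PARTITION (date='20130101');
        ```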

        Specifically, I created my partitions with a single "alter table" command containing multiple partition specs. Interestingly, the wiki help page says:

        Note that it is proper syntax to have multiple partition_spec in a single ALTER TABLE, but if you do this in version 0.7, your partitioning scheme will fail. That is, every query specifying a partition will always use only the first partition.

        I am using 0.11, not 0.7. Apparently, 0.11 (and perhaps everything after 0.7?) has this problem.
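
        Based on the above, one possible workaround is to add each partition in its own statement. The sketch below assumes an external table X partitioned by date; the HDFS paths are hypothetical placeholders, not taken from the report.

        ```sql
        -- Form that triggered the problem in this report (shown for
        -- reference only, commented out):
        -- ALTER TABLE X ADD
        --   PARTITION (date='20130101') LOCATION '/data/x/20130101'
        --   PARTITION (date='20130102') LOCATION '/data/x/20130102';

        -- If partitions already exist with the wrong location, drop them
        -- first. For an external table this removes only the metadata,
        -- not the underlying files.
        ALTER TABLE X DROP PARTITION (date='20130101');
        ALTER TABLE X DROP PARTITION (date='20130102');

        -- Re-add one partition per statement, each with its own location
        -- (paths are hypothetical).
        ALTER TABLE X ADD PARTITION (date='20130101') LOCATION '/data/x/20130101';
        ALTER TABLE X ADD PARTITION (date='20130102') LOCATION '/data/x/20130102';
        ```

        After re-adding, checking each partition's location again should show whether the partitions now point at distinct directories.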


          People

          • Assignee: Unassigned
          • Reporter: Micah Gutman
          • Votes: 0
          • Watchers: 1
