Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-2750

Hive multi group by single reducer optimization causes invalid column reference error

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.9.0
    • Component/s: None
    • Labels:
      None

      Description

      After the optimization, if two query blocks have the same distinct clause and the same group by keys, but the first query block does not reference all the rows the second query block does, an invalid column reference error is raised for the columns unreferenced in the first query block.

      E.g.
      FROM src
      INSERT OVERWRITE TABLE dest_g2 SELECT substr(src.key,1,1), count(DISTINCT src.key) WHERE substr(src.key,1,1) >= 5 GROUP BY substr(src.key,1,1)
      INSERT OVERWRITE TABLE dest_g3 SELECT substr(src.key,1,1), count(DISTINCT src.key), count(src.value) WHERE substr(src.key,1,1) < 5 GROUP BY substr(src.key,1,1);

      This results in an invalid column reference error on src.value

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                kevinwilfong Kevin Wilfong
                Reporter:
                kevinwilfong Kevin Wilfong
              • Votes:
                0 Vote for this issue
                Watchers:
                4 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: