Hive
  1. Hive
  2. HIVE-2750

Hive multi group by single reducer optimization causes invalid column reference error

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.9.0
    • Component/s: None
    • Labels:
      None

      Description

      After the optimization, if two query blocks have the same distinct clause and the same group by keys, but the first query block does not reference all the rows the second query block does, an invalid column reference error is raised for the columns unreferenced in the first query block.

      E.g.
      FROM src
      INSERT OVERWRITE TABLE dest_g2 SELECT substr(src.key,1,1), count(DISTINCT src.key) WHERE substr(src.key,1,1) >= 5 GROUP BY substr(src.key,1,1)
      INSERT OVERWRITE TABLE dest_g3 SELECT substr(src.key,1,1), count(DISTINCT src.key), count(src.value) WHERE substr(src.key,1,1) < 5 GROUP BY substr(src.key,1,1);

      This results in an invalid column reference error on src.value

        Activity

        Transition Time In Source Status Execution Times Last Executer Last Execution Date
        Open Open Patch Available Patch Available
        33m 38s 1 Kevin Wilfong 26/Jan/12 02:42
        Patch Available Patch Available Resolved Resolved
        4d 7h 10m 1 Amareshwari Sriramadasu 30/Jan/12 09:53
        Resolved Resolved Closed Closed
        91d 11h 18m 1 Ashutosh Chauhan 30/Apr/12 22:11
        Hide
        Hudson added a comment -

        Integrated in Hive-trunk-hadoop2 #54 (See https://builds.apache.org/job/Hive-trunk-hadoop2/54/)
        HIVE-2750 Hive multi group by single reducer optimization causes invalid column
        reference error (Kevin Wilfong via namit) (Revision 1236150)

        Result = ABORTED
        namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1236150
        Files :

        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
        • /hive/trunk/ql/src/test/queries/clientpositive/groupby_multi_single_reducer2.q
        • /hive/trunk/ql/src/test/results/clientpositive/groupby_multi_single_reducer2.q.out
        Show
        Hudson added a comment - Integrated in Hive-trunk-hadoop2 #54 (See https://builds.apache.org/job/Hive-trunk-hadoop2/54/ ) HIVE-2750 Hive multi group by single reducer optimization causes invalid column reference error (Kevin Wilfong via namit) (Revision 1236150) Result = ABORTED namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1236150 Files : /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java /hive/trunk/ql/src/test/queries/clientpositive/groupby_multi_single_reducer2.q /hive/trunk/ql/src/test/results/clientpositive/groupby_multi_single_reducer2.q.out
        Ashutosh Chauhan made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Hide
        Ashutosh Chauhan added a comment -

        This issue is closed now. It was released with the fix in 0.9.0. If there is a problem, please open a new jira and link this one with that.

        Show
        Ashutosh Chauhan added a comment - This issue is closed now. It was released with the fix in 0.9.0. If there is a problem, please open a new jira and link this one with that.
        Amareshwari Sriramadasu made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Fix Version/s 0.9.0 [ 12317742 ]
        Resolution Fixed [ 1 ]
        Hide
        Amareshwari Sriramadasu added a comment -

        Seems the issue missed resolution. Resolving.

        Show
        Amareshwari Sriramadasu added a comment - Seems the issue missed resolution. Resolving.
        Hide
        Hudson added a comment -

        Integrated in Hive-trunk-h0.21 #1222 (See https://builds.apache.org/job/Hive-trunk-h0.21/1222/)
        HIVE-2750 Hive multi group by single reducer optimization causes invalid column
        reference error (Kevin Wilfong via namit)

        namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1236150
        Files :

        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
        • /hive/trunk/ql/src/test/queries/clientpositive/groupby_multi_single_reducer2.q
        • /hive/trunk/ql/src/test/results/clientpositive/groupby_multi_single_reducer2.q.out
        Show
        Hudson added a comment - Integrated in Hive-trunk-h0.21 #1222 (See https://builds.apache.org/job/Hive-trunk-h0.21/1222/ ) HIVE-2750 Hive multi group by single reducer optimization causes invalid column reference error (Kevin Wilfong via namit) namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1236150 Files : /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java /hive/trunk/ql/src/test/queries/clientpositive/groupby_multi_single_reducer2.q /hive/trunk/ql/src/test/results/clientpositive/groupby_multi_single_reducer2.q.out
        Hide
        Namit Jain added a comment -

        +1

        Show
        Namit Jain added a comment - +1
        Phabricator made changes -
        Attachment HIVE-2750.D1455.1.patch [ 12511935 ]
        Hide
        Phabricator added a comment -

        kevinwilfong requested code review of "HIVE-2750 [jira] Hive multi group by single reducer optimization causes invalid column reference error".
        Reviewers: JIRA

        When generating the list of value columns for the reduce sink operator, in the case of multiple group bys occurring in the same reducer, only the columns used by the first query block was being considered, due to a typo. This patch fixes this typo, and adds a testcase to ensure the error does not reoccur.

        After the optimization, if two query blocks have the same distinct clause and the same group by keys, but the first query block does not reference all the rows the second query block does, an invalid column reference error is raised for the columns unreferenced in the first query block.

        E.g.
        FROM src
        INSERT OVERWRITE TABLE dest_g2 SELECT substr(src.key,1,1), count(DISTINCT src.key) WHERE substr(src.key,1,1) >= 5 GROUP BY substr(src.key,1,1)
        INSERT OVERWRITE TABLE dest_g3 SELECT substr(src.key,1,1), count(DISTINCT src.key), count(src.value) WHERE substr(src.key,1,1) < 5 GROUP BY substr(src.key,1,1);

        This results in an invalid column reference error on src.value

        TEST PLAN
        EMPTY

        REVISION DETAIL
        https://reviews.facebook.net/D1455

        AFFECTED FILES
        ql/src/test/results/clientpositive/groupby_multi_single_reducer2.q.out
        ql/src/test/queries/clientpositive/groupby_multi_single_reducer2.q
        ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java

        MANAGE HERALD DIFFERENTIAL RULES
        https://reviews.facebook.net/herald/view/differential/

        WHY DID I GET THIS EMAIL?
        https://reviews.facebook.net/herald/transcript/3015/

        Tip: use the X-Herald-Rules header to filter Herald messages in your client.

        Show
        Phabricator added a comment - kevinwilfong requested code review of " HIVE-2750 [jira] Hive multi group by single reducer optimization causes invalid column reference error". Reviewers: JIRA When generating the list of value columns for the reduce sink operator, in the case of multiple group bys occurring in the same reducer, only the columns used by the first query block was being considered, due to a typo. This patch fixes this typo, and adds a testcase to ensure the error does not reoccur. After the optimization, if two query blocks have the same distinct clause and the same group by keys, but the first query block does not reference all the rows the second query block does, an invalid column reference error is raised for the columns unreferenced in the first query block. E.g. FROM src INSERT OVERWRITE TABLE dest_g2 SELECT substr(src.key,1,1), count(DISTINCT src.key) WHERE substr(src.key,1,1) >= 5 GROUP BY substr(src.key,1,1) INSERT OVERWRITE TABLE dest_g3 SELECT substr(src.key,1,1), count(DISTINCT src.key), count(src.value) WHERE substr(src.key,1,1) < 5 GROUP BY substr(src.key,1,1); This results in an invalid column reference error on src.value TEST PLAN EMPTY REVISION DETAIL https://reviews.facebook.net/D1455 AFFECTED FILES ql/src/test/results/clientpositive/groupby_multi_single_reducer2.q.out ql/src/test/queries/clientpositive/groupby_multi_single_reducer2.q ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java MANAGE HERALD DIFFERENTIAL RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? https://reviews.facebook.net/herald/transcript/3015/ Tip: use the X-Herald-Rules header to filter Herald messages in your client.
        Kevin Wilfong made changes -
        Field Original Value New Value
        Status Open [ 1 ] Patch Available [ 10002 ]
        Kevin Wilfong created issue -

          People

          • Assignee:
            Kevin Wilfong
            Reporter:
            Kevin Wilfong
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development