Hive
  1. Hive
  2. HIVE-784

Support uncorrelated subqueries in the WHERE clause

    Details

    • Type: New Feature New Feature
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: Query Processor
    • Labels:
      None
    • Release Note:
      Not ready for release

      Description

      Hive currently only support views in the FROM-clause, some Facebook use cases suggest that Hive should support subqueries such as those connected by IN/EXISTS in the WHERE-clause.

      1. D13443.1.patch
        185 kB
        Phabricator
      2. D13443.2.patch
        192 kB
        Phabricator
      3. HIVE-784.1.patch.txt
        26 kB
        Matthew Weaver
      4. HIVE-784.2.patch
        187 kB
        Harish Butani
      5. SubQuerySpec.pdf
        337 kB
        Harish Butani
      6. tpchQueriesUsingSubQueryClauses.sql
        6 kB
        Harish Butani

        Issue Links

          Activity

          Hide
          Ning Zhang added a comment -

          Subqueries can be correlated or uncorrelated to its outer queries. If it is uncorrelated, the subqueries can be treated as a constant (set of rows in the case of IN, or boolean in the case of EXISTS). It is the easiest case where subqueries can be evaluated once and results don't change. It is the correlated subquries that are more common and much more complex.

          In general, correlated subqueries can be treated as a function, where the correlated variables can be treated as the input parameters. For example, the following nested subquery

          select *
          from A
          where exists (
          select null
          from B
          where B.id = A.id
          and B.date > '2009-10-01')

          can be treated as a function where the passing parameter is the correlated variable A.id. A native plan is to evaluate this function for every row in A. This is equivalent to a nested-loop join (semi) between A and B. A better evaluation plan is to evaluate the subquery every time the input parameter changes value. A better generalization is to rewrite the whole query to unnest the subquery into a semi-join between A and B. Then there are more join algorithms to choose from. A whole body of database research papers are dedicated to rewrite rules on unnesting subqueries to joins.

          There are also cases that the nested subqueries cannot be unnested into joins, particularly for those subqueries involving aggretations.

          As the first step, we will working on cases where subquries are uncorrelated or they can be unnested into semi-joins.

          Show
          Ning Zhang added a comment - Subqueries can be correlated or uncorrelated to its outer queries. If it is uncorrelated, the subqueries can be treated as a constant (set of rows in the case of IN, or boolean in the case of EXISTS). It is the easiest case where subqueries can be evaluated once and results don't change. It is the correlated subquries that are more common and much more complex. In general, correlated subqueries can be treated as a function, where the correlated variables can be treated as the input parameters. For example, the following nested subquery select * from A where exists ( select null from B where B.id = A.id and B.date > '2009-10-01') can be treated as a function where the passing parameter is the correlated variable A.id. A native plan is to evaluate this function for every row in A. This is equivalent to a nested-loop join (semi) between A and B. A better evaluation plan is to evaluate the subquery every time the input parameter changes value. A better generalization is to rewrite the whole query to unnest the subquery into a semi-join between A and B. Then there are more join algorithms to choose from. A whole body of database research papers are dedicated to rewrite rules on unnesting subqueries to joins. There are also cases that the nested subqueries cannot be unnested into joins, particularly for those subqueries involving aggretations. As the first step, we will working on cases where subquries are uncorrelated or they can be unnested into semi-joins.
          Hide
          Ning Zhang added a comment -

          Created a new task HIVE-870 for semi-join. Hive users should be able to specify semi-join in their queries just as INNER/OUTER joins.

          Show
          Ning Zhang added a comment - Created a new task HIVE-870 for semi-join. Hive users should be able to specify semi-join in their queries just as INNER/OUTER joins.
          Hide
          Guido Serra aka Zeph added a comment -

          Ning Zhang does it mean that u are happy with the workaround and this ticket will not be closed? cause at Rocket Internet we do have the same issue u stepped into... and the semi-join is not an option to us. Is anyone else going to take ownership of this one, or since u are the initial opener and the assignee it is going to /dev/null? (as it seems, since the ticket is 4 years old)

          Namit Jain is there a board reviewing the ticket status? or whoever opens a ticket is usually the one pushing for its closing/fixing and if this figure is missing features are stalling like this one?

          Show
          Guido Serra aka Zeph added a comment - Ning Zhang does it mean that u are happy with the workaround and this ticket will not be closed? cause at Rocket Internet we do have the same issue u stepped into... and the semi-join is not an option to us. Is anyone else going to take ownership of this one, or since u are the initial opener and the assignee it is going to /dev/null? (as it seems, since the ticket is 4 years old) Namit Jain is there a board reviewing the ticket status? or whoever opens a ticket is usually the one pushing for its closing/fixing and if this figure is missing features are stalling like this one?
          Hide
          Ning Zhang added a comment -

          Guido, I'm not actively working on this now. Semi-join is usually we face and it could handle most of our cases. Feel free to grab this task if you would like to work on it.

          Show
          Ning Zhang added a comment - Guido, I'm not actively working on this now. Semi-join is usually we face and it could handle most of our cases. Feel free to grab this task if you would like to work on it.
          Hide
          Sun Rui added a comment -

          Guido, you may try project-panthera-ase@github, which is an open source effort for HIVE-3472. It can support correlated/un-correlated subqueries and selection from multiple tables without any join operator now. Instead of directly modifying the HiveQL grammar and query processor to support subqueries, the project incorporates a PL/SQL parser for SQL input and performs semantically identical transformation on the PL/SQL parser output for execution by the query processor.

          Show
          Sun Rui added a comment - Guido, you may try project-panthera-ase@github, which is an open source effort for HIVE-3472 . It can support correlated/un-correlated subqueries and selection from multiple tables without any join operator now. Instead of directly modifying the HiveQL grammar and query processor to support subqueries, the project incorporates a PL/SQL parser for SQL input and performs semantically identical transformation on the PL/SQL parser output for execution by the query processor.
          Hide
          Brock Noland added a comment -

          Hi,

          I see Ning is not working on this at present. Is anyone else working on this?

          Brock

          Show
          Brock Noland added a comment - Hi, I see Ning is not working on this at present. Is anyone else working on this? Brock
          Hide
          Matthew Weaver added a comment -

          I'm looking into uncorrelated subqueries. Will pick this up unless someone else is actively working it.

          Functional Requirements
          • Support WHERE x IN (<column subquery>);
            • <column subquery> returns one column, any number of rows.
          • Support WHERE x NOT IN (<column subquery>);
          • Support same types of subqueries in HAVING.
            • E.g.
              SELECT key FROM t1 GROUP BY key 
              HAVING COUNT(value) IN (SELECT p FROM t2);  
          • Correlated subqueries not supported, for now at least
            • But still need to check for correlation, and bail if it occurs.
            • Correlated subquery:
              • A subquery that references a table that appears in a containing query (MySQL), thus requiring subquery evaluation to look outside its scope.
              • The subquery depends on the outer query for its values, so the subquery must be executed once for each row of the outer query. Also known as repeating Subqueries.
          Tasks
          • Rewrite IN (<column-subquery>) as a LEFT SEMI JOIN.
            • Not ready for public consumption. In particular, no check for correlated terms.
            • With test queries.
          • Add check for correlated terms, return informative error message.
          • Rewrite WHERE NOT IN (<column-subquery>) as a LEFT OUTER JOIN.
            • Return rows that don't match the right side
          • Rewrite subqueries in HAVING, using LEFT SEMI JOIN and LEFT OUTER JOIN as above.
          Show
          Matthew Weaver added a comment - I'm looking into uncorrelated subqueries. Will pick this up unless someone else is actively working it. Functional Requirements Support WHERE x IN (<column subquery>); <column subquery> returns one column, any number of rows. Support WHERE x NOT IN (<column subquery>) ; Support same types of subqueries in HAVING . E.g. SELECT key FROM t1 GROUP BY key HAVING COUNT(value) IN ( SELECT p FROM t2); Correlated subqueries not supported, for now at least But still need to check for correlation, and bail if it occurs. Correlated subquery: A subquery that references a table that appears in a containing query ( MySQL ), thus requiring subquery evaluation to look outside its scope. The subquery depends on the outer query for its values, so the subquery must be executed once for each row of the outer query. Also known as repeating Subqueries . Tasks Rewrite IN (<column-subquery>) as a LEFT SEMI JOIN . Not ready for public consumption. In particular, no check for correlated terms. With test queries. Add check for correlated terms, return informative error message. Rewrite WHERE NOT IN (<column-subquery>) as a LEFT OUTER JOIN . Return rows that don't match the right side Rewrite subqueries in HAVING , using LEFT SEMI JOIN and LEFT OUTER JOIN as above.
          Hide
          Matthew Weaver added a comment -

          Support subqueries for 'WHERE IN (<subquery>)', initial version.

          This is an initial implementation, not ready for submission (details below).
          Nevertheless I would welcome comments on the approach.

          • ASTRewriter.inSubqueryToJoin(). Currently is called right after
            parsing, but will probably have to move later for proper alias
            generation.
          • This early version hard-codes a generated alias "t2" for subquery. To
            fix, with guarantee of unique alias, will require moving the rewrite
            into SemanticAnalyzer.analyzeInternal() after getMetaData().
          • Currently only handles a top-level occurrence of WHERE IN
            (<subquery>). Any nested occurrences will fail.
          Show
          Matthew Weaver added a comment - Support subqueries for 'WHERE IN (<subquery>)', initial version. This is an initial implementation, not ready for submission (details below). Nevertheless I would welcome comments on the approach. ASTRewriter.inSubqueryToJoin(). Currently is called right after parsing, but will probably have to move later for proper alias generation. This early version hard-codes a generated alias "t2" for subquery. To fix, with guarantee of unique alias, will require moving the rewrite into SemanticAnalyzer.analyzeInternal() after getMetaData(). Currently only handles a top-level occurrence of WHERE IN (<subquery>). Any nested occurrences will fail.
          Hide
          Matthew Weaver added a comment -

          Initival version.
          On Phabricator at https://reviews.facebook.net/D11541.

          Show
          Matthew Weaver added a comment - Initival version. On Phabricator at https://reviews.facebook.net/D11541 .
          Hide
          Navis added a comment -

          Added some comments

          Show
          Navis added a comment - Added some comments
          Hide
          Harish Butani added a comment -

          Uploaded a Spec and preliminary patch.
          Has support for:

          • In/Not In: non-correlated and correlated
          • Exists/Not Exists

          The approach is to do Algebraic transformations during genPlan.

          The patch doesn't have all the checks from the document. For e.g. check to disallow nested SubQueries is missing.
          Will add these and more egs in the next patch.

          Show
          Harish Butani added a comment - Uploaded a Spec and preliminary patch. Has support for: In/Not In: non-correlated and correlated Exists/Not Exists The approach is to do Algebraic transformations during genPlan. The patch doesn't have all the checks from the document. For e.g. check to disallow nested SubQueries is missing. Will add these and more egs in the next patch.
          Hide
          Harish Butani added a comment -

          Just uploaded a script with tpch queries Q4, Q15, Q16 and Q18 written using SubQueries.
          Validated results against what is in the Spec.
          Script includes ddl at the bottom.

          Show
          Harish Butani added a comment - Just uploaded a script with tpch queries Q4, Q15, Q16 and Q18 written using SubQueries. Validated results against what is in the Spec. Script includes ddl at the bottom.
          Hide
          Ashutosh Chauhan added a comment -

          Harish Butani Can you post a review board / phabricator entry for patch. It will be easier to review that way.

          Show
          Ashutosh Chauhan added a comment - Harish Butani Can you post a review board / phabricator entry for patch. It will be easier to review that way.
          Hide
          Phabricator added a comment -

          hbutani requested code review of "HIVE-784 [jira] Support uncorrelated subqueries in the WHERE clause".

          Reviewers: JIRA, ashutoshc

          SubQuery: add tests for distinct

          Hive currently only support views in the FROM-clause, some Facebook use cases suggest that Hive should support subqueries such as those connected by IN/EXISTS in the WHERE-clause.

          TEST PLAN
          EMPTY

          REVISION DETAIL
          https://reviews.facebook.net/D13443

          AFFECTED FILES
          ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java
          ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
          ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g
          ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g
          ql/src/java/org/apache/hadoop/hive/ql/parse/QBSubQuery.java
          ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
          ql/src/java/org/apache/hadoop/hive/ql/parse/SubQueryUtils.java
          ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java
          ql/src/test/queries/clientnegative/subquery_in_groupby.q
          ql/src/test/queries/clientnegative/subquery_in_select.q
          ql/src/test/queries/clientnegative/subquery_windowing_corr.q
          ql/src/test/queries/clientpositive/subquery_exists.q
          ql/src/test/queries/clientpositive/subquery_in.q
          ql/src/test/queries/clientpositive/subquery_multiinsert.q
          ql/src/test/queries/clientpositive/subquery_notexists.q
          ql/src/test/queries/clientpositive/subquery_notin.q
          ql/src/test/results/clientnegative/subquery_in_groupby.q.out
          ql/src/test/results/clientnegative/subquery_in_select.q.out
          ql/src/test/results/clientnegative/subquery_windowing_corr.q.out
          ql/src/test/results/clientpositive/subquery_exists.q.out
          ql/src/test/results/clientpositive/subquery_in.q.out
          ql/src/test/results/clientpositive/subquery_multiinsert.q.out
          ql/src/test/results/clientpositive/subquery_notexists.q.out
          ql/src/test/results/clientpositive/subquery_notin.q.out

          MANAGE HERALD RULES
          https://reviews.facebook.net/herald/view/differential/

          WHY DID I GET THIS EMAIL?
          https://reviews.facebook.net/herald/transcript/40029/

          To: JIRA, ashutoshc, hbutani

          Show
          Phabricator added a comment - hbutani requested code review of " HIVE-784 [jira] Support uncorrelated subqueries in the WHERE clause". Reviewers: JIRA, ashutoshc SubQuery: add tests for distinct Hive currently only support views in the FROM-clause, some Facebook use cases suggest that Hive should support subqueries such as those connected by IN/EXISTS in the WHERE-clause. TEST PLAN EMPTY REVISION DETAIL https://reviews.facebook.net/D13443 AFFECTED FILES ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g ql/src/java/org/apache/hadoop/hive/ql/parse/QBSubQuery.java ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java ql/src/java/org/apache/hadoop/hive/ql/parse/SubQueryUtils.java ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java ql/src/test/queries/clientnegative/subquery_in_groupby.q ql/src/test/queries/clientnegative/subquery_in_select.q ql/src/test/queries/clientnegative/subquery_windowing_corr.q ql/src/test/queries/clientpositive/subquery_exists.q ql/src/test/queries/clientpositive/subquery_in.q ql/src/test/queries/clientpositive/subquery_multiinsert.q ql/src/test/queries/clientpositive/subquery_notexists.q ql/src/test/queries/clientpositive/subquery_notin.q ql/src/test/results/clientnegative/subquery_in_groupby.q.out ql/src/test/results/clientnegative/subquery_in_select.q.out ql/src/test/results/clientnegative/subquery_windowing_corr.q.out ql/src/test/results/clientpositive/subquery_exists.q.out ql/src/test/results/clientpositive/subquery_in.q.out ql/src/test/results/clientpositive/subquery_multiinsert.q.out ql/src/test/results/clientpositive/subquery_notexists.q.out ql/src/test/results/clientpositive/subquery_notin.q.out MANAGE HERALD RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? https://reviews.facebook.net/herald/transcript/40029/ To: JIRA, ashutoshc, hbutani
          Hide
          Phabricator added a comment -

          ashutoshc has requested changes to the revision "HIVE-784 [jira] Support uncorrelated subqueries in the WHERE clause".

          Design looks good. Mostly implementation related comments.

          INLINE COMMENTS
          ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g:391 It would be nicer if instead of two rules for IN / NOT IN if we could just have one rule, which can conditionally generate TOK_SUBQUERY_OP_NOTIN / TOK_SUBQUERY_OP_IN token. Not a big deal, but would be nice to have since that makes grammar bit more succinct.
          ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:1804 You mentioned in comment above that we don't support nested / recursive subq, but I don't see a check for that. Perhaps, its there but I missed it.
          ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:1789 Thanks for detailed comments. Very helpful!
          ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:1811 There should exactly one subq currently. If so, will be good to add a note for it.
          ql/src/java/org/apache/hadoop/hive/ql/parse/SubQueryUtils.java:101 Since, OR is not supported, It will be good to generate an error message here if OR is encountered.
          ql/src/java/org/apache/hadoop/hive/ql/parse/SubQueryUtils.java:226-233 Same logic exists in SemanticAnalyzer::doPhase1GetAllAggregations, perhaps we can create a util method in ParseUtils, instead of repeating code here.
          ql/src/java/org/apache/hadoop/hive/ql/parse/QBSubQuery.java:289 If you mark this as transient, you probably wont need to write Kryo serializer for this.
          ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java:699 I dont think its required. We should probably mark all usage of instances of ASTNodeOrigin as transient.
          ql/src/java/org/apache/hadoop/hive/ql/parse/QBSubQuery.java:470 This should never be the case. Shall we throw an exception here, instead of silently returning?
          ql/src/java/org/apache/hadoop/hive/ql/parse/QBSubQuery.java:496 Is this allowed by standard that subq predicate may refer to Outer? If yes, than in future perhaps we can add this predicate as a conjunct for outer query.
          ql/src/java/org/apache/hadoop/hive/ql/parse/QBSubQuery.java:537 Is this need to be .equals() check here instead of == ?
          ql/src/java/org/apache/hadoop/hive/ql/parse/QBSubQuery.java:539 It will be good to add a comment, why we need to have True condition here, instead? Probably, because plan gen fails later while generating rest of filter plan.
          ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:1821 Good to name this method as validateAndRewriteAST() ?
          ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:6526 Quite a bit of this code is repeated from genJoinTree(), seems like atleast some bits could be refactored out of genJoinTree() which this method can make use of.
          ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:1827 name sqOperator is misleading here, topOp perhaps ?
          ql/src/java/org/apache/hadoop/hive/ql/parse/QBSubQuery.java:41 This is not an operator in classic Hive sense. Perhaps, SubqASTcontainer or something else.
          ql/src/java/org/apache/hadoop/hive/ql/parse/QBSubQuery.java:24 SubQueryType instead of SubQueryOperatorType ?

          REVISION DETAIL
          https://reviews.facebook.net/D13443

          BRANCH
          SubQuery

          ARCANIST PROJECT
          hive

          To: JIRA, ashutoshc, hbutani

          Show
          Phabricator added a comment - ashutoshc has requested changes to the revision " HIVE-784 [jira] Support uncorrelated subqueries in the WHERE clause". Design looks good. Mostly implementation related comments. INLINE COMMENTS ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g:391 It would be nicer if instead of two rules for IN / NOT IN if we could just have one rule, which can conditionally generate TOK_SUBQUERY_OP_NOTIN / TOK_SUBQUERY_OP_IN token. Not a big deal, but would be nice to have since that makes grammar bit more succinct. ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:1804 You mentioned in comment above that we don't support nested / recursive subq, but I don't see a check for that. Perhaps, its there but I missed it. ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:1789 Thanks for detailed comments. Very helpful! ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:1811 There should exactly one subq currently. If so, will be good to add a note for it. ql/src/java/org/apache/hadoop/hive/ql/parse/SubQueryUtils.java:101 Since, OR is not supported, It will be good to generate an error message here if OR is encountered. ql/src/java/org/apache/hadoop/hive/ql/parse/SubQueryUtils.java:226-233 Same logic exists in SemanticAnalyzer::doPhase1GetAllAggregations, perhaps we can create a util method in ParseUtils, instead of repeating code here. ql/src/java/org/apache/hadoop/hive/ql/parse/QBSubQuery.java:289 If you mark this as transient, you probably wont need to write Kryo serializer for this. ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java:699 I dont think its required. We should probably mark all usage of instances of ASTNodeOrigin as transient. ql/src/java/org/apache/hadoop/hive/ql/parse/QBSubQuery.java:470 This should never be the case. Shall we throw an exception here, instead of silently returning? ql/src/java/org/apache/hadoop/hive/ql/parse/QBSubQuery.java:496 Is this allowed by standard that subq predicate may refer to Outer? If yes, than in future perhaps we can add this predicate as a conjunct for outer query. ql/src/java/org/apache/hadoop/hive/ql/parse/QBSubQuery.java:537 Is this need to be .equals() check here instead of == ? ql/src/java/org/apache/hadoop/hive/ql/parse/QBSubQuery.java:539 It will be good to add a comment, why we need to have True condition here, instead? Probably, because plan gen fails later while generating rest of filter plan. ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:1821 Good to name this method as validateAndRewriteAST() ? ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:6526 Quite a bit of this code is repeated from genJoinTree(), seems like atleast some bits could be refactored out of genJoinTree() which this method can make use of. ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:1827 name sqOperator is misleading here, topOp perhaps ? ql/src/java/org/apache/hadoop/hive/ql/parse/QBSubQuery.java:41 This is not an operator in classic Hive sense. Perhaps, SubqASTcontainer or something else. ql/src/java/org/apache/hadoop/hive/ql/parse/QBSubQuery.java:24 SubQueryType instead of SubQueryOperatorType ? REVISION DETAIL https://reviews.facebook.net/D13443 BRANCH SubQuery ARCANIST PROJECT hive To: JIRA, ashutoshc, hbutani
          Hide
          Phabricator added a comment -

          hbutani updated the revision "HIVE-784 [jira] Support uncorrelated subqueries in the WHERE clause".

          Reviewers: ashutoshc, JIRA

          REVISION DETAIL
          https://reviews.facebook.net/D13443

          CHANGE SINCE LAST DIFF
          https://reviews.facebook.net/D13443?vs=41457&id=41871#toc

          AFFECTED FILES
          ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java
          ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g
          ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g
          ql/src/java/org/apache/hadoop/hive/ql/parse/QBSubQuery.java
          ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
          ql/src/java/org/apache/hadoop/hive/ql/parse/SubQueryUtils.java
          ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java
          ql/src/test/queries/clientnegative/subquery_exists_implicit_gby.q
          ql/src/test/queries/clientnegative/subquery_in_groupby.q
          ql/src/test/queries/clientnegative/subquery_in_select.q
          ql/src/test/queries/clientnegative/subquery_multiple_cols_in_select.q
          ql/src/test/queries/clientnegative/subquery_notexists_implicit_gby.q
          ql/src/test/queries/clientnegative/subquery_subquery_chain.q
          ql/src/test/queries/clientnegative/subquery_windowing_corr.q
          ql/src/test/queries/clientnegative/subquery_with_or_cond.q
          ql/src/test/queries/clientpositive/subquery_exists.q
          ql/src/test/queries/clientpositive/subquery_in.q
          ql/src/test/queries/clientpositive/subquery_multiinsert.q
          ql/src/test/queries/clientpositive/subquery_notexists.q
          ql/src/test/queries/clientpositive/subquery_notin.q
          ql/src/test/results/clientnegative/subquery_exists_implicit_gby.q.out
          ql/src/test/results/clientnegative/subquery_in_groupby.q.out
          ql/src/test/results/clientnegative/subquery_in_select.q.out
          ql/src/test/results/clientnegative/subquery_multiple_cols_in_select.q.out
          ql/src/test/results/clientnegative/subquery_notexists_implicit_gby.q.out
          ql/src/test/results/clientnegative/subquery_subquery_chain.q.out
          ql/src/test/results/clientnegative/subquery_windowing_corr.q.out
          ql/src/test/results/clientnegative/subquery_with_or_cond.q.out
          ql/src/test/results/clientpositive/subquery_exists.q.out
          ql/src/test/results/clientpositive/subquery_in.q.out
          ql/src/test/results/clientpositive/subquery_multiinsert.q.out
          ql/src/test/results/clientpositive/subquery_notexists.q.out
          ql/src/test/results/clientpositive/subquery_notin.q.out

          To: JIRA, ashutoshc, hbutani

          Show
          Phabricator added a comment - hbutani updated the revision " HIVE-784 [jira] Support uncorrelated subqueries in the WHERE clause". Reviewers: ashutoshc, JIRA REVISION DETAIL https://reviews.facebook.net/D13443 CHANGE SINCE LAST DIFF https://reviews.facebook.net/D13443?vs=41457&id=41871#toc AFFECTED FILES ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g ql/src/java/org/apache/hadoop/hive/ql/parse/QBSubQuery.java ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java ql/src/java/org/apache/hadoop/hive/ql/parse/SubQueryUtils.java ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java ql/src/test/queries/clientnegative/subquery_exists_implicit_gby.q ql/src/test/queries/clientnegative/subquery_in_groupby.q ql/src/test/queries/clientnegative/subquery_in_select.q ql/src/test/queries/clientnegative/subquery_multiple_cols_in_select.q ql/src/test/queries/clientnegative/subquery_notexists_implicit_gby.q ql/src/test/queries/clientnegative/subquery_subquery_chain.q ql/src/test/queries/clientnegative/subquery_windowing_corr.q ql/src/test/queries/clientnegative/subquery_with_or_cond.q ql/src/test/queries/clientpositive/subquery_exists.q ql/src/test/queries/clientpositive/subquery_in.q ql/src/test/queries/clientpositive/subquery_multiinsert.q ql/src/test/queries/clientpositive/subquery_notexists.q ql/src/test/queries/clientpositive/subquery_notin.q ql/src/test/results/clientnegative/subquery_exists_implicit_gby.q.out ql/src/test/results/clientnegative/subquery_in_groupby.q.out ql/src/test/results/clientnegative/subquery_in_select.q.out ql/src/test/results/clientnegative/subquery_multiple_cols_in_select.q.out ql/src/test/results/clientnegative/subquery_notexists_implicit_gby.q.out ql/src/test/results/clientnegative/subquery_subquery_chain.q.out ql/src/test/results/clientnegative/subquery_windowing_corr.q.out ql/src/test/results/clientnegative/subquery_with_or_cond.q.out ql/src/test/results/clientpositive/subquery_exists.q.out ql/src/test/results/clientpositive/subquery_in.q.out ql/src/test/results/clientpositive/subquery_multiinsert.q.out ql/src/test/results/clientpositive/subquery_notexists.q.out ql/src/test/results/clientpositive/subquery_notin.q.out To: JIRA, ashutoshc, hbutani
          Hide
          Phabricator added a comment -

          hbutani has commented on the revision "HIVE-784 [jira] Support uncorrelated subqueries in the WHERE clause".

          Tried to address all the issues you raised, except for the Nested SubQuery check. In addition:

          • added a bunch of -ve tests: subquery with or, check for exists/not exists with implicit Group By, subquery with multiple columns in select

          INLINE COMMENTS
          ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g:391 Changed the grammar so we support both these forms:

          select ...
          from ...
          where x not in (select ...)

          or

          select...
          from ...
          where not x in (select ...)
          ql/src/java/org/apache/hadoop/hive/ql/parse/QBSubQuery.java:470 this is checking the where clause in the SubQuery.
          It can be null. When it is null, it implies this is an uncorrelated SubQuery.
          ql/src/java/org/apache/hadoop/hive/ql/parse/QBSubQuery.java:496 yes this is allowed. Works in mysql. Can look into this in a later jira
          ql/src/java/org/apache/hadoop/hive/ql/parse/QBSubQuery.java:537 sqNewSarchCond can be null. In case there is only 1 conjunct, no copy is made.
          ASTNode equals doesn't do a deep check.
          ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:1804 No this feature is missing. Can this be a separate jira
          ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:6526 i looked at this again:

          • the initial switch is different, because this method is based on subQuery.joinType
          • the way the left and right aliases are setup is different.
          • the code for initializing expressions,filters and filterFrPushing can be refactored out. Doesn't seem worth it though.
            ql/src/java/org/apache/hadoop/hive/ql/parse/SubQueryUtils.java:101 this gets caught in the remove method below, because then the SubQuery is not a top level conjunct.
            Have added a -ve test for this.
            ql/src/java/org/apache/hadoop/hive/ql/parse/SubQueryUtils.java:226-233 this method differs from SemanticAnalyzer::doPhase1GetAllAggregations in:
          • doesn't throw SemanticException
          • doesn't pass the ASTNode to the UnparseTranslator.
          • returns an int instead of the Agg/Windowing ASTNodes

          REVISION DETAIL
          https://reviews.facebook.net/D13443

          To: JIRA, ashutoshc, hbutani

          Show
          Phabricator added a comment - hbutani has commented on the revision " HIVE-784 [jira] Support uncorrelated subqueries in the WHERE clause". Tried to address all the issues you raised, except for the Nested SubQuery check. In addition: added a bunch of -ve tests: subquery with or, check for exists/not exists with implicit Group By, subquery with multiple columns in select INLINE COMMENTS ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g:391 Changed the grammar so we support both these forms: select ... from ... where x not in (select ...) or select... from ... where not x in (select ...) ql/src/java/org/apache/hadoop/hive/ql/parse/QBSubQuery.java:470 this is checking the where clause in the SubQuery. It can be null. When it is null, it implies this is an uncorrelated SubQuery. ql/src/java/org/apache/hadoop/hive/ql/parse/QBSubQuery.java:496 yes this is allowed. Works in mysql. Can look into this in a later jira ql/src/java/org/apache/hadoop/hive/ql/parse/QBSubQuery.java:537 sqNewSarchCond can be null. In case there is only 1 conjunct, no copy is made. ASTNode equals doesn't do a deep check. ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:1804 No this feature is missing. Can this be a separate jira ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:6526 i looked at this again: the initial switch is different, because this method is based on subQuery.joinType the way the left and right aliases are setup is different. the code for initializing expressions,filters and filterFrPushing can be refactored out. Doesn't seem worth it though. ql/src/java/org/apache/hadoop/hive/ql/parse/SubQueryUtils.java:101 this gets caught in the remove method below, because then the SubQuery is not a top level conjunct. Have added a -ve test for this. ql/src/java/org/apache/hadoop/hive/ql/parse/SubQueryUtils.java:226-233 this method differs from SemanticAnalyzer::doPhase1GetAllAggregations in: doesn't throw SemanticException doesn't pass the ASTNode to the UnparseTranslator. returns an int instead of the Agg/Windowing ASTNodes REVISION DETAIL https://reviews.facebook.net/D13443 To: JIRA, ashutoshc, hbutani
          Hide
          Ashutosh Chauhan added a comment -

          +1

          Show
          Ashutosh Chauhan added a comment - +1
          Hide
          Brock Noland added a comment -

          I understand SemanticAnalyzer is hard to unit test, but SubQueryUtils is full of static methods that should be much easier to unit test. Just curious, is there a reason we are not unit testing this code?

          Show
          Brock Noland added a comment - I understand SemanticAnalyzer is hard to unit test, but SubQueryUtils is full of static methods that should be much easier to unit test. Just curious, is there a reason we are not unit testing this code?
          Hide
          Harish Butani added a comment -

          No good point. Will add unittests in a subsequent jira. There are still several things to do for this feature.

          Show
          Harish Butani added a comment - No good point. Will add unittests in a subsequent jira. There are still several things to do for this feature.
          Hide
          Hive QA added a comment -

          Overall: +1 all checks pass

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12609074/D13443.2.patch

          SUCCESS: +1 4442 tests passed

          Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1196/testReport
          Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1196/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          

          This message is automatically generated.

          Show
          Hive QA added a comment - Overall : +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12609074/D13443.2.patch SUCCESS: +1 4442 tests passed Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1196/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1196/console Messages: Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase This message is automatically generated.
          Hide
          Harish Butani added a comment -

          Brock Noland have added HIVE-5615 for tracking unit tests issue

          Show
          Harish Butani added a comment - Brock Noland have added HIVE-5615 for tracking unit tests issue
          Hide
          Ashutosh Chauhan added a comment -

          Committed to trunk. Thanks, Harish!

          Show
          Ashutosh Chauhan added a comment - Committed to trunk. Thanks, Harish!
          Hide
          Hudson added a comment -

          FAILURE: Integrated in Hive-trunk-hadoop2 #519 (See https://builds.apache.org/job/Hive-trunk-hadoop2/519/)
          HIVE-784 : Support uncorrelated subqueries in the WHERE clause (Harish Butani via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1535040)

          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/QBSubQuery.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SubQueryUtils.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java
          • /hive/trunk/ql/src/test/queries/clientnegative/subquery_exists_implicit_gby.q
          • /hive/trunk/ql/src/test/queries/clientnegative/subquery_in_groupby.q
          • /hive/trunk/ql/src/test/queries/clientnegative/subquery_in_select.q
          • /hive/trunk/ql/src/test/queries/clientnegative/subquery_multiple_cols_in_select.q
          • /hive/trunk/ql/src/test/queries/clientnegative/subquery_notexists_implicit_gby.q
          • /hive/trunk/ql/src/test/queries/clientnegative/subquery_subquery_chain.q
          • /hive/trunk/ql/src/test/queries/clientnegative/subquery_windowing_corr.q
          • /hive/trunk/ql/src/test/queries/clientnegative/subquery_with_or_cond.q
          • /hive/trunk/ql/src/test/queries/clientpositive/subquery_exists.q
          • /hive/trunk/ql/src/test/queries/clientpositive/subquery_in.q
          • /hive/trunk/ql/src/test/queries/clientpositive/subquery_multiinsert.q
          • /hive/trunk/ql/src/test/queries/clientpositive/subquery_notexists.q
          • /hive/trunk/ql/src/test/queries/clientpositive/subquery_notin.q
          • /hive/trunk/ql/src/test/results/clientnegative/subquery_exists_implicit_gby.q.out
          • /hive/trunk/ql/src/test/results/clientnegative/subquery_in_groupby.q.out
          • /hive/trunk/ql/src/test/results/clientnegative/subquery_in_select.q.out
          • /hive/trunk/ql/src/test/results/clientnegative/subquery_multiple_cols_in_select.q.out
          • /hive/trunk/ql/src/test/results/clientnegative/subquery_notexists_implicit_gby.q.out
          • /hive/trunk/ql/src/test/results/clientnegative/subquery_subquery_chain.q.out
          • /hive/trunk/ql/src/test/results/clientnegative/subquery_windowing_corr.q.out
          • /hive/trunk/ql/src/test/results/clientnegative/subquery_with_or_cond.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/subquery_exists.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/subquery_in.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/subquery_multiinsert.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/subquery_notexists.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/subquery_notin.q.out
          Show
          Hudson added a comment - FAILURE: Integrated in Hive-trunk-hadoop2 #519 (See https://builds.apache.org/job/Hive-trunk-hadoop2/519/ ) HIVE-784 : Support uncorrelated subqueries in the WHERE clause (Harish Butani via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1535040 ) /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/QBSubQuery.java /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SubQueryUtils.java /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java /hive/trunk/ql/src/test/queries/clientnegative/subquery_exists_implicit_gby.q /hive/trunk/ql/src/test/queries/clientnegative/subquery_in_groupby.q /hive/trunk/ql/src/test/queries/clientnegative/subquery_in_select.q /hive/trunk/ql/src/test/queries/clientnegative/subquery_multiple_cols_in_select.q /hive/trunk/ql/src/test/queries/clientnegative/subquery_notexists_implicit_gby.q /hive/trunk/ql/src/test/queries/clientnegative/subquery_subquery_chain.q /hive/trunk/ql/src/test/queries/clientnegative/subquery_windowing_corr.q /hive/trunk/ql/src/test/queries/clientnegative/subquery_with_or_cond.q /hive/trunk/ql/src/test/queries/clientpositive/subquery_exists.q /hive/trunk/ql/src/test/queries/clientpositive/subquery_in.q /hive/trunk/ql/src/test/queries/clientpositive/subquery_multiinsert.q /hive/trunk/ql/src/test/queries/clientpositive/subquery_notexists.q /hive/trunk/ql/src/test/queries/clientpositive/subquery_notin.q /hive/trunk/ql/src/test/results/clientnegative/subquery_exists_implicit_gby.q.out /hive/trunk/ql/src/test/results/clientnegative/subquery_in_groupby.q.out /hive/trunk/ql/src/test/results/clientnegative/subquery_in_select.q.out /hive/trunk/ql/src/test/results/clientnegative/subquery_multiple_cols_in_select.q.out /hive/trunk/ql/src/test/results/clientnegative/subquery_notexists_implicit_gby.q.out /hive/trunk/ql/src/test/results/clientnegative/subquery_subquery_chain.q.out /hive/trunk/ql/src/test/results/clientnegative/subquery_windowing_corr.q.out /hive/trunk/ql/src/test/results/clientnegative/subquery_with_or_cond.q.out /hive/trunk/ql/src/test/results/clientpositive/subquery_exists.q.out /hive/trunk/ql/src/test/results/clientpositive/subquery_in.q.out /hive/trunk/ql/src/test/results/clientpositive/subquery_multiinsert.q.out /hive/trunk/ql/src/test/results/clientpositive/subquery_notexists.q.out /hive/trunk/ql/src/test/results/clientpositive/subquery_notin.q.out
          Hide
          Hudson added a comment -

          FAILURE: Integrated in Hive-trunk-h0.21 #2417 (See https://builds.apache.org/job/Hive-trunk-h0.21/2417/)
          HIVE-784 : Support uncorrelated subqueries in the WHERE clause (Harish Butani via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1535040)

          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/QBSubQuery.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SubQueryUtils.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java
          • /hive/trunk/ql/src/test/queries/clientnegative/subquery_exists_implicit_gby.q
          • /hive/trunk/ql/src/test/queries/clientnegative/subquery_in_groupby.q
          • /hive/trunk/ql/src/test/queries/clientnegative/subquery_in_select.q
          • /hive/trunk/ql/src/test/queries/clientnegative/subquery_multiple_cols_in_select.q
          • /hive/trunk/ql/src/test/queries/clientnegative/subquery_notexists_implicit_gby.q
          • /hive/trunk/ql/src/test/queries/clientnegative/subquery_subquery_chain.q
          • /hive/trunk/ql/src/test/queries/clientnegative/subquery_windowing_corr.q
          • /hive/trunk/ql/src/test/queries/clientnegative/subquery_with_or_cond.q
          • /hive/trunk/ql/src/test/queries/clientpositive/subquery_exists.q
          • /hive/trunk/ql/src/test/queries/clientpositive/subquery_in.q
          • /hive/trunk/ql/src/test/queries/clientpositive/subquery_multiinsert.q
          • /hive/trunk/ql/src/test/queries/clientpositive/subquery_notexists.q
          • /hive/trunk/ql/src/test/queries/clientpositive/subquery_notin.q
          • /hive/trunk/ql/src/test/results/clientnegative/subquery_exists_implicit_gby.q.out
          • /hive/trunk/ql/src/test/results/clientnegative/subquery_in_groupby.q.out
          • /hive/trunk/ql/src/test/results/clientnegative/subquery_in_select.q.out
          • /hive/trunk/ql/src/test/results/clientnegative/subquery_multiple_cols_in_select.q.out
          • /hive/trunk/ql/src/test/results/clientnegative/subquery_notexists_implicit_gby.q.out
          • /hive/trunk/ql/src/test/results/clientnegative/subquery_subquery_chain.q.out
          • /hive/trunk/ql/src/test/results/clientnegative/subquery_windowing_corr.q.out
          • /hive/trunk/ql/src/test/results/clientnegative/subquery_with_or_cond.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/subquery_exists.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/subquery_in.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/subquery_multiinsert.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/subquery_notexists.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/subquery_notin.q.out
          Show
          Hudson added a comment - FAILURE: Integrated in Hive-trunk-h0.21 #2417 (See https://builds.apache.org/job/Hive-trunk-h0.21/2417/ ) HIVE-784 : Support uncorrelated subqueries in the WHERE clause (Harish Butani via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1535040 ) /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/QBSubQuery.java /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SubQueryUtils.java /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java /hive/trunk/ql/src/test/queries/clientnegative/subquery_exists_implicit_gby.q /hive/trunk/ql/src/test/queries/clientnegative/subquery_in_groupby.q /hive/trunk/ql/src/test/queries/clientnegative/subquery_in_select.q /hive/trunk/ql/src/test/queries/clientnegative/subquery_multiple_cols_in_select.q /hive/trunk/ql/src/test/queries/clientnegative/subquery_notexists_implicit_gby.q /hive/trunk/ql/src/test/queries/clientnegative/subquery_subquery_chain.q /hive/trunk/ql/src/test/queries/clientnegative/subquery_windowing_corr.q /hive/trunk/ql/src/test/queries/clientnegative/subquery_with_or_cond.q /hive/trunk/ql/src/test/queries/clientpositive/subquery_exists.q /hive/trunk/ql/src/test/queries/clientpositive/subquery_in.q /hive/trunk/ql/src/test/queries/clientpositive/subquery_multiinsert.q /hive/trunk/ql/src/test/queries/clientpositive/subquery_notexists.q /hive/trunk/ql/src/test/queries/clientpositive/subquery_notin.q /hive/trunk/ql/src/test/results/clientnegative/subquery_exists_implicit_gby.q.out /hive/trunk/ql/src/test/results/clientnegative/subquery_in_groupby.q.out /hive/trunk/ql/src/test/results/clientnegative/subquery_in_select.q.out /hive/trunk/ql/src/test/results/clientnegative/subquery_multiple_cols_in_select.q.out /hive/trunk/ql/src/test/results/clientnegative/subquery_notexists_implicit_gby.q.out /hive/trunk/ql/src/test/results/clientnegative/subquery_subquery_chain.q.out /hive/trunk/ql/src/test/results/clientnegative/subquery_windowing_corr.q.out /hive/trunk/ql/src/test/results/clientnegative/subquery_with_or_cond.q.out /hive/trunk/ql/src/test/results/clientpositive/subquery_exists.q.out /hive/trunk/ql/src/test/results/clientpositive/subquery_in.q.out /hive/trunk/ql/src/test/results/clientpositive/subquery_multiinsert.q.out /hive/trunk/ql/src/test/results/clientpositive/subquery_notexists.q.out /hive/trunk/ql/src/test/results/clientpositive/subquery_notin.q.out
          Hide
          Hudson added a comment -

          FAILURE: Integrated in Hive-trunk-hadoop2-ptest #152 (See https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/152/)
          HIVE-784 : Support uncorrelated subqueries in the WHERE clause (Harish Butani via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1535040)

          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/QBSubQuery.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SubQueryUtils.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java
          • /hive/trunk/ql/src/test/queries/clientnegative/subquery_exists_implicit_gby.q
          • /hive/trunk/ql/src/test/queries/clientnegative/subquery_in_groupby.q
          • /hive/trunk/ql/src/test/queries/clientnegative/subquery_in_select.q
          • /hive/trunk/ql/src/test/queries/clientnegative/subquery_multiple_cols_in_select.q
          • /hive/trunk/ql/src/test/queries/clientnegative/subquery_notexists_implicit_gby.q
          • /hive/trunk/ql/src/test/queries/clientnegative/subquery_subquery_chain.q
          • /hive/trunk/ql/src/test/queries/clientnegative/subquery_windowing_corr.q
          • /hive/trunk/ql/src/test/queries/clientnegative/subquery_with_or_cond.q
          • /hive/trunk/ql/src/test/queries/clientpositive/subquery_exists.q
          • /hive/trunk/ql/src/test/queries/clientpositive/subquery_in.q
          • /hive/trunk/ql/src/test/queries/clientpositive/subquery_multiinsert.q
          • /hive/trunk/ql/src/test/queries/clientpositive/subquery_notexists.q
          • /hive/trunk/ql/src/test/queries/clientpositive/subquery_notin.q
          • /hive/trunk/ql/src/test/results/clientnegative/subquery_exists_implicit_gby.q.out
          • /hive/trunk/ql/src/test/results/clientnegative/subquery_in_groupby.q.out
          • /hive/trunk/ql/src/test/results/clientnegative/subquery_in_select.q.out
          • /hive/trunk/ql/src/test/results/clientnegative/subquery_multiple_cols_in_select.q.out
          • /hive/trunk/ql/src/test/results/clientnegative/subquery_notexists_implicit_gby.q.out
          • /hive/trunk/ql/src/test/results/clientnegative/subquery_subquery_chain.q.out
          • /hive/trunk/ql/src/test/results/clientnegative/subquery_windowing_corr.q.out
          • /hive/trunk/ql/src/test/results/clientnegative/subquery_with_or_cond.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/subquery_exists.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/subquery_in.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/subquery_multiinsert.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/subquery_notexists.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/subquery_notin.q.out
          Show
          Hudson added a comment - FAILURE: Integrated in Hive-trunk-hadoop2-ptest #152 (See https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/152/ ) HIVE-784 : Support uncorrelated subqueries in the WHERE clause (Harish Butani via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1535040 ) /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/QBSubQuery.java /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SubQueryUtils.java /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java /hive/trunk/ql/src/test/queries/clientnegative/subquery_exists_implicit_gby.q /hive/trunk/ql/src/test/queries/clientnegative/subquery_in_groupby.q /hive/trunk/ql/src/test/queries/clientnegative/subquery_in_select.q /hive/trunk/ql/src/test/queries/clientnegative/subquery_multiple_cols_in_select.q /hive/trunk/ql/src/test/queries/clientnegative/subquery_notexists_implicit_gby.q /hive/trunk/ql/src/test/queries/clientnegative/subquery_subquery_chain.q /hive/trunk/ql/src/test/queries/clientnegative/subquery_windowing_corr.q /hive/trunk/ql/src/test/queries/clientnegative/subquery_with_or_cond.q /hive/trunk/ql/src/test/queries/clientpositive/subquery_exists.q /hive/trunk/ql/src/test/queries/clientpositive/subquery_in.q /hive/trunk/ql/src/test/queries/clientpositive/subquery_multiinsert.q /hive/trunk/ql/src/test/queries/clientpositive/subquery_notexists.q /hive/trunk/ql/src/test/queries/clientpositive/subquery_notin.q /hive/trunk/ql/src/test/results/clientnegative/subquery_exists_implicit_gby.q.out /hive/trunk/ql/src/test/results/clientnegative/subquery_in_groupby.q.out /hive/trunk/ql/src/test/results/clientnegative/subquery_in_select.q.out /hive/trunk/ql/src/test/results/clientnegative/subquery_multiple_cols_in_select.q.out /hive/trunk/ql/src/test/results/clientnegative/subquery_notexists_implicit_gby.q.out /hive/trunk/ql/src/test/results/clientnegative/subquery_subquery_chain.q.out /hive/trunk/ql/src/test/results/clientnegative/subquery_windowing_corr.q.out /hive/trunk/ql/src/test/results/clientnegative/subquery_with_or_cond.q.out /hive/trunk/ql/src/test/results/clientpositive/subquery_exists.q.out /hive/trunk/ql/src/test/results/clientpositive/subquery_in.q.out /hive/trunk/ql/src/test/results/clientpositive/subquery_multiinsert.q.out /hive/trunk/ql/src/test/results/clientpositive/subquery_notexists.q.out /hive/trunk/ql/src/test/results/clientpositive/subquery_notin.q.out
          Hide
          Hudson added a comment -

          SUCCESS: Integrated in Hive-trunk-hadoop1-ptest #215 (See https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/215/)
          HIVE-784 : Support uncorrelated subqueries in the WHERE clause (Harish Butani via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1535040)

          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/QBSubQuery.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SubQueryUtils.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java
          • /hive/trunk/ql/src/test/queries/clientnegative/subquery_exists_implicit_gby.q
          • /hive/trunk/ql/src/test/queries/clientnegative/subquery_in_groupby.q
          • /hive/trunk/ql/src/test/queries/clientnegative/subquery_in_select.q
          • /hive/trunk/ql/src/test/queries/clientnegative/subquery_multiple_cols_in_select.q
          • /hive/trunk/ql/src/test/queries/clientnegative/subquery_notexists_implicit_gby.q
          • /hive/trunk/ql/src/test/queries/clientnegative/subquery_subquery_chain.q
          • /hive/trunk/ql/src/test/queries/clientnegative/subquery_windowing_corr.q
          • /hive/trunk/ql/src/test/queries/clientnegative/subquery_with_or_cond.q
          • /hive/trunk/ql/src/test/queries/clientpositive/subquery_exists.q
          • /hive/trunk/ql/src/test/queries/clientpositive/subquery_in.q
          • /hive/trunk/ql/src/test/queries/clientpositive/subquery_multiinsert.q
          • /hive/trunk/ql/src/test/queries/clientpositive/subquery_notexists.q
          • /hive/trunk/ql/src/test/queries/clientpositive/subquery_notin.q
          • /hive/trunk/ql/src/test/results/clientnegative/subquery_exists_implicit_gby.q.out
          • /hive/trunk/ql/src/test/results/clientnegative/subquery_in_groupby.q.out
          • /hive/trunk/ql/src/test/results/clientnegative/subquery_in_select.q.out
          • /hive/trunk/ql/src/test/results/clientnegative/subquery_multiple_cols_in_select.q.out
          • /hive/trunk/ql/src/test/results/clientnegative/subquery_notexists_implicit_gby.q.out
          • /hive/trunk/ql/src/test/results/clientnegative/subquery_subquery_chain.q.out
          • /hive/trunk/ql/src/test/results/clientnegative/subquery_windowing_corr.q.out
          • /hive/trunk/ql/src/test/results/clientnegative/subquery_with_or_cond.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/subquery_exists.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/subquery_in.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/subquery_multiinsert.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/subquery_notexists.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/subquery_notin.q.out
          Show
          Hudson added a comment - SUCCESS: Integrated in Hive-trunk-hadoop1-ptest #215 (See https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/215/ ) HIVE-784 : Support uncorrelated subqueries in the WHERE clause (Harish Butani via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1535040 ) /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/QBSubQuery.java /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SubQueryUtils.java /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java /hive/trunk/ql/src/test/queries/clientnegative/subquery_exists_implicit_gby.q /hive/trunk/ql/src/test/queries/clientnegative/subquery_in_groupby.q /hive/trunk/ql/src/test/queries/clientnegative/subquery_in_select.q /hive/trunk/ql/src/test/queries/clientnegative/subquery_multiple_cols_in_select.q /hive/trunk/ql/src/test/queries/clientnegative/subquery_notexists_implicit_gby.q /hive/trunk/ql/src/test/queries/clientnegative/subquery_subquery_chain.q /hive/trunk/ql/src/test/queries/clientnegative/subquery_windowing_corr.q /hive/trunk/ql/src/test/queries/clientnegative/subquery_with_or_cond.q /hive/trunk/ql/src/test/queries/clientpositive/subquery_exists.q /hive/trunk/ql/src/test/queries/clientpositive/subquery_in.q /hive/trunk/ql/src/test/queries/clientpositive/subquery_multiinsert.q /hive/trunk/ql/src/test/queries/clientpositive/subquery_notexists.q /hive/trunk/ql/src/test/queries/clientpositive/subquery_notin.q /hive/trunk/ql/src/test/results/clientnegative/subquery_exists_implicit_gby.q.out /hive/trunk/ql/src/test/results/clientnegative/subquery_in_groupby.q.out /hive/trunk/ql/src/test/results/clientnegative/subquery_in_select.q.out /hive/trunk/ql/src/test/results/clientnegative/subquery_multiple_cols_in_select.q.out /hive/trunk/ql/src/test/results/clientnegative/subquery_notexists_implicit_gby.q.out /hive/trunk/ql/src/test/results/clientnegative/subquery_subquery_chain.q.out /hive/trunk/ql/src/test/results/clientnegative/subquery_windowing_corr.q.out /hive/trunk/ql/src/test/results/clientnegative/subquery_with_or_cond.q.out /hive/trunk/ql/src/test/results/clientpositive/subquery_exists.q.out /hive/trunk/ql/src/test/results/clientpositive/subquery_in.q.out /hive/trunk/ql/src/test/results/clientpositive/subquery_multiinsert.q.out /hive/trunk/ql/src/test/results/clientpositive/subquery_notexists.q.out /hive/trunk/ql/src/test/results/clientpositive/subquery_notin.q.out
          Hide
          Sivaramakrishnan Narayanan added a comment -

          Haven't looked at the code yet - I'm commenting based on the design doc.

          Comment on section 2.2.1 in the design doc.

          One thing to be wary of the NOT-IN to LOJ with null check transformation is that it is only valid if the subquery's output is known to be non-nullable.

          Example, consider two tables:

          T1.x
          1
          2
          null
          T2.y
          1
          null

          Now consider this uncorrelated NOT-IN subquery:

          select * from T1 where T1.x not in (select y from T2)
          

          Should produce (if I remember my SQL semantics correctly), an empty result i.e. 0 rows.

          This is because "not in" is equivalent to <> ALL. If the inner subquery produces a null, then its comparison with anything always produces null. Therefore, <> ALL check fails for every x from T1. Therefore, the result is empty.

          According to section 2.2.1, the transformation to LOJ with null check will look like this:

          select * from T1 Left Outer Join (select y from T2) sq1 on (T1.x=sq1.y) where sq1.y is null
          

          The LOJ will produce rows like this:

          T1.x T2.y
          1 1
          2 null
          null null

          Applying the null filter after the LOJ produces two rows. This is an incorrect result. If this case is already covered in your implementation, please ignore the comment.

          Show
          Sivaramakrishnan Narayanan added a comment - Haven't looked at the code yet - I'm commenting based on the design doc. Comment on section 2.2.1 in the design doc. One thing to be wary of the NOT-IN to LOJ with null check transformation is that it is only valid if the subquery's output is known to be non-nullable. Example, consider two tables: T1.x 1 2 null T2.y 1 null Now consider this uncorrelated NOT-IN subquery: select * from T1 where T1.x not in (select y from T2) Should produce (if I remember my SQL semantics correctly), an empty result i.e. 0 rows. This is because "not in" is equivalent to <> ALL. If the inner subquery produces a null, then its comparison with anything always produces null. Therefore, <> ALL check fails for every x from T1. Therefore, the result is empty. According to section 2.2.1, the transformation to LOJ with null check will look like this: select * from T1 Left Outer Join (select y from T2) sq1 on (T1.x=sq1.y) where sq1.y is null The LOJ will produce rows like this: T1.x T2.y 1 1 2 null null null Applying the null filter after the LOJ produces two rows. This is an incorrect result. If this case is already covered in your implementation, please ignore the comment.
          Hide
          Sivaramakrishnan Narayanan added a comment -

          Another comment on the design doc:

          Section 2.4

          "Not Exists has the same semantics as a correlated Not In Operator without the SubQuery Expression"

          This is not strictly true, again because of nulls in subquery. Let's take the same tables as in previous comment. Now let's consider the not-exists query.

          select * from T1 where not exists (select * from T2 where T2.y=T1.x)
          

          The result for this query should be:

          T1.x
          2
          null

          The LOJ with null filter transformation works for "Not Exists", but not for "Not In". Again, if your implementation already takes care of this, please ignore.

          My head hurts

          Show
          Sivaramakrishnan Narayanan added a comment - Another comment on the design doc: Section 2.4 "Not Exists has the same semantics as a correlated Not In Operator without the SubQuery Expression" This is not strictly true, again because of nulls in subquery. Let's take the same tables as in previous comment. Now let's consider the not-exists query. select * from T1 where not exists (select * from T2 where T2.y=T1.x) The result for this query should be: T1.x 2 null The LOJ with null filter transformation works for "Not Exists", but not for "Not In". Again, if your implementation already takes care of this, please ignore. My head hurts
          Hide
          Harish Butani added a comment -

          Sivaramakrishnan Narayanan thanks for reviewing the doc. This is very helpful. Will add your eg to the Spec.
          You are right, the transformation doesn't work in case of nulls for 'Not In'.
          Not sure how to solve this easily. Any thoughts?
          One thought is to do a Cross product and then a Group By on the Outer Query's columns, applying a new UDAF 'ALL' that takes a predicate and returns true if all rows are true on the predicate. This was in the context of supporting queries of the form ' > ALL' or '< ANY'.
          So your e.g. would be transformed to something like this:

          select t1.*, ALL(t1.x <> t2.y)
          from t1 cross join t2
          group by t1.x
          having ALL(t1.x <> t2.y)
          

          But this transformation can get quite involved if the Outer Query has a Group By and/or windowing. So wasn't planning to do this in the first pass.
          Also a cross join followed by GBy is expensive. It may make sense to support these use cases with a SubQuery Operator.

          Show
          Harish Butani added a comment - Sivaramakrishnan Narayanan thanks for reviewing the doc. This is very helpful. Will add your eg to the Spec. You are right, the transformation doesn't work in case of nulls for 'Not In'. Not sure how to solve this easily. Any thoughts? One thought is to do a Cross product and then a Group By on the Outer Query's columns, applying a new UDAF 'ALL' that takes a predicate and returns true if all rows are true on the predicate. This was in the context of supporting queries of the form ' > ALL' or '< ANY'. So your e.g. would be transformed to something like this: select t1.*, ALL(t1.x <> t2.y) from t1 cross join t2 group by t1.x having ALL(t1.x <> t2.y) But this transformation can get quite involved if the Outer Query has a Group By and/or windowing. So wasn't planning to do this in the first pass. Also a cross join followed by GBy is expensive. It may make sense to support these use cases with a SubQuery Operator.
          Hide
          Harish Butani added a comment -

          Sivaramakrishnan Narayanan have updated the Spec with your example and proposed a transformation.
          The proposal is to join in a extra subquery that counts the number of nulls.

          Show
          Harish Butani added a comment - Sivaramakrishnan Narayanan have updated the Spec with your example and proposed a transformation. The proposal is to join in a extra subquery that counts the number of nulls.

            People

            • Assignee:
              Harish Butani
              Reporter:
              Ning Zhang
            • Votes:
              5 Vote for this issue
              Watchers:
              27 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development