Details

    • Type: Sub-task
    • Status: Closed
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.2.0, 3.2.0
    • Labels:
    • old issue number:
      36

      Description

      A semi-join between two tables returns rows from the first table where one or more matches are found in the second table. The difference between a semi-join and a conventional join is that rows in the first table will be returned at most once. Even if the second table contains two matches for a row in the first table, only one copy of the row will be returned. Semi-joins are written using the EXISTS or IN constructs.

      An anti-join is the opposite of a semi-join and is written using the NOT EXISTS or NOT IN constructs.

      There's a pretty good write-up [here] (http://www.dbspecialists.com/files/presentations/semijoins.html) on semi/anti joins.

      1. 167.patch
        188 kB
        Maryann Xue
      2. 167-2.patch
        189 kB
        Maryann Xue

        Activity

        Hide
        enis Enis Soztutar added a comment -

        Bulk close of all issues that has been resolved in a released version.

        Show
        enis Enis Soztutar added a comment - Bulk close of all issues that has been resolved in a released version.
        Hide
        hudson Hudson added a comment -

        FAILURE: Integrated in Phoenix | 3.0 | Hadoop1 #242 (See https://builds.apache.org/job/Phoenix-3.0-hadoop1/242/)
        PHOENIX-167 Support semi/anti-joins (maryannxue: rev 5effbbca0fd904168edc9d5a35aaf847cfb68dc6)

        • phoenix-core/src/main/java/org/apache/phoenix/parse/ParseNodeVisitor.java
        • phoenix-core/src/main/java/org/apache/phoenix/parse/JoinTableNode.java
        • phoenix-core/src/main/java/org/apache/phoenix/jdbc/PhoenixStatement.java
        • phoenix-core/src/main/java/org/apache/phoenix/parse/ParseNodeFactory.java
        • phoenix-core/src/main/java/org/apache/phoenix/parse/TraverseAllParseNodeVisitor.java
        • phoenix-core/src/main/java/org/apache/phoenix/compile/WhereOptimizer.java
        • phoenix-core/src/main/java/org/apache/phoenix/exception/SQLExceptionCode.java
        • phoenix-core/src/main/java/org/apache/phoenix/coprocessor/HashJoinRegionScanner.java
        • phoenix-core/src/test/java/org/apache/phoenix/query/BaseTest.java
        • phoenix-core/src/main/java/org/apache/phoenix/parse/UnsupportedAllParseNodeVisitor.java
        • phoenix-core/src/main/antlr3/PhoenixSQL.g
        • phoenix-core/src/main/java/org/apache/phoenix/compile/SubqueryRewriter.java
        • phoenix-core/src/main/java/org/apache/phoenix/compile/UpsertCompiler.java
        • phoenix-core/src/main/java/org/apache/phoenix/parse/ParseNodeRewriter.java
        • phoenix-core/src/main/java/org/apache/phoenix/parse/BooleanParseNodeVisitor.java
        • phoenix-core/src/main/java/org/apache/phoenix/compile/StatementNormalizer.java
        • phoenix-core/src/test/java/org/apache/phoenix/util/TestUtil.java
        • phoenix-core/src/main/java/org/apache/phoenix/parse/TraverseNoParseNodeVisitor.java
        • phoenix-core/src/it/java/org/apache/phoenix/end2end/SubqueryIT.java
        • phoenix-core/src/test/java/org/apache/phoenix/compile/JoinQueryCompilerTest.java
        • phoenix-core/src/main/java/org/apache/phoenix/execute/HashJoinPlan.java
        • phoenix-core/src/main/java/org/apache/phoenix/compile/JoinCompiler.java
        • phoenix-core/src/main/java/org/apache/phoenix/compile/QueryCompiler.java
        • phoenix-core/src/main/java/org/apache/phoenix/parse/StatelessTraverseAllParseNodeVisitor.java
        • phoenix-core/src/main/java/org/apache/phoenix/compile/ExpressionCompiler.java
        • phoenix-core/src/main/java/org/apache/phoenix/parse/ExistsParseNode.java
        • phoenix-core/src/it/java/org/apache/phoenix/end2end/HashJoinIT.java
        Show
        hudson Hudson added a comment - FAILURE: Integrated in Phoenix | 3.0 | Hadoop1 #242 (See https://builds.apache.org/job/Phoenix-3.0-hadoop1/242/ ) PHOENIX-167 Support semi/anti-joins (maryannxue: rev 5effbbca0fd904168edc9d5a35aaf847cfb68dc6) phoenix-core/src/main/java/org/apache/phoenix/parse/ParseNodeVisitor.java phoenix-core/src/main/java/org/apache/phoenix/parse/JoinTableNode.java phoenix-core/src/main/java/org/apache/phoenix/jdbc/PhoenixStatement.java phoenix-core/src/main/java/org/apache/phoenix/parse/ParseNodeFactory.java phoenix-core/src/main/java/org/apache/phoenix/parse/TraverseAllParseNodeVisitor.java phoenix-core/src/main/java/org/apache/phoenix/compile/WhereOptimizer.java phoenix-core/src/main/java/org/apache/phoenix/exception/SQLExceptionCode.java phoenix-core/src/main/java/org/apache/phoenix/coprocessor/HashJoinRegionScanner.java phoenix-core/src/test/java/org/apache/phoenix/query/BaseTest.java phoenix-core/src/main/java/org/apache/phoenix/parse/UnsupportedAllParseNodeVisitor.java phoenix-core/src/main/antlr3/PhoenixSQL.g phoenix-core/src/main/java/org/apache/phoenix/compile/SubqueryRewriter.java phoenix-core/src/main/java/org/apache/phoenix/compile/UpsertCompiler.java phoenix-core/src/main/java/org/apache/phoenix/parse/ParseNodeRewriter.java phoenix-core/src/main/java/org/apache/phoenix/parse/BooleanParseNodeVisitor.java phoenix-core/src/main/java/org/apache/phoenix/compile/StatementNormalizer.java phoenix-core/src/test/java/org/apache/phoenix/util/TestUtil.java phoenix-core/src/main/java/org/apache/phoenix/parse/TraverseNoParseNodeVisitor.java phoenix-core/src/it/java/org/apache/phoenix/end2end/SubqueryIT.java phoenix-core/src/test/java/org/apache/phoenix/compile/JoinQueryCompilerTest.java phoenix-core/src/main/java/org/apache/phoenix/execute/HashJoinPlan.java phoenix-core/src/main/java/org/apache/phoenix/compile/JoinCompiler.java phoenix-core/src/main/java/org/apache/phoenix/compile/QueryCompiler.java phoenix-core/src/main/java/org/apache/phoenix/parse/StatelessTraverseAllParseNodeVisitor.java phoenix-core/src/main/java/org/apache/phoenix/compile/ExpressionCompiler.java phoenix-core/src/main/java/org/apache/phoenix/parse/ExistsParseNode.java phoenix-core/src/it/java/org/apache/phoenix/end2end/HashJoinIT.java
        Hide
        hudson Hudson added a comment -

        SUCCESS: Integrated in Phoenix | Master #402 (See https://builds.apache.org/job/Phoenix-master/402/)
        PHOENIX-167 Support semi/anti-joins (maryannxue: rev 909d975960f4592e75fb3fdb6c2d0cecd2d51f1d)

        • phoenix-core/src/main/java/org/apache/phoenix/execute/HashJoinPlan.java
        • phoenix-core/src/main/java/org/apache/phoenix/parse/UnsupportedAllParseNodeVisitor.java
        • phoenix-core/src/main/java/org/apache/phoenix/compile/ExpressionCompiler.java
        • phoenix-core/src/main/java/org/apache/phoenix/compile/UpsertCompiler.java
        • phoenix-core/src/main/java/org/apache/phoenix/compile/SubqueryRewriter.java
        • phoenix-core/src/test/java/org/apache/phoenix/compile/JoinQueryCompilerTest.java
        • phoenix-core/src/main/java/org/apache/phoenix/parse/ParseNodeVisitor.java
        • phoenix-core/src/main/java/org/apache/phoenix/parse/ParseNodeFactory.java
        • phoenix-core/src/main/java/org/apache/phoenix/parse/BooleanParseNodeVisitor.java
        • phoenix-core/src/main/java/org/apache/phoenix/jdbc/PhoenixStatement.java
        • phoenix-core/src/test/java/org/apache/phoenix/util/TestUtil.java
        • phoenix-core/src/main/java/org/apache/phoenix/compile/StatementNormalizer.java
        • phoenix-core/src/main/java/org/apache/phoenix/compile/QueryCompiler.java
        • phoenix-core/src/main/java/org/apache/phoenix/exception/SQLExceptionCode.java
        • phoenix-core/src/test/java/org/apache/phoenix/query/BaseTest.java
        • phoenix-core/src/main/java/org/apache/phoenix/parse/ParseNodeRewriter.java
        • phoenix-core/src/main/java/org/apache/phoenix/compile/WhereOptimizer.java
        • phoenix-core/src/main/antlr3/PhoenixSQL.g
        • phoenix-core/src/main/java/org/apache/phoenix/parse/ExistsParseNode.java
        • phoenix-core/src/it/java/org/apache/phoenix/end2end/SubqueryIT.java
        • phoenix-core/src/main/java/org/apache/phoenix/parse/JoinTableNode.java
        • phoenix-core/src/main/java/org/apache/phoenix/parse/TraverseAllParseNodeVisitor.java
        • phoenix-core/src/main/java/org/apache/phoenix/parse/StatelessTraverseAllParseNodeVisitor.java
        • phoenix-core/src/main/java/org/apache/phoenix/coprocessor/HashJoinRegionScanner.java
        • phoenix-core/src/it/java/org/apache/phoenix/end2end/HashJoinIT.java
        • phoenix-core/src/main/java/org/apache/phoenix/compile/JoinCompiler.java
        • phoenix-core/src/main/java/org/apache/phoenix/parse/TraverseNoParseNodeVisitor.java
        Show
        hudson Hudson added a comment - SUCCESS: Integrated in Phoenix | Master #402 (See https://builds.apache.org/job/Phoenix-master/402/ ) PHOENIX-167 Support semi/anti-joins (maryannxue: rev 909d975960f4592e75fb3fdb6c2d0cecd2d51f1d) phoenix-core/src/main/java/org/apache/phoenix/execute/HashJoinPlan.java phoenix-core/src/main/java/org/apache/phoenix/parse/UnsupportedAllParseNodeVisitor.java phoenix-core/src/main/java/org/apache/phoenix/compile/ExpressionCompiler.java phoenix-core/src/main/java/org/apache/phoenix/compile/UpsertCompiler.java phoenix-core/src/main/java/org/apache/phoenix/compile/SubqueryRewriter.java phoenix-core/src/test/java/org/apache/phoenix/compile/JoinQueryCompilerTest.java phoenix-core/src/main/java/org/apache/phoenix/parse/ParseNodeVisitor.java phoenix-core/src/main/java/org/apache/phoenix/parse/ParseNodeFactory.java phoenix-core/src/main/java/org/apache/phoenix/parse/BooleanParseNodeVisitor.java phoenix-core/src/main/java/org/apache/phoenix/jdbc/PhoenixStatement.java phoenix-core/src/test/java/org/apache/phoenix/util/TestUtil.java phoenix-core/src/main/java/org/apache/phoenix/compile/StatementNormalizer.java phoenix-core/src/main/java/org/apache/phoenix/compile/QueryCompiler.java phoenix-core/src/main/java/org/apache/phoenix/exception/SQLExceptionCode.java phoenix-core/src/test/java/org/apache/phoenix/query/BaseTest.java phoenix-core/src/main/java/org/apache/phoenix/parse/ParseNodeRewriter.java phoenix-core/src/main/java/org/apache/phoenix/compile/WhereOptimizer.java phoenix-core/src/main/antlr3/PhoenixSQL.g phoenix-core/src/main/java/org/apache/phoenix/parse/ExistsParseNode.java phoenix-core/src/it/java/org/apache/phoenix/end2end/SubqueryIT.java phoenix-core/src/main/java/org/apache/phoenix/parse/JoinTableNode.java phoenix-core/src/main/java/org/apache/phoenix/parse/TraverseAllParseNodeVisitor.java phoenix-core/src/main/java/org/apache/phoenix/parse/StatelessTraverseAllParseNodeVisitor.java phoenix-core/src/main/java/org/apache/phoenix/coprocessor/HashJoinRegionScanner.java phoenix-core/src/it/java/org/apache/phoenix/end2end/HashJoinIT.java phoenix-core/src/main/java/org/apache/phoenix/compile/JoinCompiler.java phoenix-core/src/main/java/org/apache/phoenix/parse/TraverseNoParseNodeVisitor.java
        Hide
        maryannxue Maryann Xue added a comment -

        Opened a JIRA for documentation work (PHOENIX-1326)

        James Taylor I think Apache Derby is a good idea.

        Show
        maryannxue Maryann Xue added a comment - Opened a JIRA for documentation work ( PHOENIX-1326 ) James Taylor I think Apache Derby is a good idea.
        Hide
        jamestaylor James Taylor added a comment -

        +1. Nice work, Maryann Xue.

        Show
        jamestaylor James Taylor added a comment - +1. Nice work, Maryann Xue .
        Hide
        maryannxue Maryann Xue added a comment -

        Added LIMIT 1 to non-correlated EXISTS subquery, and added a corresponding test case.

        Show
        maryannxue Maryann Xue added a comment - Added LIMIT 1 to non-correlated EXISTS subquery, and added a corresponding test case.
        Hide
        maryannxue Maryann Xue added a comment -

        James Taylor For IN, the inner query will be a distinct (select distinct join-keys). For EXISTS, most of the time their inner query is correlated and they are equivalent to IN sub-queries or complicated correlated sub-queries (which will be covered by PHOENIX-945). Otherwise, if the inner query has no reference to the outer query, yes, we should add a LIMIT here. Good reminder, thanks, James!

        Anoop Sam John Yes, think I can work on the documentation after I finish the entire work of PHOENIX-1167. Thanks a lot for reminding me!

        Show
        maryannxue Maryann Xue added a comment - James Taylor For IN, the inner query will be a distinct (select distinct join-keys). For EXISTS, most of the time their inner query is correlated and they are equivalent to IN sub-queries or complicated correlated sub-queries (which will be covered by PHOENIX-945 ). Otherwise, if the inner query has no reference to the outer query, yes, we should add a LIMIT here. Good reminder, thanks, James! Anoop Sam John Yes, think I can work on the documentation after I finish the entire work of PHOENIX-1167 . Thanks a lot for reminding me!
        Hide
        anoop.hbase Anoop Sam John added a comment -

        Great work Maryann Xue. Are we documenting all these features in our doc?

        Show
        anoop.hbase Anoop Sam John added a comment - Great work Maryann Xue . Are we documenting all these features in our doc?
        Hide
        jamestaylor James Taylor added a comment -

        +1. Wow, this is fantastic, Maryann Xue. Great work.

        For EXISTS, do you push a limit into the joined query? For IN, how are you handling duplicate rows?

        I think we'll need to invest in our test framework soon, as we'll want better test coverage for all the various combinations. I wonder if we can borrow a subset of test suite from Apache Derby? Or maybe there are other options?

        Show
        jamestaylor James Taylor added a comment - +1. Wow, this is fantastic, Maryann Xue . Great work. For EXISTS, do you push a limit into the joined query? For IN, how are you handling duplicate rows? I think we'll need to invest in our test framework soon, as we'll want better test coverage for all the various combinations. I wonder if we can borrow a subset of test suite from Apache Derby? Or maybe there are other options?
        Hide
        maryannxue Maryann Xue added a comment -

        This patch would also resolve PHOENIX-1300, since after sub-queries are converted to joins, the join algorithm can choose whichever approach it would take.

        Show
        maryannxue Maryann Xue added a comment - This patch would also resolve PHOENIX-1300 , since after sub-queries are converted to joins, the join algorithm can choose whichever approach it would take.
        Hide
        maryannxue Maryann Xue added a comment -

        1. Change IN/NOT IN/EXISTS/NOT EXISTS into semi/anti joins or left joins:
        If the where-clause sub-query is one of those top-node conditions (being the only condition node or direct descendant of AND nodes), we convert the sub-query directly into semi-joins, anti-joins or inner-joins, and meanwhile remove the original condition node from the where clause.
        Otherwise, we convert the sub-query into left-joins and change the original condition node into a null test of a join table field (ONE if matched, NULL if not matched).

        2. Optimize semi-joins:
        In circumstances where child-parent join optimization is enough to cover the semi-join semantics, we won't do joins at all.

        Show
        maryannxue Maryann Xue added a comment - 1. Change IN/NOT IN/EXISTS/NOT EXISTS into semi/anti joins or left joins: If the where-clause sub-query is one of those top-node conditions (being the only condition node or direct descendant of AND nodes), we convert the sub-query directly into semi-joins, anti-joins or inner-joins, and meanwhile remove the original condition node from the where clause. Otherwise, we convert the sub-query into left-joins and change the original condition node into a null test of a join table field (ONE if matched, NULL if not matched). 2. Optimize semi-joins: In circumstances where child-parent join optimization is enough to cover the semi-join semantics, we won't do joins at all.
        Hide
        pctony Tony Stevenson added a comment -

        Comment:maryannxue:05/31/13 11:06:04 PM:

        assigned

        Show
        pctony Tony Stevenson added a comment - Comment:maryannxue:05/31/13 11:06:04 PM: assigned

          People

          • Assignee:
            maryannxue Maryann Xue
            Reporter:
            jamestaylor James Taylor
          • Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development