Uploaded image for project: 'Tajo (Retired)'
  1. Tajo (Retired)
  2. TAJO-1277

GreedyHeuristicJoinOrderAlgorithm sometimes wrongly assumes associativity of joins

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 0.10.0, 0.11.0
    • None
    • None

    Description

      It looks like GreedyHeuristicJoinOrderAlgorithm always assumes every joins are associative.

      Following query returns in inaccurate result:

      select * FROM
      customer c 
      right outer join nation n on c.c_custkey = n.n_nationkey
      join region r on c.c_custkey = r.r_regionkey;
      

      because GreedyHeuristicJoinOrderAlgorithm changes join order as

      select * FROM
      customer c 
      join region r on c.c_custkey = r.r_regionkey
      right outer join nation n on c.c_custkey = n.n_nationkey;
      

      I think getBestPair() should be fixed to avoid wrong join ordering.

      Attachments

        Issue Links

          Activity

            ykrips Jihun Kang added a comment -

            How about rewriting logical plan for this case? I feel that rewriting logical plan like this seems to be more optimal, and lower cost.

            JOIN(8)(INNER)
              => Join Cond: default.c.c_custkey (INT8) = default.n.n_nationkey (INT8)
               SCAN(1) on default.nation as n
               JOIN(7)(INNER)
                 => Join Cond: default.c.c_custkey (INT8) = default.r.r_regionkey (INT8)
                  SCAN(3) on default.region as r
                  SCAN(0) on default.customer as c
            
            ykrips Jihun Kang added a comment - How about rewriting logical plan for this case? I feel that rewriting logical plan like this seems to be more optimal, and lower cost. JOIN(8)(INNER) => Join Cond: default .c.c_custkey (INT8) = default .n.n_nationkey (INT8) SCAN(1) on default .nation as n JOIN(7)(INNER) => Join Cond: default .c.c_custkey (INT8) = default .r.r_regionkey (INT8) SCAN(3) on default .region as r SCAN(0) on default .customer as c
            jihoonson Jihoon Son added a comment -

            For the above case, the customer and nation tables must be joined first, and then the result is joined with the region table.
            In addition, the join type should be the outer join for the first join.

            I think that we should handle joins separately according to their join types when optimizing the join order.

            jihoonson Jihoon Son added a comment - For the above case, the customer and nation tables must be joined first, and then the result is joined with the region table. In addition, the join type should be the outer join for the first join. I think that we should handle joins separately according to their join types when optimizing the join order.
            ykrips Jihun Kang added a comment -

            Thanks for explanation. I have few knowledge of tajo core, and just thought about in a cost aspect.

            ykrips Jihun Kang added a comment - Thanks for explanation. I have few knowledge of tajo core, and just thought about in a cost aspect.
            githubbot ASF GitHub Bot added a comment -

            GitHub user sirpkt opened a pull request:

            https://github.com/apache/tajo/pull/379

            TAJO-1277: GreedyHeuristicJoinOrderAlgorithm sometimes wrongly assumes associativity of joins

            Basically, it limits the range of join ordering until it meets outer join operations.
            For example, in the case of (((((a inner join b) inner join c) outer join d) inner join e) inner join f),
            join ordering is partitioned as three parts as
            1) (a inner join b) inner join c
            2) (result of 1) outer join d
            3) (((result of 2) inner join e) inner join f)

            Following modifications are included:

            • findBestOrder() is changed to partition join ordering
            • getBestPair() and findJoin() are changed to return the corresponding JoinEdges of the selected join because those JoinEdges should be removed before next join ordering

            It passes 'mvn clean install' and several join cases I tested,
            however, I'm not sure this is good approach.
            Please, leave me comments about the patch.

            You can merge this pull request into a Git repository by running:

            $ git pull https://github.com/sirpkt/tajo TAJO-1277

            Alternatively you can review and apply these changes as the patch at:

            https://github.com/apache/tajo/pull/379.patch

            To close this pull request, make a commit to your master/trunk branch
            with (at least) the following in the commit message:

            This closes #379


            commit 74802e228dd3af8e885cc724c32d5746939c2c23
            Author: Keuntae Park <sirpkt@apache.org>
            Date: 2015-02-09T09:21:04Z

            join optimizer is enhanced to distinguish non-associative join cases


            githubbot ASF GitHub Bot added a comment - GitHub user sirpkt opened a pull request: https://github.com/apache/tajo/pull/379 TAJO-1277 : GreedyHeuristicJoinOrderAlgorithm sometimes wrongly assumes associativity of joins Basically, it limits the range of join ordering until it meets outer join operations. For example, in the case of (((((a inner join b) inner join c) outer join d) inner join e) inner join f), join ordering is partitioned as three parts as 1) (a inner join b) inner join c 2) (result of 1) outer join d 3) (((result of 2) inner join e) inner join f) Following modifications are included: findBestOrder() is changed to partition join ordering getBestPair() and findJoin() are changed to return the corresponding JoinEdges of the selected join because those JoinEdges should be removed before next join ordering It passes 'mvn clean install' and several join cases I tested, however, I'm not sure this is good approach. Please, leave me comments about the patch. You can merge this pull request into a Git repository by running: $ git pull https://github.com/sirpkt/tajo TAJO-1277 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/tajo/pull/379.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #379 commit 74802e228dd3af8e885cc724c32d5746939c2c23 Author: Keuntae Park <sirpkt@apache.org> Date: 2015-02-09T09:21:04Z join optimizer is enhanced to distinguish non-associative join cases
            jihoonson Jihoon Son added a comment -

            No problem.

            jihoonson Jihoon Son added a comment - No problem.
            githubbot ASF GitHub Bot added a comment -

            Github user jihoonson commented on a diff in the pull request:

            https://github.com/apache/tajo/pull/379#discussion_r24393011

            — Diff: tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java —
            @@ -57,17 +54,69 @@ public FoundJoinOrder findBestOrder(LogicalPlan plan, LogicalPlan.QueryBlock blo
            JoinEdge bestPair;

            while (remainRelations.size() > 1) {
            + Set<LogicalNode> checkingRelations = new LinkedHashSet<LogicalNode>();
            +
            + for (LogicalNode relation : remainRelations) {
            + Collection <String> relationStrings = PlannerUtil.getRelationLineageWithinQueryBlock(plan, relation);
            + List<JoinEdge> joinEdges = new ArrayList<JoinEdge>();
            + String relationCollection = TUtil.collectionToString(relationStrings, ",");
            + List<JoinEdge> joinEdgesForGiven = joinGraph.getIncomingEdges(relationCollection);
            + if (joinEdgesForGiven != null)

            { + joinEdges.addAll(joinEdgesForGiven); + }

            + for (String relationString: relationStrings) {
            + joinEdgesForGiven = joinGraph.getIncomingEdges(relationString);
            + if (joinEdgesForGiven != null)

            { + joinEdges.addAll(joinEdgesForGiven); + }

            + }
            +
            + // check if the relation is the last piece of outer join
            + boolean endInnerRelation = false;
            + for (JoinEdge joinEdge: joinEdges) {
            + switch(joinEdge.getJoinType()) {
            + case LEFT_OUTER:
            + case RIGHT_OUTER:
            + case FULL_OUTER:
            — End diff –

            We should consider other join types such as ```anti join``` or ```semi join``` as well as ```inner join``` and ```outer join```.
            So, I think that the iteration must be stopped when finding ```anti join``` or ```semi join```.

            githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/379#discussion_r24393011 — Diff: tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java — @@ -57,17 +54,69 @@ public FoundJoinOrder findBestOrder(LogicalPlan plan, LogicalPlan.QueryBlock blo JoinEdge bestPair; while (remainRelations.size() > 1) { + Set<LogicalNode> checkingRelations = new LinkedHashSet<LogicalNode>(); + + for (LogicalNode relation : remainRelations) { + Collection <String> relationStrings = PlannerUtil.getRelationLineageWithinQueryBlock(plan, relation); + List<JoinEdge> joinEdges = new ArrayList<JoinEdge>(); + String relationCollection = TUtil.collectionToString(relationStrings, ","); + List<JoinEdge> joinEdgesForGiven = joinGraph.getIncomingEdges(relationCollection); + if (joinEdgesForGiven != null) { + joinEdges.addAll(joinEdgesForGiven); + } + for (String relationString: relationStrings) { + joinEdgesForGiven = joinGraph.getIncomingEdges(relationString); + if (joinEdgesForGiven != null) { + joinEdges.addAll(joinEdgesForGiven); + } + } + + // check if the relation is the last piece of outer join + boolean endInnerRelation = false; + for (JoinEdge joinEdge: joinEdges) { + switch(joinEdge.getJoinType()) { + case LEFT_OUTER: + case RIGHT_OUTER: + case FULL_OUTER: — End diff – We should consider other join types such as ```anti join``` or ```semi join``` as well as ```inner join``` and ```outer join```. So, I think that the iteration must be stopped when finding ```anti join``` or ```semi join```.
            githubbot ASF GitHub Bot added a comment -

            Github user jihoonson commented on a diff in the pull request:

            https://github.com/apache/tajo/pull/379#discussion_r24393231

            — Diff: tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java —
            @@ -57,17 +54,69 @@ public FoundJoinOrder findBestOrder(LogicalPlan plan, LogicalPlan.QueryBlock blo
            JoinEdge bestPair;

            while (remainRelations.size() > 1) {
            + Set<LogicalNode> checkingRelations = new LinkedHashSet<LogicalNode>();
            +
            + for (LogicalNode relation : remainRelations) {
            + Collection <String> relationStrings = PlannerUtil.getRelationLineageWithinQueryBlock(plan, relation);
            + List<JoinEdge> joinEdges = new ArrayList<JoinEdge>();
            + String relationCollection = TUtil.collectionToString(relationStrings, ",");
            + List<JoinEdge> joinEdgesForGiven = joinGraph.getIncomingEdges(relationCollection);
            + if (joinEdgesForGiven != null)

            { + joinEdges.addAll(joinEdgesForGiven); + }

            + for (String relationString: relationStrings) {
            + joinEdgesForGiven = joinGraph.getIncomingEdges(relationString);
            + if (joinEdgesForGiven != null)

            { + joinEdges.addAll(joinEdgesForGiven); + }

            + }
            +
            + // check if the relation is the last piece of outer join
            + boolean endInnerRelation = false;
            + for (JoinEdge joinEdge: joinEdges) {
            — End diff –

            Here, joinEdges seem to be sorted in the order of their occurrence in the user query.
            So, I wonder how we can guarantee that joinEdges are always sorted in that order.

            githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/379#discussion_r24393231 — Diff: tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java — @@ -57,17 +54,69 @@ public FoundJoinOrder findBestOrder(LogicalPlan plan, LogicalPlan.QueryBlock blo JoinEdge bestPair; while (remainRelations.size() > 1) { + Set<LogicalNode> checkingRelations = new LinkedHashSet<LogicalNode>(); + + for (LogicalNode relation : remainRelations) { + Collection <String> relationStrings = PlannerUtil.getRelationLineageWithinQueryBlock(plan, relation); + List<JoinEdge> joinEdges = new ArrayList<JoinEdge>(); + String relationCollection = TUtil.collectionToString(relationStrings, ","); + List<JoinEdge> joinEdgesForGiven = joinGraph.getIncomingEdges(relationCollection); + if (joinEdgesForGiven != null) { + joinEdges.addAll(joinEdgesForGiven); + } + for (String relationString: relationStrings) { + joinEdgesForGiven = joinGraph.getIncomingEdges(relationString); + if (joinEdgesForGiven != null) { + joinEdges.addAll(joinEdgesForGiven); + } + } + + // check if the relation is the last piece of outer join + boolean endInnerRelation = false; + for (JoinEdge joinEdge: joinEdges) { — End diff – Here, joinEdges seem to be sorted in the order of their occurrence in the user query. So, I wonder how we can guarantee that joinEdges are always sorted in that order.
            githubbot ASF GitHub Bot added a comment -

            Github user jihoonson commented on the pull request:

            https://github.com/apache/tajo/pull/379#issuecomment-73655137

            @sirpkt, thanks for your nice work.
            The basic approach looks good to me.
            I left some comments.

            Sometimes, outer joins can be associative under some conditions. (Please refer to http://stackoverflow.com/questions/20022196/are-left-outer-joins-associative, https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins)
            So, we should handle them, but I think that it should be done in another issue.

            githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on the pull request: https://github.com/apache/tajo/pull/379#issuecomment-73655137 @sirpkt, thanks for your nice work. The basic approach looks good to me. I left some comments. Sometimes, outer joins can be associative under some conditions. (Please refer to http://stackoverflow.com/questions/20022196/are-left-outer-joins-associative , https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins ) So, we should handle them, but I think that it should be done in another issue.
            githubbot ASF GitHub Bot added a comment -

            Github user sirpkt commented on a diff in the pull request:

            https://github.com/apache/tajo/pull/379#discussion_r24468860

            — Diff: tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java —
            @@ -57,17 +54,69 @@ public FoundJoinOrder findBestOrder(LogicalPlan plan, LogicalPlan.QueryBlock blo
            JoinEdge bestPair;

            while (remainRelations.size() > 1) {
            + Set<LogicalNode> checkingRelations = new LinkedHashSet<LogicalNode>();
            +
            + for (LogicalNode relation : remainRelations) {
            + Collection <String> relationStrings = PlannerUtil.getRelationLineageWithinQueryBlock(plan, relation);
            + List<JoinEdge> joinEdges = new ArrayList<JoinEdge>();
            + String relationCollection = TUtil.collectionToString(relationStrings, ",");
            + List<JoinEdge> joinEdgesForGiven = joinGraph.getIncomingEdges(relationCollection);
            + if (joinEdgesForGiven != null)

            { + joinEdges.addAll(joinEdgesForGiven); + }

            + for (String relationString: relationStrings) {
            + joinEdgesForGiven = joinGraph.getIncomingEdges(relationString);
            + if (joinEdgesForGiven != null)

            { + joinEdges.addAll(joinEdgesForGiven); + }

            + }
            +
            + // check if the relation is the last piece of outer join
            + boolean endInnerRelation = false;
            + for (JoinEdge joinEdge: joinEdges) {
            + switch(joinEdge.getJoinType()) {
            + case LEFT_OUTER:
            + case RIGHT_OUTER:
            + case FULL_OUTER:
            — End diff –

            Thank you for the comment, @jihoonson .
            I'll add other join types.

            githubbot ASF GitHub Bot added a comment - Github user sirpkt commented on a diff in the pull request: https://github.com/apache/tajo/pull/379#discussion_r24468860 — Diff: tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java — @@ -57,17 +54,69 @@ public FoundJoinOrder findBestOrder(LogicalPlan plan, LogicalPlan.QueryBlock blo JoinEdge bestPair; while (remainRelations.size() > 1) { + Set<LogicalNode> checkingRelations = new LinkedHashSet<LogicalNode>(); + + for (LogicalNode relation : remainRelations) { + Collection <String> relationStrings = PlannerUtil.getRelationLineageWithinQueryBlock(plan, relation); + List<JoinEdge> joinEdges = new ArrayList<JoinEdge>(); + String relationCollection = TUtil.collectionToString(relationStrings, ","); + List<JoinEdge> joinEdgesForGiven = joinGraph.getIncomingEdges(relationCollection); + if (joinEdgesForGiven != null) { + joinEdges.addAll(joinEdgesForGiven); + } + for (String relationString: relationStrings) { + joinEdgesForGiven = joinGraph.getIncomingEdges(relationString); + if (joinEdgesForGiven != null) { + joinEdges.addAll(joinEdgesForGiven); + } + } + + // check if the relation is the last piece of outer join + boolean endInnerRelation = false; + for (JoinEdge joinEdge: joinEdges) { + switch(joinEdge.getJoinType()) { + case LEFT_OUTER: + case RIGHT_OUTER: + case FULL_OUTER: — End diff – Thank you for the comment, @jihoonson . I'll add other join types.
            githubbot ASF GitHub Bot added a comment -

            Github user sirpkt commented on a diff in the pull request:

            https://github.com/apache/tajo/pull/379#discussion_r24469183

            — Diff: tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java —
            @@ -57,17 +54,69 @@ public FoundJoinOrder findBestOrder(LogicalPlan plan, LogicalPlan.QueryBlock blo
            JoinEdge bestPair;

            while (remainRelations.size() > 1) {
            + Set<LogicalNode> checkingRelations = new LinkedHashSet<LogicalNode>();
            +
            + for (LogicalNode relation : remainRelations) {
            + Collection <String> relationStrings = PlannerUtil.getRelationLineageWithinQueryBlock(plan, relation);
            + List<JoinEdge> joinEdges = new ArrayList<JoinEdge>();
            + String relationCollection = TUtil.collectionToString(relationStrings, ",");
            + List<JoinEdge> joinEdgesForGiven = joinGraph.getIncomingEdges(relationCollection);
            + if (joinEdgesForGiven != null)

            { + joinEdges.addAll(joinEdgesForGiven); + }

            + for (String relationString: relationStrings) {
            + joinEdgesForGiven = joinGraph.getIncomingEdges(relationString);
            + if (joinEdgesForGiven != null)

            { + joinEdges.addAll(joinEdgesForGiven); + }

            + }
            +
            + // check if the relation is the last piece of outer join
            + boolean endInnerRelation = false;
            + for (JoinEdge joinEdge: joinEdges) {
            — End diff –

            Actually, I thought that the order of JoinEdges in joinEdges list does not matter.
            Since the given relation is the right relation for all the JoinEdges,
            I just check the existence of JoinEdge that is outer join.
            And, in my thought, the result does not vary regardless of the order of JoinEdge's occurrence.

            githubbot ASF GitHub Bot added a comment - Github user sirpkt commented on a diff in the pull request: https://github.com/apache/tajo/pull/379#discussion_r24469183 — Diff: tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java — @@ -57,17 +54,69 @@ public FoundJoinOrder findBestOrder(LogicalPlan plan, LogicalPlan.QueryBlock blo JoinEdge bestPair; while (remainRelations.size() > 1) { + Set<LogicalNode> checkingRelations = new LinkedHashSet<LogicalNode>(); + + for (LogicalNode relation : remainRelations) { + Collection <String> relationStrings = PlannerUtil.getRelationLineageWithinQueryBlock(plan, relation); + List<JoinEdge> joinEdges = new ArrayList<JoinEdge>(); + String relationCollection = TUtil.collectionToString(relationStrings, ","); + List<JoinEdge> joinEdgesForGiven = joinGraph.getIncomingEdges(relationCollection); + if (joinEdgesForGiven != null) { + joinEdges.addAll(joinEdgesForGiven); + } + for (String relationString: relationStrings) { + joinEdgesForGiven = joinGraph.getIncomingEdges(relationString); + if (joinEdgesForGiven != null) { + joinEdges.addAll(joinEdgesForGiven); + } + } + + // check if the relation is the last piece of outer join + boolean endInnerRelation = false; + for (JoinEdge joinEdge: joinEdges) { — End diff – Actually, I thought that the order of JoinEdges in joinEdges list does not matter. Since the given relation is the right relation for all the JoinEdges, I just check the existence of JoinEdge that is outer join. And, in my thought, the result does not vary regardless of the order of JoinEdge's occurrence.
            githubbot ASF GitHub Bot added a comment -

            Github user jihoonson commented on a diff in the pull request:

            https://github.com/apache/tajo/pull/379#discussion_r24472011

            — Diff: tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java —
            @@ -57,17 +54,69 @@ public FoundJoinOrder findBestOrder(LogicalPlan plan, LogicalPlan.QueryBlock blo
            JoinEdge bestPair;

            while (remainRelations.size() > 1) {
            + Set<LogicalNode> checkingRelations = new LinkedHashSet<LogicalNode>();
            +
            + for (LogicalNode relation : remainRelations) {
            + Collection <String> relationStrings = PlannerUtil.getRelationLineageWithinQueryBlock(plan, relation);
            + List<JoinEdge> joinEdges = new ArrayList<JoinEdge>();
            + String relationCollection = TUtil.collectionToString(relationStrings, ",");
            + List<JoinEdge> joinEdgesForGiven = joinGraph.getIncomingEdges(relationCollection);
            + if (joinEdgesForGiven != null)

            { + joinEdges.addAll(joinEdgesForGiven); + }

            + for (String relationString: relationStrings) {
            + joinEdgesForGiven = joinGraph.getIncomingEdges(relationString);
            + if (joinEdgesForGiven != null)

            { + joinEdges.addAll(joinEdgesForGiven); + }

            + }
            +
            + // check if the relation is the last piece of outer join
            + boolean endInnerRelation = false;
            + for (JoinEdge joinEdge: joinEdges) {
            — End diff –

            You are right. It was my misunderstanding.

            githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/379#discussion_r24472011 — Diff: tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java — @@ -57,17 +54,69 @@ public FoundJoinOrder findBestOrder(LogicalPlan plan, LogicalPlan.QueryBlock blo JoinEdge bestPair; while (remainRelations.size() > 1) { + Set<LogicalNode> checkingRelations = new LinkedHashSet<LogicalNode>(); + + for (LogicalNode relation : remainRelations) { + Collection <String> relationStrings = PlannerUtil.getRelationLineageWithinQueryBlock(plan, relation); + List<JoinEdge> joinEdges = new ArrayList<JoinEdge>(); + String relationCollection = TUtil.collectionToString(relationStrings, ","); + List<JoinEdge> joinEdgesForGiven = joinGraph.getIncomingEdges(relationCollection); + if (joinEdgesForGiven != null) { + joinEdges.addAll(joinEdgesForGiven); + } + for (String relationString: relationStrings) { + joinEdgesForGiven = joinGraph.getIncomingEdges(relationString); + if (joinEdgesForGiven != null) { + joinEdges.addAll(joinEdgesForGiven); + } + } + + // check if the relation is the last piece of outer join + boolean endInnerRelation = false; + for (JoinEdge joinEdge: joinEdges) { — End diff – You are right. It was my misunderstanding.
            githubbot ASF GitHub Bot added a comment -

            Github user jihoonson commented on a diff in the pull request:

            https://github.com/apache/tajo/pull/379#discussion_r24472049

            — Diff: tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java —
            @@ -57,17 +54,69 @@ public FoundJoinOrder findBestOrder(LogicalPlan plan, LogicalPlan.QueryBlock blo
            JoinEdge bestPair;

            while (remainRelations.size() > 1) {
            + Set<LogicalNode> checkingRelations = new LinkedHashSet<LogicalNode>();
            +
            + for (LogicalNode relation : remainRelations) {
            + Collection <String> relationStrings = PlannerUtil.getRelationLineageWithinQueryBlock(plan, relation);
            + List<JoinEdge> joinEdges = new ArrayList<JoinEdge>();
            + String relationCollection = TUtil.collectionToString(relationStrings, ",");
            + List<JoinEdge> joinEdgesForGiven = joinGraph.getIncomingEdges(relationCollection);
            + if (joinEdgesForGiven != null)

            { + joinEdges.addAll(joinEdgesForGiven); + }

            + for (String relationString: relationStrings) {
            — End diff –

            Would you mind explaning these lines?
            I found that the same join edge is added twice to the ```joinEdgesForGiven``` list.

            githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/379#discussion_r24472049 — Diff: tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java — @@ -57,17 +54,69 @@ public FoundJoinOrder findBestOrder(LogicalPlan plan, LogicalPlan.QueryBlock blo JoinEdge bestPair; while (remainRelations.size() > 1) { + Set<LogicalNode> checkingRelations = new LinkedHashSet<LogicalNode>(); + + for (LogicalNode relation : remainRelations) { + Collection <String> relationStrings = PlannerUtil.getRelationLineageWithinQueryBlock(plan, relation); + List<JoinEdge> joinEdges = new ArrayList<JoinEdge>(); + String relationCollection = TUtil.collectionToString(relationStrings, ","); + List<JoinEdge> joinEdgesForGiven = joinGraph.getIncomingEdges(relationCollection); + if (joinEdgesForGiven != null) { + joinEdges.addAll(joinEdgesForGiven); + } + for (String relationString: relationStrings) { — End diff – Would you mind explaning these lines? I found that the same join edge is added twice to the ```joinEdgesForGiven``` list.
            githubbot ASF GitHub Bot added a comment -

            Github user sirpkt commented on a diff in the pull request:

            https://github.com/apache/tajo/pull/379#discussion_r24553012

            — Diff: tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java —
            @@ -57,17 +54,69 @@ public FoundJoinOrder findBestOrder(LogicalPlan plan, LogicalPlan.QueryBlock blo
            JoinEdge bestPair;

            while (remainRelations.size() > 1) {
            + Set<LogicalNode> checkingRelations = new LinkedHashSet<LogicalNode>();
            +
            + for (LogicalNode relation : remainRelations) {
            + Collection <String> relationStrings = PlannerUtil.getRelationLineageWithinQueryBlock(plan, relation);
            + List<JoinEdge> joinEdges = new ArrayList<JoinEdge>();
            + String relationCollection = TUtil.collectionToString(relationStrings, ",");
            + List<JoinEdge> joinEdgesForGiven = joinGraph.getIncomingEdges(relationCollection);
            + if (joinEdgesForGiven != null)

            { + joinEdges.addAll(joinEdgesForGiven); + }

            + for (String relationString: relationStrings) {
            — End diff –

            Oh, it's my mistake.
            When relationStrings has only one entry, this code may adds that entry twice.

            When a LogicalNode contains two relations, for example, A and B,
            above code first finds joinEdges whose right relation is "A, B", which is obtained by TUtil.collectionToString().
            Next, it finds joinEdges whose right relation is "A" or "B", which is obtained by 'for (String relationString: relationStrings)'.
            So, if a LogicalNode contains just one relation, this code may adds that relation repeatedly.

            Duplicated relation does not affect the result but I'll patch not to have duplicated relations.

            githubbot ASF GitHub Bot added a comment - Github user sirpkt commented on a diff in the pull request: https://github.com/apache/tajo/pull/379#discussion_r24553012 — Diff: tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java — @@ -57,17 +54,69 @@ public FoundJoinOrder findBestOrder(LogicalPlan plan, LogicalPlan.QueryBlock blo JoinEdge bestPair; while (remainRelations.size() > 1) { + Set<LogicalNode> checkingRelations = new LinkedHashSet<LogicalNode>(); + + for (LogicalNode relation : remainRelations) { + Collection <String> relationStrings = PlannerUtil.getRelationLineageWithinQueryBlock(plan, relation); + List<JoinEdge> joinEdges = new ArrayList<JoinEdge>(); + String relationCollection = TUtil.collectionToString(relationStrings, ","); + List<JoinEdge> joinEdgesForGiven = joinGraph.getIncomingEdges(relationCollection); + if (joinEdgesForGiven != null) { + joinEdges.addAll(joinEdgesForGiven); + } + for (String relationString: relationStrings) { — End diff – Oh, it's my mistake. When relationStrings has only one entry, this code may adds that entry twice. When a LogicalNode contains two relations, for example, A and B, above code first finds joinEdges whose right relation is "A, B", which is obtained by TUtil.collectionToString(). Next, it finds joinEdges whose right relation is "A" or "B", which is obtained by 'for (String relationString: relationStrings)'. So, if a LogicalNode contains just one relation, this code may adds that relation repeatedly. Duplicated relation does not affect the result but I'll patch not to have duplicated relations.
            githubbot ASF GitHub Bot added a comment -

            Github user jihoonson commented on a diff in the pull request:

            https://github.com/apache/tajo/pull/379#discussion_r24555499

            — Diff: tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java —
            @@ -57,17 +54,69 @@ public FoundJoinOrder findBestOrder(LogicalPlan plan, LogicalPlan.QueryBlock blo
            JoinEdge bestPair;

            while (remainRelations.size() > 1) {
            + Set<LogicalNode> checkingRelations = new LinkedHashSet<LogicalNode>();
            +
            + for (LogicalNode relation : remainRelations) {
            + Collection <String> relationStrings = PlannerUtil.getRelationLineageWithinQueryBlock(plan, relation);
            + List<JoinEdge> joinEdges = new ArrayList<JoinEdge>();
            + String relationCollection = TUtil.collectionToString(relationStrings, ",");
            + List<JoinEdge> joinEdgesForGiven = joinGraph.getIncomingEdges(relationCollection);
            + if (joinEdgesForGiven != null)

            { + joinEdges.addAll(joinEdgesForGiven); + }

            + for (String relationString: relationStrings) {
            — End diff –

            Thanks for your explanation.

            githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/379#discussion_r24555499 — Diff: tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java — @@ -57,17 +54,69 @@ public FoundJoinOrder findBestOrder(LogicalPlan plan, LogicalPlan.QueryBlock blo JoinEdge bestPair; while (remainRelations.size() > 1) { + Set<LogicalNode> checkingRelations = new LinkedHashSet<LogicalNode>(); + + for (LogicalNode relation : remainRelations) { + Collection <String> relationStrings = PlannerUtil.getRelationLineageWithinQueryBlock(plan, relation); + List<JoinEdge> joinEdges = new ArrayList<JoinEdge>(); + String relationCollection = TUtil.collectionToString(relationStrings, ","); + List<JoinEdge> joinEdgesForGiven = joinGraph.getIncomingEdges(relationCollection); + if (joinEdgesForGiven != null) { + joinEdges.addAll(joinEdgesForGiven); + } + for (String relationString: relationStrings) { — End diff – Thanks for your explanation.
            githubbot ASF GitHub Bot added a comment -

            Github user sirpkt commented on the pull request:

            https://github.com/apache/tajo/pull/379#issuecomment-74016250

            I updated the patch to reflect the comments from @jihoonson .

            • add semi and anti join cases
            • avoid duplicated JoinEdges
            githubbot ASF GitHub Bot added a comment - Github user sirpkt commented on the pull request: https://github.com/apache/tajo/pull/379#issuecomment-74016250 I updated the patch to reflect the comments from @jihoonson . add semi and anti join cases avoid duplicated JoinEdges
            githubbot ASF GitHub Bot added a comment -

            Github user jihoonson commented on the pull request:

            https://github.com/apache/tajo/pull/379#issuecomment-74204669

            +1
            The latest patch looks good to me.
            Thanks for your work!

            githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on the pull request: https://github.com/apache/tajo/pull/379#issuecomment-74204669 +1 The latest patch looks good to me. Thanks for your work!
            hyunsik Hyunsik Choi added a comment -

            Could you wait for a while? I just started reviewing it.

            hyunsik Hyunsik Choi added a comment - Could you wait for a while? I just started reviewing it.
            githubbot ASF GitHub Bot added a comment -

            Github user hyunsik commented on the pull request:

            https://github.com/apache/tajo/pull/379#issuecomment-74479771

            +1
            The patch contains a workaround code which can fix the bug right now.

            I have some comments. Your workaround code seems to try to firstly find the order of joins whose join types is same. In my opinion, this approach may miss some of join order combinations. It may cause sub-optimal join orders in some cases. I think that the right way is to improve the findBestPair() to find only associative join orders. It would be great if we do it in another jira.

            githubbot ASF GitHub Bot added a comment - Github user hyunsik commented on the pull request: https://github.com/apache/tajo/pull/379#issuecomment-74479771 +1 The patch contains a workaround code which can fix the bug right now. I have some comments. Your workaround code seems to try to firstly find the order of joins whose join types is same. In my opinion, this approach may miss some of join order combinations. It may cause sub-optimal join orders in some cases. I think that the right way is to improve the findBestPair() to find only associative join orders. It would be great if we do it in another jira.
            githubbot ASF GitHub Bot added a comment -

            Github user jihoonson commented on the pull request:

            https://github.com/apache/tajo/pull/379#issuecomment-74480604

            @hyunsik, thanks for your review.
            I created an issue to handle the potentially missed cases.
            https://issues.apache.org/jira/browse/TAJO-1352

            githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on the pull request: https://github.com/apache/tajo/pull/379#issuecomment-74480604 @hyunsik, thanks for your review. I created an issue to handle the potentially missed cases. https://issues.apache.org/jira/browse/TAJO-1352
            jihoonson Jihoon Son added a comment -

            Since this is an urgent blocker issue, I'll commit the patch shortly.

            jihoonson Jihoon Son added a comment - Since this is an urgent blocker issue, I'll commit the patch shortly.
            githubbot ASF GitHub Bot added a comment -

            Github user asfgit closed the pull request at:

            https://github.com/apache/tajo/pull/379

            githubbot ASF GitHub Bot added a comment - Github user asfgit closed the pull request at: https://github.com/apache/tajo/pull/379
            jihoonson Jihoon Son added a comment -

            I committed the patch.
            sirpkt, thanks for your contribution!

            jihoonson Jihoon Son added a comment - I committed the patch. sirpkt , thanks for your contribution!
            hudson Hudson added a comment -

            SUCCESS: Integrated in Tajo-master-build #585 (See https://builds.apache.org/job/Tajo-master-build/585/)
            TAJO-1277: GreedyHeuristicJoinOrderAlgorithm sometimes wrongly assumes associativity of joins. (Keuntae Park via jihoon) (jihoonson: rev 95d9cc455dd7d25da6dd0df709084d7346efb410)

            • tajo-core/src/test/resources/queries/TestJoinQuery/testJoinWithMultipleJoinTypes.sql
            • tajo-core/src/test/resources/results/TestJoinQuery/testJoinWithMultipleJoinTypes.result
            • tajo-core/src/test/java/org/apache/tajo/engine/query/TestJoinQuery.java
            • CHANGES
            • tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java
            hudson Hudson added a comment - SUCCESS: Integrated in Tajo-master-build #585 (See https://builds.apache.org/job/Tajo-master-build/585/ ) TAJO-1277 : GreedyHeuristicJoinOrderAlgorithm sometimes wrongly assumes associativity of joins. (Keuntae Park via jihoon) (jihoonson: rev 95d9cc455dd7d25da6dd0df709084d7346efb410) tajo-core/src/test/resources/queries/TestJoinQuery/testJoinWithMultipleJoinTypes.sql tajo-core/src/test/resources/results/TestJoinQuery/testJoinWithMultipleJoinTypes.result tajo-core/src/test/java/org/apache/tajo/engine/query/TestJoinQuery.java CHANGES tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java
            hudson Hudson added a comment -

            FAILURE: Integrated in Tajo-master-CODEGEN-build #224 (See https://builds.apache.org/job/Tajo-master-CODEGEN-build/224/)
            TAJO-1277: GreedyHeuristicJoinOrderAlgorithm sometimes wrongly assumes associativity of joins. (Keuntae Park via jihoon) (jihoonson: rev 95d9cc455dd7d25da6dd0df709084d7346efb410)

            • tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java
            • tajo-core/src/test/resources/results/TestJoinQuery/testJoinWithMultipleJoinTypes.result
            • tajo-core/src/test/java/org/apache/tajo/engine/query/TestJoinQuery.java
            • tajo-core/src/test/resources/queries/TestJoinQuery/testJoinWithMultipleJoinTypes.sql
            • CHANGES
            hudson Hudson added a comment - FAILURE: Integrated in Tajo-master-CODEGEN-build #224 (See https://builds.apache.org/job/Tajo-master-CODEGEN-build/224/ ) TAJO-1277 : GreedyHeuristicJoinOrderAlgorithm sometimes wrongly assumes associativity of joins. (Keuntae Park via jihoon) (jihoonson: rev 95d9cc455dd7d25da6dd0df709084d7346efb410) tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java tajo-core/src/test/resources/results/TestJoinQuery/testJoinWithMultipleJoinTypes.result tajo-core/src/test/java/org/apache/tajo/engine/query/TestJoinQuery.java tajo-core/src/test/resources/queries/TestJoinQuery/testJoinWithMultipleJoinTypes.sql CHANGES
            hyunsik Hyunsik Choi added a comment -

            I just pushed the change to branch-0.10.0.

            hyunsik Hyunsik Choi added a comment - I just pushed the change to branch-0.10.0.

            People

              sirpkt Keuntae Park
              sirpkt Keuntae Park
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: