Uploaded image for project: 'Tajo'
  1. Tajo
  2. TAJO-1277

GreedyHeuristicJoinOrderAlgorithm sometimes wrongly assumes associativity of joins

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.10.0, 0.11.0
    • Component/s: None
    • Labels:
      None

      Description

      It looks like GreedyHeuristicJoinOrderAlgorithm always assumes every joins are associative.

      Following query returns in inaccurate result:

      select * FROM
      customer c 
      right outer join nation n on c.c_custkey = n.n_nationkey
      join region r on c.c_custkey = r.r_regionkey;
      

      because GreedyHeuristicJoinOrderAlgorithm changes join order as

      select * FROM
      customer c 
      join region r on c.c_custkey = r.r_regionkey
      right outer join nation n on c.c_custkey = n.n_nationkey;
      

      I think getBestPair() should be fixed to avoid wrong join ordering.

        Issue Links

          Activity

          Hide
          ykrips Jihun Kang added a comment -

          How about rewriting logical plan for this case? I feel that rewriting logical plan like this seems to be more optimal, and lower cost.

          JOIN(8)(INNER)
            => Join Cond: default.c.c_custkey (INT8) = default.n.n_nationkey (INT8)
             SCAN(1) on default.nation as n
             JOIN(7)(INNER)
               => Join Cond: default.c.c_custkey (INT8) = default.r.r_regionkey (INT8)
                SCAN(3) on default.region as r
                SCAN(0) on default.customer as c
          
          Show
          ykrips Jihun Kang added a comment - How about rewriting logical plan for this case? I feel that rewriting logical plan like this seems to be more optimal, and lower cost. JOIN(8)(INNER) => Join Cond: default .c.c_custkey (INT8) = default .n.n_nationkey (INT8) SCAN(1) on default .nation as n JOIN(7)(INNER) => Join Cond: default .c.c_custkey (INT8) = default .r.r_regionkey (INT8) SCAN(3) on default .region as r SCAN(0) on default .customer as c
          Hide
          jihoonson Jihoon Son added a comment -

          For the above case, the customer and nation tables must be joined first, and then the result is joined with the region table.
          In addition, the join type should be the outer join for the first join.

          I think that we should handle joins separately according to their join types when optimizing the join order.

          Show
          jihoonson Jihoon Son added a comment - For the above case, the customer and nation tables must be joined first, and then the result is joined with the region table. In addition, the join type should be the outer join for the first join. I think that we should handle joins separately according to their join types when optimizing the join order.
          Hide
          ykrips Jihun Kang added a comment -

          Thanks for explanation. I have few knowledge of tajo core, and just thought about in a cost aspect.

          Show
          ykrips Jihun Kang added a comment - Thanks for explanation. I have few knowledge of tajo core, and just thought about in a cost aspect.
          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user sirpkt opened a pull request:

          https://github.com/apache/tajo/pull/379

          TAJO-1277: GreedyHeuristicJoinOrderAlgorithm sometimes wrongly assumes associativity of joins

          Basically, it limits the range of join ordering until it meets outer join operations.
          For example, in the case of (((((a inner join b) inner join c) outer join d) inner join e) inner join f),
          join ordering is partitioned as three parts as
          1) (a inner join b) inner join c
          2) (result of 1) outer join d
          3) (((result of 2) inner join e) inner join f)

          Following modifications are included:

          • findBestOrder() is changed to partition join ordering
          • getBestPair() and findJoin() are changed to return the corresponding JoinEdges of the selected join because those JoinEdges should be removed before next join ordering

          It passes 'mvn clean install' and several join cases I tested,
          however, I'm not sure this is good approach.
          Please, leave me comments about the patch.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/sirpkt/tajo TAJO-1277

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/tajo/pull/379.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #379


          commit 74802e228dd3af8e885cc724c32d5746939c2c23
          Author: Keuntae Park <sirpkt@apache.org>
          Date: 2015-02-09T09:21:04Z

          join optimizer is enhanced to distinguish non-associative join cases


          Show
          githubbot ASF GitHub Bot added a comment - GitHub user sirpkt opened a pull request: https://github.com/apache/tajo/pull/379 TAJO-1277 : GreedyHeuristicJoinOrderAlgorithm sometimes wrongly assumes associativity of joins Basically, it limits the range of join ordering until it meets outer join operations. For example, in the case of (((((a inner join b) inner join c) outer join d) inner join e) inner join f), join ordering is partitioned as three parts as 1) (a inner join b) inner join c 2) (result of 1) outer join d 3) (((result of 2) inner join e) inner join f) Following modifications are included: findBestOrder() is changed to partition join ordering getBestPair() and findJoin() are changed to return the corresponding JoinEdges of the selected join because those JoinEdges should be removed before next join ordering It passes 'mvn clean install' and several join cases I tested, however, I'm not sure this is good approach. Please, leave me comments about the patch. You can merge this pull request into a Git repository by running: $ git pull https://github.com/sirpkt/tajo TAJO-1277 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/tajo/pull/379.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #379 commit 74802e228dd3af8e885cc724c32d5746939c2c23 Author: Keuntae Park <sirpkt@apache.org> Date: 2015-02-09T09:21:04Z join optimizer is enhanced to distinguish non-associative join cases
          Hide
          jihoonson Jihoon Son added a comment -

          No problem.

          Show
          jihoonson Jihoon Son added a comment - No problem.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user jihoonson commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/379#discussion_r24393011

          — Diff: tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java —
          @@ -57,17 +54,69 @@ public FoundJoinOrder findBestOrder(LogicalPlan plan, LogicalPlan.QueryBlock blo
          JoinEdge bestPair;

          while (remainRelations.size() > 1) {
          + Set<LogicalNode> checkingRelations = new LinkedHashSet<LogicalNode>();
          +
          + for (LogicalNode relation : remainRelations) {
          + Collection <String> relationStrings = PlannerUtil.getRelationLineageWithinQueryBlock(plan, relation);
          + List<JoinEdge> joinEdges = new ArrayList<JoinEdge>();
          + String relationCollection = TUtil.collectionToString(relationStrings, ",");
          + List<JoinEdge> joinEdgesForGiven = joinGraph.getIncomingEdges(relationCollection);
          + if (joinEdgesForGiven != null)

          { + joinEdges.addAll(joinEdgesForGiven); + }

          + for (String relationString: relationStrings) {
          + joinEdgesForGiven = joinGraph.getIncomingEdges(relationString);
          + if (joinEdgesForGiven != null)

          { + joinEdges.addAll(joinEdgesForGiven); + }

          + }
          +
          + // check if the relation is the last piece of outer join
          + boolean endInnerRelation = false;
          + for (JoinEdge joinEdge: joinEdges) {
          + switch(joinEdge.getJoinType()) {
          + case LEFT_OUTER:
          + case RIGHT_OUTER:
          + case FULL_OUTER:
          — End diff –

          We should consider other join types such as ```anti join``` or ```semi join``` as well as ```inner join``` and ```outer join```.
          So, I think that the iteration must be stopped when finding ```anti join``` or ```semi join```.

          Show
          githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/379#discussion_r24393011 — Diff: tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java — @@ -57,17 +54,69 @@ public FoundJoinOrder findBestOrder(LogicalPlan plan, LogicalPlan.QueryBlock blo JoinEdge bestPair; while (remainRelations.size() > 1) { + Set<LogicalNode> checkingRelations = new LinkedHashSet<LogicalNode>(); + + for (LogicalNode relation : remainRelations) { + Collection <String> relationStrings = PlannerUtil.getRelationLineageWithinQueryBlock(plan, relation); + List<JoinEdge> joinEdges = new ArrayList<JoinEdge>(); + String relationCollection = TUtil.collectionToString(relationStrings, ","); + List<JoinEdge> joinEdgesForGiven = joinGraph.getIncomingEdges(relationCollection); + if (joinEdgesForGiven != null) { + joinEdges.addAll(joinEdgesForGiven); + } + for (String relationString: relationStrings) { + joinEdgesForGiven = joinGraph.getIncomingEdges(relationString); + if (joinEdgesForGiven != null) { + joinEdges.addAll(joinEdgesForGiven); + } + } + + // check if the relation is the last piece of outer join + boolean endInnerRelation = false; + for (JoinEdge joinEdge: joinEdges) { + switch(joinEdge.getJoinType()) { + case LEFT_OUTER: + case RIGHT_OUTER: + case FULL_OUTER: — End diff – We should consider other join types such as ```anti join``` or ```semi join``` as well as ```inner join``` and ```outer join```. So, I think that the iteration must be stopped when finding ```anti join``` or ```semi join```.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user jihoonson commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/379#discussion_r24393231

          — Diff: tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java —
          @@ -57,17 +54,69 @@ public FoundJoinOrder findBestOrder(LogicalPlan plan, LogicalPlan.QueryBlock blo
          JoinEdge bestPair;

          while (remainRelations.size() > 1) {
          + Set<LogicalNode> checkingRelations = new LinkedHashSet<LogicalNode>();
          +
          + for (LogicalNode relation : remainRelations) {
          + Collection <String> relationStrings = PlannerUtil.getRelationLineageWithinQueryBlock(plan, relation);
          + List<JoinEdge> joinEdges = new ArrayList<JoinEdge>();
          + String relationCollection = TUtil.collectionToString(relationStrings, ",");
          + List<JoinEdge> joinEdgesForGiven = joinGraph.getIncomingEdges(relationCollection);
          + if (joinEdgesForGiven != null)

          { + joinEdges.addAll(joinEdgesForGiven); + }

          + for (String relationString: relationStrings) {
          + joinEdgesForGiven = joinGraph.getIncomingEdges(relationString);
          + if (joinEdgesForGiven != null)

          { + joinEdges.addAll(joinEdgesForGiven); + }

          + }
          +
          + // check if the relation is the last piece of outer join
          + boolean endInnerRelation = false;
          + for (JoinEdge joinEdge: joinEdges) {
          — End diff –

          Here, joinEdges seem to be sorted in the order of their occurrence in the user query.
          So, I wonder how we can guarantee that joinEdges are always sorted in that order.

          Show
          githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/379#discussion_r24393231 — Diff: tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java — @@ -57,17 +54,69 @@ public FoundJoinOrder findBestOrder(LogicalPlan plan, LogicalPlan.QueryBlock blo JoinEdge bestPair; while (remainRelations.size() > 1) { + Set<LogicalNode> checkingRelations = new LinkedHashSet<LogicalNode>(); + + for (LogicalNode relation : remainRelations) { + Collection <String> relationStrings = PlannerUtil.getRelationLineageWithinQueryBlock(plan, relation); + List<JoinEdge> joinEdges = new ArrayList<JoinEdge>(); + String relationCollection = TUtil.collectionToString(relationStrings, ","); + List<JoinEdge> joinEdgesForGiven = joinGraph.getIncomingEdges(relationCollection); + if (joinEdgesForGiven != null) { + joinEdges.addAll(joinEdgesForGiven); + } + for (String relationString: relationStrings) { + joinEdgesForGiven = joinGraph.getIncomingEdges(relationString); + if (joinEdgesForGiven != null) { + joinEdges.addAll(joinEdgesForGiven); + } + } + + // check if the relation is the last piece of outer join + boolean endInnerRelation = false; + for (JoinEdge joinEdge: joinEdges) { — End diff – Here, joinEdges seem to be sorted in the order of their occurrence in the user query. So, I wonder how we can guarantee that joinEdges are always sorted in that order.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user jihoonson commented on the pull request:

          https://github.com/apache/tajo/pull/379#issuecomment-73655137

          @sirpkt, thanks for your nice work.
          The basic approach looks good to me.
          I left some comments.

          Sometimes, outer joins can be associative under some conditions. (Please refer to http://stackoverflow.com/questions/20022196/are-left-outer-joins-associative, https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins)
          So, we should handle them, but I think that it should be done in another issue.

          Show
          githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on the pull request: https://github.com/apache/tajo/pull/379#issuecomment-73655137 @sirpkt, thanks for your nice work. The basic approach looks good to me. I left some comments. Sometimes, outer joins can be associative under some conditions. (Please refer to http://stackoverflow.com/questions/20022196/are-left-outer-joins-associative , https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins ) So, we should handle them, but I think that it should be done in another issue.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user sirpkt commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/379#discussion_r24468860

          — Diff: tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java —
          @@ -57,17 +54,69 @@ public FoundJoinOrder findBestOrder(LogicalPlan plan, LogicalPlan.QueryBlock blo
          JoinEdge bestPair;

          while (remainRelations.size() > 1) {
          + Set<LogicalNode> checkingRelations = new LinkedHashSet<LogicalNode>();
          +
          + for (LogicalNode relation : remainRelations) {
          + Collection <String> relationStrings = PlannerUtil.getRelationLineageWithinQueryBlock(plan, relation);
          + List<JoinEdge> joinEdges = new ArrayList<JoinEdge>();
          + String relationCollection = TUtil.collectionToString(relationStrings, ",");
          + List<JoinEdge> joinEdgesForGiven = joinGraph.getIncomingEdges(relationCollection);
          + if (joinEdgesForGiven != null)

          { + joinEdges.addAll(joinEdgesForGiven); + }

          + for (String relationString: relationStrings) {
          + joinEdgesForGiven = joinGraph.getIncomingEdges(relationString);
          + if (joinEdgesForGiven != null)

          { + joinEdges.addAll(joinEdgesForGiven); + }

          + }
          +
          + // check if the relation is the last piece of outer join
          + boolean endInnerRelation = false;
          + for (JoinEdge joinEdge: joinEdges) {
          + switch(joinEdge.getJoinType()) {
          + case LEFT_OUTER:
          + case RIGHT_OUTER:
          + case FULL_OUTER:
          — End diff –

          Thank you for the comment, @jihoonson .
          I'll add other join types.

          Show
          githubbot ASF GitHub Bot added a comment - Github user sirpkt commented on a diff in the pull request: https://github.com/apache/tajo/pull/379#discussion_r24468860 — Diff: tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java — @@ -57,17 +54,69 @@ public FoundJoinOrder findBestOrder(LogicalPlan plan, LogicalPlan.QueryBlock blo JoinEdge bestPair; while (remainRelations.size() > 1) { + Set<LogicalNode> checkingRelations = new LinkedHashSet<LogicalNode>(); + + for (LogicalNode relation : remainRelations) { + Collection <String> relationStrings = PlannerUtil.getRelationLineageWithinQueryBlock(plan, relation); + List<JoinEdge> joinEdges = new ArrayList<JoinEdge>(); + String relationCollection = TUtil.collectionToString(relationStrings, ","); + List<JoinEdge> joinEdgesForGiven = joinGraph.getIncomingEdges(relationCollection); + if (joinEdgesForGiven != null) { + joinEdges.addAll(joinEdgesForGiven); + } + for (String relationString: relationStrings) { + joinEdgesForGiven = joinGraph.getIncomingEdges(relationString); + if (joinEdgesForGiven != null) { + joinEdges.addAll(joinEdgesForGiven); + } + } + + // check if the relation is the last piece of outer join + boolean endInnerRelation = false; + for (JoinEdge joinEdge: joinEdges) { + switch(joinEdge.getJoinType()) { + case LEFT_OUTER: + case RIGHT_OUTER: + case FULL_OUTER: — End diff – Thank you for the comment, @jihoonson . I'll add other join types.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user sirpkt commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/379#discussion_r24469183

          — Diff: tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java —
          @@ -57,17 +54,69 @@ public FoundJoinOrder findBestOrder(LogicalPlan plan, LogicalPlan.QueryBlock blo
          JoinEdge bestPair;

          while (remainRelations.size() > 1) {
          + Set<LogicalNode> checkingRelations = new LinkedHashSet<LogicalNode>();
          +
          + for (LogicalNode relation : remainRelations) {
          + Collection <String> relationStrings = PlannerUtil.getRelationLineageWithinQueryBlock(plan, relation);
          + List<JoinEdge> joinEdges = new ArrayList<JoinEdge>();
          + String relationCollection = TUtil.collectionToString(relationStrings, ",");
          + List<JoinEdge> joinEdgesForGiven = joinGraph.getIncomingEdges(relationCollection);
          + if (joinEdgesForGiven != null)

          { + joinEdges.addAll(joinEdgesForGiven); + }

          + for (String relationString: relationStrings) {
          + joinEdgesForGiven = joinGraph.getIncomingEdges(relationString);
          + if (joinEdgesForGiven != null)

          { + joinEdges.addAll(joinEdgesForGiven); + }

          + }
          +
          + // check if the relation is the last piece of outer join
          + boolean endInnerRelation = false;
          + for (JoinEdge joinEdge: joinEdges) {
          — End diff –

          Actually, I thought that the order of JoinEdges in joinEdges list does not matter.
          Since the given relation is the right relation for all the JoinEdges,
          I just check the existence of JoinEdge that is outer join.
          And, in my thought, the result does not vary regardless of the order of JoinEdge's occurrence.

          Show
          githubbot ASF GitHub Bot added a comment - Github user sirpkt commented on a diff in the pull request: https://github.com/apache/tajo/pull/379#discussion_r24469183 — Diff: tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java — @@ -57,17 +54,69 @@ public FoundJoinOrder findBestOrder(LogicalPlan plan, LogicalPlan.QueryBlock blo JoinEdge bestPair; while (remainRelations.size() > 1) { + Set<LogicalNode> checkingRelations = new LinkedHashSet<LogicalNode>(); + + for (LogicalNode relation : remainRelations) { + Collection <String> relationStrings = PlannerUtil.getRelationLineageWithinQueryBlock(plan, relation); + List<JoinEdge> joinEdges = new ArrayList<JoinEdge>(); + String relationCollection = TUtil.collectionToString(relationStrings, ","); + List<JoinEdge> joinEdgesForGiven = joinGraph.getIncomingEdges(relationCollection); + if (joinEdgesForGiven != null) { + joinEdges.addAll(joinEdgesForGiven); + } + for (String relationString: relationStrings) { + joinEdgesForGiven = joinGraph.getIncomingEdges(relationString); + if (joinEdgesForGiven != null) { + joinEdges.addAll(joinEdgesForGiven); + } + } + + // check if the relation is the last piece of outer join + boolean endInnerRelation = false; + for (JoinEdge joinEdge: joinEdges) { — End diff – Actually, I thought that the order of JoinEdges in joinEdges list does not matter. Since the given relation is the right relation for all the JoinEdges, I just check the existence of JoinEdge that is outer join. And, in my thought, the result does not vary regardless of the order of JoinEdge's occurrence.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user jihoonson commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/379#discussion_r24472011

          — Diff: tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java —
          @@ -57,17 +54,69 @@ public FoundJoinOrder findBestOrder(LogicalPlan plan, LogicalPlan.QueryBlock blo
          JoinEdge bestPair;

          while (remainRelations.size() > 1) {
          + Set<LogicalNode> checkingRelations = new LinkedHashSet<LogicalNode>();
          +
          + for (LogicalNode relation : remainRelations) {
          + Collection <String> relationStrings = PlannerUtil.getRelationLineageWithinQueryBlock(plan, relation);
          + List<JoinEdge> joinEdges = new ArrayList<JoinEdge>();
          + String relationCollection = TUtil.collectionToString(relationStrings, ",");
          + List<JoinEdge> joinEdgesForGiven = joinGraph.getIncomingEdges(relationCollection);
          + if (joinEdgesForGiven != null)

          { + joinEdges.addAll(joinEdgesForGiven); + }

          + for (String relationString: relationStrings) {
          + joinEdgesForGiven = joinGraph.getIncomingEdges(relationString);
          + if (joinEdgesForGiven != null)

          { + joinEdges.addAll(joinEdgesForGiven); + }

          + }
          +
          + // check if the relation is the last piece of outer join
          + boolean endInnerRelation = false;
          + for (JoinEdge joinEdge: joinEdges) {
          — End diff –

          You are right. It was my misunderstanding.

          Show
          githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/379#discussion_r24472011 — Diff: tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java — @@ -57,17 +54,69 @@ public FoundJoinOrder findBestOrder(LogicalPlan plan, LogicalPlan.QueryBlock blo JoinEdge bestPair; while (remainRelations.size() > 1) { + Set<LogicalNode> checkingRelations = new LinkedHashSet<LogicalNode>(); + + for (LogicalNode relation : remainRelations) { + Collection <String> relationStrings = PlannerUtil.getRelationLineageWithinQueryBlock(plan, relation); + List<JoinEdge> joinEdges = new ArrayList<JoinEdge>(); + String relationCollection = TUtil.collectionToString(relationStrings, ","); + List<JoinEdge> joinEdgesForGiven = joinGraph.getIncomingEdges(relationCollection); + if (joinEdgesForGiven != null) { + joinEdges.addAll(joinEdgesForGiven); + } + for (String relationString: relationStrings) { + joinEdgesForGiven = joinGraph.getIncomingEdges(relationString); + if (joinEdgesForGiven != null) { + joinEdges.addAll(joinEdgesForGiven); + } + } + + // check if the relation is the last piece of outer join + boolean endInnerRelation = false; + for (JoinEdge joinEdge: joinEdges) { — End diff – You are right. It was my misunderstanding.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user jihoonson commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/379#discussion_r24472049

          — Diff: tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java —
          @@ -57,17 +54,69 @@ public FoundJoinOrder findBestOrder(LogicalPlan plan, LogicalPlan.QueryBlock blo
          JoinEdge bestPair;

          while (remainRelations.size() > 1) {
          + Set<LogicalNode> checkingRelations = new LinkedHashSet<LogicalNode>();
          +
          + for (LogicalNode relation : remainRelations) {
          + Collection <String> relationStrings = PlannerUtil.getRelationLineageWithinQueryBlock(plan, relation);
          + List<JoinEdge> joinEdges = new ArrayList<JoinEdge>();
          + String relationCollection = TUtil.collectionToString(relationStrings, ",");
          + List<JoinEdge> joinEdgesForGiven = joinGraph.getIncomingEdges(relationCollection);
          + if (joinEdgesForGiven != null)

          { + joinEdges.addAll(joinEdgesForGiven); + }

          + for (String relationString: relationStrings) {
          — End diff –

          Would you mind explaning these lines?
          I found that the same join edge is added twice to the ```joinEdgesForGiven``` list.

          Show
          githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/379#discussion_r24472049 — Diff: tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java — @@ -57,17 +54,69 @@ public FoundJoinOrder findBestOrder(LogicalPlan plan, LogicalPlan.QueryBlock blo JoinEdge bestPair; while (remainRelations.size() > 1) { + Set<LogicalNode> checkingRelations = new LinkedHashSet<LogicalNode>(); + + for (LogicalNode relation : remainRelations) { + Collection <String> relationStrings = PlannerUtil.getRelationLineageWithinQueryBlock(plan, relation); + List<JoinEdge> joinEdges = new ArrayList<JoinEdge>(); + String relationCollection = TUtil.collectionToString(relationStrings, ","); + List<JoinEdge> joinEdgesForGiven = joinGraph.getIncomingEdges(relationCollection); + if (joinEdgesForGiven != null) { + joinEdges.addAll(joinEdgesForGiven); + } + for (String relationString: relationStrings) { — End diff – Would you mind explaning these lines? I found that the same join edge is added twice to the ```joinEdgesForGiven``` list.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user sirpkt commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/379#discussion_r24553012

          — Diff: tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java —
          @@ -57,17 +54,69 @@ public FoundJoinOrder findBestOrder(LogicalPlan plan, LogicalPlan.QueryBlock blo
          JoinEdge bestPair;

          while (remainRelations.size() > 1) {
          + Set<LogicalNode> checkingRelations = new LinkedHashSet<LogicalNode>();
          +
          + for (LogicalNode relation : remainRelations) {
          + Collection <String> relationStrings = PlannerUtil.getRelationLineageWithinQueryBlock(plan, relation);
          + List<JoinEdge> joinEdges = new ArrayList<JoinEdge>();
          + String relationCollection = TUtil.collectionToString(relationStrings, ",");
          + List<JoinEdge> joinEdgesForGiven = joinGraph.getIncomingEdges(relationCollection);
          + if (joinEdgesForGiven != null)

          { + joinEdges.addAll(joinEdgesForGiven); + }

          + for (String relationString: relationStrings) {
          — End diff –

          Oh, it's my mistake.
          When relationStrings has only one entry, this code may adds that entry twice.

          When a LogicalNode contains two relations, for example, A and B,
          above code first finds joinEdges whose right relation is "A, B", which is obtained by TUtil.collectionToString().
          Next, it finds joinEdges whose right relation is "A" or "B", which is obtained by 'for (String relationString: relationStrings)'.
          So, if a LogicalNode contains just one relation, this code may adds that relation repeatedly.

          Duplicated relation does not affect the result but I'll patch not to have duplicated relations.

          Show
          githubbot ASF GitHub Bot added a comment - Github user sirpkt commented on a diff in the pull request: https://github.com/apache/tajo/pull/379#discussion_r24553012 — Diff: tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java — @@ -57,17 +54,69 @@ public FoundJoinOrder findBestOrder(LogicalPlan plan, LogicalPlan.QueryBlock blo JoinEdge bestPair; while (remainRelations.size() > 1) { + Set<LogicalNode> checkingRelations = new LinkedHashSet<LogicalNode>(); + + for (LogicalNode relation : remainRelations) { + Collection <String> relationStrings = PlannerUtil.getRelationLineageWithinQueryBlock(plan, relation); + List<JoinEdge> joinEdges = new ArrayList<JoinEdge>(); + String relationCollection = TUtil.collectionToString(relationStrings, ","); + List<JoinEdge> joinEdgesForGiven = joinGraph.getIncomingEdges(relationCollection); + if (joinEdgesForGiven != null) { + joinEdges.addAll(joinEdgesForGiven); + } + for (String relationString: relationStrings) { — End diff – Oh, it's my mistake. When relationStrings has only one entry, this code may adds that entry twice. When a LogicalNode contains two relations, for example, A and B, above code first finds joinEdges whose right relation is "A, B", which is obtained by TUtil.collectionToString(). Next, it finds joinEdges whose right relation is "A" or "B", which is obtained by 'for (String relationString: relationStrings)'. So, if a LogicalNode contains just one relation, this code may adds that relation repeatedly. Duplicated relation does not affect the result but I'll patch not to have duplicated relations.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user jihoonson commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/379#discussion_r24555499

          — Diff: tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java —
          @@ -57,17 +54,69 @@ public FoundJoinOrder findBestOrder(LogicalPlan plan, LogicalPlan.QueryBlock blo
          JoinEdge bestPair;

          while (remainRelations.size() > 1) {
          + Set<LogicalNode> checkingRelations = new LinkedHashSet<LogicalNode>();
          +
          + for (LogicalNode relation : remainRelations) {
          + Collection <String> relationStrings = PlannerUtil.getRelationLineageWithinQueryBlock(plan, relation);
          + List<JoinEdge> joinEdges = new ArrayList<JoinEdge>();
          + String relationCollection = TUtil.collectionToString(relationStrings, ",");
          + List<JoinEdge> joinEdgesForGiven = joinGraph.getIncomingEdges(relationCollection);
          + if (joinEdgesForGiven != null)

          { + joinEdges.addAll(joinEdgesForGiven); + }

          + for (String relationString: relationStrings) {
          — End diff –

          Thanks for your explanation.

          Show
          githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/379#discussion_r24555499 — Diff: tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java — @@ -57,17 +54,69 @@ public FoundJoinOrder findBestOrder(LogicalPlan plan, LogicalPlan.QueryBlock blo JoinEdge bestPair; while (remainRelations.size() > 1) { + Set<LogicalNode> checkingRelations = new LinkedHashSet<LogicalNode>(); + + for (LogicalNode relation : remainRelations) { + Collection <String> relationStrings = PlannerUtil.getRelationLineageWithinQueryBlock(plan, relation); + List<JoinEdge> joinEdges = new ArrayList<JoinEdge>(); + String relationCollection = TUtil.collectionToString(relationStrings, ","); + List<JoinEdge> joinEdgesForGiven = joinGraph.getIncomingEdges(relationCollection); + if (joinEdgesForGiven != null) { + joinEdges.addAll(joinEdgesForGiven); + } + for (String relationString: relationStrings) { — End diff – Thanks for your explanation.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user sirpkt commented on the pull request:

          https://github.com/apache/tajo/pull/379#issuecomment-74016250

          I updated the patch to reflect the comments from @jihoonson .

          • add semi and anti join cases
          • avoid duplicated JoinEdges
          Show
          githubbot ASF GitHub Bot added a comment - Github user sirpkt commented on the pull request: https://github.com/apache/tajo/pull/379#issuecomment-74016250 I updated the patch to reflect the comments from @jihoonson . add semi and anti join cases avoid duplicated JoinEdges
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user jihoonson commented on the pull request:

          https://github.com/apache/tajo/pull/379#issuecomment-74204669

          +1
          The latest patch looks good to me.
          Thanks for your work!

          Show
          githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on the pull request: https://github.com/apache/tajo/pull/379#issuecomment-74204669 +1 The latest patch looks good to me. Thanks for your work!
          Hide
          hyunsik Hyunsik Choi added a comment -

          Could you wait for a while? I just started reviewing it.

          Show
          hyunsik Hyunsik Choi added a comment - Could you wait for a while? I just started reviewing it.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user hyunsik commented on the pull request:

          https://github.com/apache/tajo/pull/379#issuecomment-74479771

          +1
          The patch contains a workaround code which can fix the bug right now.

          I have some comments. Your workaround code seems to try to firstly find the order of joins whose join types is same. In my opinion, this approach may miss some of join order combinations. It may cause sub-optimal join orders in some cases. I think that the right way is to improve the findBestPair() to find only associative join orders. It would be great if we do it in another jira.

          Show
          githubbot ASF GitHub Bot added a comment - Github user hyunsik commented on the pull request: https://github.com/apache/tajo/pull/379#issuecomment-74479771 +1 The patch contains a workaround code which can fix the bug right now. I have some comments. Your workaround code seems to try to firstly find the order of joins whose join types is same. In my opinion, this approach may miss some of join order combinations. It may cause sub-optimal join orders in some cases. I think that the right way is to improve the findBestPair() to find only associative join orders. It would be great if we do it in another jira.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user jihoonson commented on the pull request:

          https://github.com/apache/tajo/pull/379#issuecomment-74480604

          @hyunsik, thanks for your review.
          I created an issue to handle the potentially missed cases.
          https://issues.apache.org/jira/browse/TAJO-1352

          Show
          githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on the pull request: https://github.com/apache/tajo/pull/379#issuecomment-74480604 @hyunsik, thanks for your review. I created an issue to handle the potentially missed cases. https://issues.apache.org/jira/browse/TAJO-1352
          Hide
          jihoonson Jihoon Son added a comment -

          Since this is an urgent blocker issue, I'll commit the patch shortly.

          Show
          jihoonson Jihoon Son added a comment - Since this is an urgent blocker issue, I'll commit the patch shortly.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/tajo/pull/379

          Show
          githubbot ASF GitHub Bot added a comment - Github user asfgit closed the pull request at: https://github.com/apache/tajo/pull/379
          Hide
          jihoonson Jihoon Son added a comment -

          I committed the patch.
          Keuntae Park, thanks for your contribution!

          Show
          jihoonson Jihoon Son added a comment - I committed the patch. Keuntae Park , thanks for your contribution!
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Tajo-master-build #585 (See https://builds.apache.org/job/Tajo-master-build/585/)
          TAJO-1277: GreedyHeuristicJoinOrderAlgorithm sometimes wrongly assumes associativity of joins. (Keuntae Park via jihoon) (jihoonson: rev 95d9cc455dd7d25da6dd0df709084d7346efb410)

          • tajo-core/src/test/resources/queries/TestJoinQuery/testJoinWithMultipleJoinTypes.sql
          • tajo-core/src/test/resources/results/TestJoinQuery/testJoinWithMultipleJoinTypes.result
          • tajo-core/src/test/java/org/apache/tajo/engine/query/TestJoinQuery.java
          • CHANGES
          • tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Tajo-master-build #585 (See https://builds.apache.org/job/Tajo-master-build/585/ ) TAJO-1277 : GreedyHeuristicJoinOrderAlgorithm sometimes wrongly assumes associativity of joins. (Keuntae Park via jihoon) (jihoonson: rev 95d9cc455dd7d25da6dd0df709084d7346efb410) tajo-core/src/test/resources/queries/TestJoinQuery/testJoinWithMultipleJoinTypes.sql tajo-core/src/test/resources/results/TestJoinQuery/testJoinWithMultipleJoinTypes.result tajo-core/src/test/java/org/apache/tajo/engine/query/TestJoinQuery.java CHANGES tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Tajo-master-CODEGEN-build #224 (See https://builds.apache.org/job/Tajo-master-CODEGEN-build/224/)
          TAJO-1277: GreedyHeuristicJoinOrderAlgorithm sometimes wrongly assumes associativity of joins. (Keuntae Park via jihoon) (jihoonson: rev 95d9cc455dd7d25da6dd0df709084d7346efb410)

          • tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java
          • tajo-core/src/test/resources/results/TestJoinQuery/testJoinWithMultipleJoinTypes.result
          • tajo-core/src/test/java/org/apache/tajo/engine/query/TestJoinQuery.java
          • tajo-core/src/test/resources/queries/TestJoinQuery/testJoinWithMultipleJoinTypes.sql
          • CHANGES
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Tajo-master-CODEGEN-build #224 (See https://builds.apache.org/job/Tajo-master-CODEGEN-build/224/ ) TAJO-1277 : GreedyHeuristicJoinOrderAlgorithm sometimes wrongly assumes associativity of joins. (Keuntae Park via jihoon) (jihoonson: rev 95d9cc455dd7d25da6dd0df709084d7346efb410) tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java tajo-core/src/test/resources/results/TestJoinQuery/testJoinWithMultipleJoinTypes.result tajo-core/src/test/java/org/apache/tajo/engine/query/TestJoinQuery.java tajo-core/src/test/resources/queries/TestJoinQuery/testJoinWithMultipleJoinTypes.sql CHANGES
          Hide
          hyunsik Hyunsik Choi added a comment -

          I just pushed the change to branch-0.10.0.

          Show
          hyunsik Hyunsik Choi added a comment - I just pushed the change to branch-0.10.0.

            People

            • Assignee:
              sirpkt Keuntae Park
              Reporter:
              sirpkt Keuntae Park
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development