[TAJO-1277] GreedyHeuristicJoinOrderAlgorithm sometimes wrongly assumes associativity of joins - ASF JIRA

Jihun Kang added a comment - 02/Feb/15 08:15

How about rewriting logical plan for this case? I feel that rewriting logical plan like this seems to be more optimal, and lower cost.

JOIN(8)(INNER)
  => Join Cond: default.c.c_custkey (INT8) = default.n.n_nationkey (INT8)
   SCAN(1) on default.nation as n
   JOIN(7)(INNER)
     => Join Cond: default.c.c_custkey (INT8) = default.r.r_regionkey (INT8)
      SCAN(3) on default.region as r
      SCAN(0) on default.customer as c

Jihun Kang added a comment - 02/Feb/15 08:15 How about rewriting logical plan for this case? I feel that rewriting logical plan like this seems to be more optimal, and lower cost. JOIN(8)(INNER) => Join Cond: default .c.c_custkey (INT8) = default .n.n_nationkey (INT8) SCAN(1) on default .nation as n JOIN(7)(INNER) => Join Cond: default .c.c_custkey (INT8) = default .r.r_regionkey (INT8) SCAN(3) on default .region as r SCAN(0) on default .customer as c

Jihoon Son added a comment - 02/Feb/15 09:25

For the above case, the customer and nation tables must be joined first, and then the result is joined with the region table.
In addition, the join type should be the outer join for the first join.

I think that we should handle joins separately according to their join types when optimizing the join order.

Jihoon Son added a comment - 02/Feb/15 09:25 For the above case, the customer and nation tables must be joined first, and then the result is joined with the region table. In addition, the join type should be the outer join for the first join. I think that we should handle joins separately according to their join types when optimizing the join order.

Jihun Kang added a comment - 02/Feb/15 10:46

Thanks for explanation. I have few knowledge of tajo core, and just thought about in a cost aspect.

Jihun Kang added a comment - 02/Feb/15 10:46 Thanks for explanation. I have few knowledge of tajo core, and just thought about in a cost aspect.

ASF GitHub Bot added a comment - 10/Feb/15 06:15

GitHub user sirpkt opened a pull request:

https://github.com/apache/tajo/pull/379

~~TAJO-1277~~: GreedyHeuristicJoinOrderAlgorithm sometimes wrongly assumes associativity of joins

Basically, it limits the range of join ordering until it meets outer join operations.
For example, in the case of (((((a inner join b) inner join c) outer join d) inner join e) inner join f),
join ordering is partitioned as three parts as
1) (a inner join b) inner join c
2) (result of 1) outer join d
3) (((result of 2) inner join e) inner join f)

Following modifications are included:

findBestOrder() is changed to partition join ordering
getBestPair() and findJoin() are changed to return the corresponding JoinEdges of the selected join because those JoinEdges should be removed before next join ordering

It passes 'mvn clean install' and several join cases I tested,
however, I'm not sure this is good approach.
Please, leave me comments about the patch.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sirpkt/tajo ~~TAJO-1277~~

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/tajo/pull/379.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #379

commit 74802e228dd3af8e885cc724c32d5746939c2c23
Author: Keuntae Park <sirpkt@apache.org>
Date: 2015-02-09T09:21:04Z

join optimizer is enhanced to distinguish non-associative join cases

ASF GitHub Bot added a comment - 10/Feb/15 06:15 GitHub user sirpkt opened a pull request: https://github.com/apache/tajo/pull/379 TAJO-1277 : GreedyHeuristicJoinOrderAlgorithm sometimes wrongly assumes associativity of joins Basically, it limits the range of join ordering until it meets outer join operations. For example, in the case of (((((a inner join b) inner join c) outer join d) inner join e) inner join f), join ordering is partitioned as three parts as 1) (a inner join b) inner join c 2) (result of 1) outer join d 3) (((result of 2) inner join e) inner join f) Following modifications are included: findBestOrder() is changed to partition join ordering getBestPair() and findJoin() are changed to return the corresponding JoinEdges of the selected join because those JoinEdges should be removed before next join ordering It passes 'mvn clean install' and several join cases I tested, however, I'm not sure this is good approach. Please, leave me comments about the patch. You can merge this pull request into a Git repository by running: $ git pull https://github.com/sirpkt/tajo TAJO-1277 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/tajo/pull/379.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #379 commit 74802e228dd3af8e885cc724c32d5746939c2c23 Author: Keuntae Park <sirpkt@apache.org> Date: 2015-02-09T09:21:04Z join optimizer is enhanced to distinguish non-associative join cases

Jihoon Son added a comment - 10/Feb/15 06:59

No problem.

Jihoon Son added a comment - 10/Feb/15 06:59 No problem.

ASF GitHub Bot added a comment - 10/Feb/15 07:12

Github user jihoonson commented on a diff in the pull request:

https://github.com/apache/tajo/pull/379#discussion_r24393011

— Diff: tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java —
@@ -57,17 +54,69 @@ public FoundJoinOrder findBestOrder(LogicalPlan plan, LogicalPlan.QueryBlock blo
JoinEdge bestPair;

while (remainRelations.size() > 1) {
+ Set<LogicalNode> checkingRelations = new LinkedHashSet<LogicalNode>();
+
+ for (LogicalNode relation : remainRelations) {
+ Collection <String> relationStrings = PlannerUtil.getRelationLineageWithinQueryBlock(plan, relation);
+ List<JoinEdge> joinEdges = new ArrayList<JoinEdge>();
+ String relationCollection = TUtil.collectionToString(relationStrings, ",");
+ List<JoinEdge> joinEdgesForGiven = joinGraph.getIncomingEdges(relationCollection);
+ if (joinEdgesForGiven != null)

{ + joinEdges.addAll(joinEdgesForGiven); + }

+ for (String relationString: relationStrings) {
+ joinEdgesForGiven = joinGraph.getIncomingEdges(relationString);
+ if (joinEdgesForGiven != null)

{ + joinEdges.addAll(joinEdgesForGiven); + }

+ }
+
+ // check if the relation is the last piece of outer join
+ boolean endInnerRelation = false;
+ for (JoinEdge joinEdge: joinEdges) {
+ switch(joinEdge.getJoinType()) {
+ case LEFT_OUTER:
+ case RIGHT_OUTER:
+ case FULL_OUTER:
— End diff –

We should consider other join types such as ```anti join``` or ```semi join``` as well as ```inner join``` and ```outer join```.
So, I think that the iteration must be stopped when finding ```anti join``` or ```semi join```.

ASF GitHub Bot added a comment - 10/Feb/15 07:12 Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/379#discussion_r24393011 — Diff: tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java — @@ -57,17 +54,69 @@ public FoundJoinOrder findBestOrder(LogicalPlan plan, LogicalPlan.QueryBlock blo JoinEdge bestPair; while (remainRelations.size() > 1) { + Set<LogicalNode> checkingRelations = new LinkedHashSet<LogicalNode>(); + + for (LogicalNode relation : remainRelations) { + Collection <String> relationStrings = PlannerUtil.getRelationLineageWithinQueryBlock(plan, relation); + List<JoinEdge> joinEdges = new ArrayList<JoinEdge>(); + String relationCollection = TUtil.collectionToString(relationStrings, ","); + List<JoinEdge> joinEdgesForGiven = joinGraph.getIncomingEdges(relationCollection); + if (joinEdgesForGiven != null) { + joinEdges.addAll(joinEdgesForGiven); + } + for (String relationString: relationStrings) { + joinEdgesForGiven = joinGraph.getIncomingEdges(relationString); + if (joinEdgesForGiven != null) { + joinEdges.addAll(joinEdgesForGiven); + } + } + + // check if the relation is the last piece of outer join + boolean endInnerRelation = false; + for (JoinEdge joinEdge: joinEdges) { + switch(joinEdge.getJoinType()) { + case LEFT_OUTER: + case RIGHT_OUTER: + case FULL_OUTER: — End diff – We should consider other join types such as ```anti join``` or ```semi join``` as well as ```inner join``` and ```outer join```. So, I think that the iteration must be stopped when finding ```anti join``` or ```semi join```.

ASF GitHub Bot added a comment - 10/Feb/15 07:19

Github user jihoonson commented on a diff in the pull request:

https://github.com/apache/tajo/pull/379#discussion_r24393231

— Diff: tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java —
@@ -57,17 +54,69 @@ public FoundJoinOrder findBestOrder(LogicalPlan plan, LogicalPlan.QueryBlock blo
JoinEdge bestPair;

while (remainRelations.size() > 1) {
+ Set<LogicalNode> checkingRelations = new LinkedHashSet<LogicalNode>();
+
+ for (LogicalNode relation : remainRelations) {
+ Collection <String> relationStrings = PlannerUtil.getRelationLineageWithinQueryBlock(plan, relation);
+ List<JoinEdge> joinEdges = new ArrayList<JoinEdge>();
+ String relationCollection = TUtil.collectionToString(relationStrings, ",");
+ List<JoinEdge> joinEdgesForGiven = joinGraph.getIncomingEdges(relationCollection);
+ if (joinEdgesForGiven != null)

{ + joinEdges.addAll(joinEdgesForGiven); + }

+ for (String relationString: relationStrings) {
+ joinEdgesForGiven = joinGraph.getIncomingEdges(relationString);
+ if (joinEdgesForGiven != null)

{ + joinEdges.addAll(joinEdgesForGiven); + }

+ }
+
+ // check if the relation is the last piece of outer join
+ boolean endInnerRelation = false;
+ for (JoinEdge joinEdge: joinEdges) {
— End diff –

Here, joinEdges seem to be sorted in the order of their occurrence in the user query.
So, I wonder how we can guarantee that joinEdges are always sorted in that order.

ASF GitHub Bot added a comment - 10/Feb/15 07:19 Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/379#discussion_r24393231 — Diff: tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java — @@ -57,17 +54,69 @@ public FoundJoinOrder findBestOrder(LogicalPlan plan, LogicalPlan.QueryBlock blo JoinEdge bestPair; while (remainRelations.size() > 1) { + Set<LogicalNode> checkingRelations = new LinkedHashSet<LogicalNode>(); + + for (LogicalNode relation : remainRelations) { + Collection <String> relationStrings = PlannerUtil.getRelationLineageWithinQueryBlock(plan, relation); + List<JoinEdge> joinEdges = new ArrayList<JoinEdge>(); + String relationCollection = TUtil.collectionToString(relationStrings, ","); + List<JoinEdge> joinEdgesForGiven = joinGraph.getIncomingEdges(relationCollection); + if (joinEdgesForGiven != null) { + joinEdges.addAll(joinEdgesForGiven); + } + for (String relationString: relationStrings) { + joinEdgesForGiven = joinGraph.getIncomingEdges(relationString); + if (joinEdgesForGiven != null) { + joinEdges.addAll(joinEdgesForGiven); + } + } + + // check if the relation is the last piece of outer join + boolean endInnerRelation = false; + for (JoinEdge joinEdge: joinEdges) { — End diff – Here, joinEdges seem to be sorted in the order of their occurrence in the user query. So, I wonder how we can guarantee that joinEdges are always sorted in that order.

ASF GitHub Bot added a comment - 10/Feb/15 07:21

Github user jihoonson commented on the pull request:

https://github.com/apache/tajo/pull/379#issuecomment-73655137

@sirpkt, thanks for your nice work.
The basic approach looks good to me.
I left some comments.

Sometimes, outer joins can be associative under some conditions. (Please refer to http://stackoverflow.com/questions/20022196/are-left-outer-joins-associative, https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins)
So, we should handle them, but I think that it should be done in another issue.

ASF GitHub Bot added a comment - 10/Feb/15 07:21 Github user jihoonson commented on the pull request: https://github.com/apache/tajo/pull/379#issuecomment-73655137 @sirpkt, thanks for your nice work. The basic approach looks good to me. I left some comments. Sometimes, outer joins can be associative under some conditions. (Please refer to http://stackoverflow.com/questions/20022196/are-left-outer-joins-associative , https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins ) So, we should handle them, but I think that it should be done in another issue.

ASF GitHub Bot added a comment - 11/Feb/15 01:39

Github user sirpkt commented on a diff in the pull request:

https://github.com/apache/tajo/pull/379#discussion_r24468860

— Diff: tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java —
@@ -57,17 +54,69 @@ public FoundJoinOrder findBestOrder(LogicalPlan plan, LogicalPlan.QueryBlock blo
JoinEdge bestPair;

while (remainRelations.size() > 1) {
+ Set<LogicalNode> checkingRelations = new LinkedHashSet<LogicalNode>();
+
+ for (LogicalNode relation : remainRelations) {
+ Collection <String> relationStrings = PlannerUtil.getRelationLineageWithinQueryBlock(plan, relation);
+ List<JoinEdge> joinEdges = new ArrayList<JoinEdge>();
+ String relationCollection = TUtil.collectionToString(relationStrings, ",");
+ List<JoinEdge> joinEdgesForGiven = joinGraph.getIncomingEdges(relationCollection);
+ if (joinEdgesForGiven != null)

{ + joinEdges.addAll(joinEdgesForGiven); + }

+ for (String relationString: relationStrings) {
+ joinEdgesForGiven = joinGraph.getIncomingEdges(relationString);
+ if (joinEdgesForGiven != null)

{ + joinEdges.addAll(joinEdgesForGiven); + }

+ }
+
+ // check if the relation is the last piece of outer join
+ boolean endInnerRelation = false;
+ for (JoinEdge joinEdge: joinEdges) {
+ switch(joinEdge.getJoinType()) {
+ case LEFT_OUTER:
+ case RIGHT_OUTER:
+ case FULL_OUTER:
— End diff –

Thank you for the comment, @jihoonson .
I'll add other join types.

ASF GitHub Bot added a comment - 11/Feb/15 01:39 Github user sirpkt commented on a diff in the pull request: https://github.com/apache/tajo/pull/379#discussion_r24468860 — Diff: tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java — @@ -57,17 +54,69 @@ public FoundJoinOrder findBestOrder(LogicalPlan plan, LogicalPlan.QueryBlock blo JoinEdge bestPair; while (remainRelations.size() > 1) { + Set<LogicalNode> checkingRelations = new LinkedHashSet<LogicalNode>(); + + for (LogicalNode relation : remainRelations) { + Collection <String> relationStrings = PlannerUtil.getRelationLineageWithinQueryBlock(plan, relation); + List<JoinEdge> joinEdges = new ArrayList<JoinEdge>(); + String relationCollection = TUtil.collectionToString(relationStrings, ","); + List<JoinEdge> joinEdgesForGiven = joinGraph.getIncomingEdges(relationCollection); + if (joinEdgesForGiven != null) { + joinEdges.addAll(joinEdgesForGiven); + } + for (String relationString: relationStrings) { + joinEdgesForGiven = joinGraph.getIncomingEdges(relationString); + if (joinEdgesForGiven != null) { + joinEdges.addAll(joinEdgesForGiven); + } + } + + // check if the relation is the last piece of outer join + boolean endInnerRelation = false; + for (JoinEdge joinEdge: joinEdges) { + switch(joinEdge.getJoinType()) { + case LEFT_OUTER: + case RIGHT_OUTER: + case FULL_OUTER: — End diff – Thank you for the comment, @jihoonson . I'll add other join types.

ASF GitHub Bot added a comment - 11/Feb/15 01:46

Github user sirpkt commented on a diff in the pull request:

https://github.com/apache/tajo/pull/379#discussion_r24469183

— Diff: tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java —
@@ -57,17 +54,69 @@ public FoundJoinOrder findBestOrder(LogicalPlan plan, LogicalPlan.QueryBlock blo
JoinEdge bestPair;

while (remainRelations.size() > 1) {
+ Set<LogicalNode> checkingRelations = new LinkedHashSet<LogicalNode>();
+
+ for (LogicalNode relation : remainRelations) {
+ Collection <String> relationStrings = PlannerUtil.getRelationLineageWithinQueryBlock(plan, relation);
+ List<JoinEdge> joinEdges = new ArrayList<JoinEdge>();
+ String relationCollection = TUtil.collectionToString(relationStrings, ",");
+ List<JoinEdge> joinEdgesForGiven = joinGraph.getIncomingEdges(relationCollection);
+ if (joinEdgesForGiven != null)

{ + joinEdges.addAll(joinEdgesForGiven); + }

+ for (String relationString: relationStrings) {
+ joinEdgesForGiven = joinGraph.getIncomingEdges(relationString);
+ if (joinEdgesForGiven != null)

{ + joinEdges.addAll(joinEdgesForGiven); + }

+ }
+
+ // check if the relation is the last piece of outer join
+ boolean endInnerRelation = false;
+ for (JoinEdge joinEdge: joinEdges) {
— End diff –

Actually, I thought that the order of JoinEdges in joinEdges list does not matter.
Since the given relation is the right relation for all the JoinEdges,
I just check the existence of JoinEdge that is outer join.
And, in my thought, the result does not vary regardless of the order of JoinEdge's occurrence.

ASF GitHub Bot added a comment - 11/Feb/15 01:46 Github user sirpkt commented on a diff in the pull request: https://github.com/apache/tajo/pull/379#discussion_r24469183 — Diff: tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java — @@ -57,17 +54,69 @@ public FoundJoinOrder findBestOrder(LogicalPlan plan, LogicalPlan.QueryBlock blo JoinEdge bestPair; while (remainRelations.size() > 1) { + Set<LogicalNode> checkingRelations = new LinkedHashSet<LogicalNode>(); + + for (LogicalNode relation : remainRelations) { + Collection <String> relationStrings = PlannerUtil.getRelationLineageWithinQueryBlock(plan, relation); + List<JoinEdge> joinEdges = new ArrayList<JoinEdge>(); + String relationCollection = TUtil.collectionToString(relationStrings, ","); + List<JoinEdge> joinEdgesForGiven = joinGraph.getIncomingEdges(relationCollection); + if (joinEdgesForGiven != null) { + joinEdges.addAll(joinEdgesForGiven); + } + for (String relationString: relationStrings) { + joinEdgesForGiven = joinGraph.getIncomingEdges(relationString); + if (joinEdgesForGiven != null) { + joinEdges.addAll(joinEdgesForGiven); + } + } + + // check if the relation is the last piece of outer join + boolean endInnerRelation = false; + for (JoinEdge joinEdge: joinEdges) { — End diff – Actually, I thought that the order of JoinEdges in joinEdges list does not matter. Since the given relation is the right relation for all the JoinEdges, I just check the existence of JoinEdge that is outer join. And, in my thought, the result does not vary regardless of the order of JoinEdge's occurrence.

ASF GitHub Bot added a comment - 11/Feb/15 03:09

Github user jihoonson commented on a diff in the pull request:

https://github.com/apache/tajo/pull/379#discussion_r24472011

— Diff: tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java —
@@ -57,17 +54,69 @@ public FoundJoinOrder findBestOrder(LogicalPlan plan, LogicalPlan.QueryBlock blo
JoinEdge bestPair;

while (remainRelations.size() > 1) {
+ Set<LogicalNode> checkingRelations = new LinkedHashSet<LogicalNode>();
+
+ for (LogicalNode relation : remainRelations) {
+ Collection <String> relationStrings = PlannerUtil.getRelationLineageWithinQueryBlock(plan, relation);
+ List<JoinEdge> joinEdges = new ArrayList<JoinEdge>();
+ String relationCollection = TUtil.collectionToString(relationStrings, ",");
+ List<JoinEdge> joinEdgesForGiven = joinGraph.getIncomingEdges(relationCollection);
+ if (joinEdgesForGiven != null)

{ + joinEdges.addAll(joinEdgesForGiven); + }

+ for (String relationString: relationStrings) {
+ joinEdgesForGiven = joinGraph.getIncomingEdges(relationString);
+ if (joinEdgesForGiven != null)

{ + joinEdges.addAll(joinEdgesForGiven); + }

+ }
+
+ // check if the relation is the last piece of outer join
+ boolean endInnerRelation = false;
+ for (JoinEdge joinEdge: joinEdges) {
— End diff –

You are right. It was my misunderstanding.

ASF GitHub Bot added a comment - 11/Feb/15 03:09 Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/379#discussion_r24472011 — Diff: tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java — @@ -57,17 +54,69 @@ public FoundJoinOrder findBestOrder(LogicalPlan plan, LogicalPlan.QueryBlock blo JoinEdge bestPair; while (remainRelations.size() > 1) { + Set<LogicalNode> checkingRelations = new LinkedHashSet<LogicalNode>(); + + for (LogicalNode relation : remainRelations) { + Collection <String> relationStrings = PlannerUtil.getRelationLineageWithinQueryBlock(plan, relation); + List<JoinEdge> joinEdges = new ArrayList<JoinEdge>(); + String relationCollection = TUtil.collectionToString(relationStrings, ","); + List<JoinEdge> joinEdgesForGiven = joinGraph.getIncomingEdges(relationCollection); + if (joinEdgesForGiven != null) { + joinEdges.addAll(joinEdgesForGiven); + } + for (String relationString: relationStrings) { + joinEdgesForGiven = joinGraph.getIncomingEdges(relationString); + if (joinEdgesForGiven != null) { + joinEdges.addAll(joinEdgesForGiven); + } + } + + // check if the relation is the last piece of outer join + boolean endInnerRelation = false; + for (JoinEdge joinEdge: joinEdges) { — End diff – You are right. It was my misunderstanding.

ASF GitHub Bot added a comment - 11/Feb/15 03:11

Github user jihoonson commented on a diff in the pull request:

https://github.com/apache/tajo/pull/379#discussion_r24472049

— Diff: tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java —
@@ -57,17 +54,69 @@ public FoundJoinOrder findBestOrder(LogicalPlan plan, LogicalPlan.QueryBlock blo
JoinEdge bestPair;

while (remainRelations.size() > 1) {
+ Set<LogicalNode> checkingRelations = new LinkedHashSet<LogicalNode>();
+
+ for (LogicalNode relation : remainRelations) {
+ Collection <String> relationStrings = PlannerUtil.getRelationLineageWithinQueryBlock(plan, relation);
+ List<JoinEdge> joinEdges = new ArrayList<JoinEdge>();
+ String relationCollection = TUtil.collectionToString(relationStrings, ",");
+ List<JoinEdge> joinEdgesForGiven = joinGraph.getIncomingEdges(relationCollection);
+ if (joinEdgesForGiven != null)

{ + joinEdges.addAll(joinEdgesForGiven); + }

+ for (String relationString: relationStrings) {
— End diff –

Would you mind explaning these lines?
I found that the same join edge is added twice to the ```joinEdgesForGiven``` list.

ASF GitHub Bot added a comment - 11/Feb/15 03:11 Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/379#discussion_r24472049 — Diff: tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java — @@ -57,17 +54,69 @@ public FoundJoinOrder findBestOrder(LogicalPlan plan, LogicalPlan.QueryBlock blo JoinEdge bestPair; while (remainRelations.size() > 1) { + Set<LogicalNode> checkingRelations = new LinkedHashSet<LogicalNode>(); + + for (LogicalNode relation : remainRelations) { + Collection <String> relationStrings = PlannerUtil.getRelationLineageWithinQueryBlock(plan, relation); + List<JoinEdge> joinEdges = new ArrayList<JoinEdge>(); + String relationCollection = TUtil.collectionToString(relationStrings, ","); + List<JoinEdge> joinEdgesForGiven = joinGraph.getIncomingEdges(relationCollection); + if (joinEdgesForGiven != null) { + joinEdges.addAll(joinEdgesForGiven); + } + for (String relationString: relationStrings) { — End diff – Would you mind explaning these lines? I found that the same join edge is added twice to the ```joinEdgesForGiven``` list.

ASF GitHub Bot added a comment - 12/Feb/15 02:01

Github user sirpkt commented on a diff in the pull request:

https://github.com/apache/tajo/pull/379#discussion_r24553012

— Diff: tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java —
@@ -57,17 +54,69 @@ public FoundJoinOrder findBestOrder(LogicalPlan plan, LogicalPlan.QueryBlock blo
JoinEdge bestPair;

while (remainRelations.size() > 1) {
+ Set<LogicalNode> checkingRelations = new LinkedHashSet<LogicalNode>();
+
+ for (LogicalNode relation : remainRelations) {
+ Collection <String> relationStrings = PlannerUtil.getRelationLineageWithinQueryBlock(plan, relation);
+ List<JoinEdge> joinEdges = new ArrayList<JoinEdge>();
+ String relationCollection = TUtil.collectionToString(relationStrings, ",");
+ List<JoinEdge> joinEdgesForGiven = joinGraph.getIncomingEdges(relationCollection);
+ if (joinEdgesForGiven != null)

{ + joinEdges.addAll(joinEdgesForGiven); + }

+ for (String relationString: relationStrings) {
— End diff –

Oh, it's my mistake.
When relationStrings has only one entry, this code may adds that entry twice.

When a LogicalNode contains two relations, for example, A and B,
above code first finds joinEdges whose right relation is "A, B", which is obtained by TUtil.collectionToString().
Next, it finds joinEdges whose right relation is "A" or "B", which is obtained by 'for (String relationString: relationStrings)'.
So, if a LogicalNode contains just one relation, this code may adds that relation repeatedly.

Duplicated relation does not affect the result but I'll patch not to have duplicated relations.

ASF GitHub Bot added a comment - 12/Feb/15 02:01 Github user sirpkt commented on a diff in the pull request: https://github.com/apache/tajo/pull/379#discussion_r24553012 — Diff: tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java — @@ -57,17 +54,69 @@ public FoundJoinOrder findBestOrder(LogicalPlan plan, LogicalPlan.QueryBlock blo JoinEdge bestPair; while (remainRelations.size() > 1) { + Set<LogicalNode> checkingRelations = new LinkedHashSet<LogicalNode>(); + + for (LogicalNode relation : remainRelations) { + Collection <String> relationStrings = PlannerUtil.getRelationLineageWithinQueryBlock(plan, relation); + List<JoinEdge> joinEdges = new ArrayList<JoinEdge>(); + String relationCollection = TUtil.collectionToString(relationStrings, ","); + List<JoinEdge> joinEdgesForGiven = joinGraph.getIncomingEdges(relationCollection); + if (joinEdgesForGiven != null) { + joinEdges.addAll(joinEdgesForGiven); + } + for (String relationString: relationStrings) { — End diff – Oh, it's my mistake. When relationStrings has only one entry, this code may adds that entry twice. When a LogicalNode contains two relations, for example, A and B, above code first finds joinEdges whose right relation is "A, B", which is obtained by TUtil.collectionToString(). Next, it finds joinEdges whose right relation is "A" or "B", which is obtained by 'for (String relationString: relationStrings)'. So, if a LogicalNode contains just one relation, this code may adds that relation repeatedly. Duplicated relation does not affect the result but I'll patch not to have duplicated relations.

ASF GitHub Bot added a comment - 12/Feb/15 03:11

Github user jihoonson commented on a diff in the pull request:

https://github.com/apache/tajo/pull/379#discussion_r24555499

— Diff: tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java —
@@ -57,17 +54,69 @@ public FoundJoinOrder findBestOrder(LogicalPlan plan, LogicalPlan.QueryBlock blo
JoinEdge bestPair;

while (remainRelations.size() > 1) {
+ Set<LogicalNode> checkingRelations = new LinkedHashSet<LogicalNode>();
+
+ for (LogicalNode relation : remainRelations) {
+ Collection <String> relationStrings = PlannerUtil.getRelationLineageWithinQueryBlock(plan, relation);
+ List<JoinEdge> joinEdges = new ArrayList<JoinEdge>();
+ String relationCollection = TUtil.collectionToString(relationStrings, ",");
+ List<JoinEdge> joinEdgesForGiven = joinGraph.getIncomingEdges(relationCollection);
+ if (joinEdgesForGiven != null)

{ + joinEdges.addAll(joinEdgesForGiven); + }

+ for (String relationString: relationStrings) {
— End diff –

Thanks for your explanation.

ASF GitHub Bot added a comment - 12/Feb/15 03:11 Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/379#discussion_r24555499 — Diff: tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java — @@ -57,17 +54,69 @@ public FoundJoinOrder findBestOrder(LogicalPlan plan, LogicalPlan.QueryBlock blo JoinEdge bestPair; while (remainRelations.size() > 1) { + Set<LogicalNode> checkingRelations = new LinkedHashSet<LogicalNode>(); + + for (LogicalNode relation : remainRelations) { + Collection <String> relationStrings = PlannerUtil.getRelationLineageWithinQueryBlock(plan, relation); + List<JoinEdge> joinEdges = new ArrayList<JoinEdge>(); + String relationCollection = TUtil.collectionToString(relationStrings, ","); + List<JoinEdge> joinEdgesForGiven = joinGraph.getIncomingEdges(relationCollection); + if (joinEdgesForGiven != null) { + joinEdges.addAll(joinEdgesForGiven); + } + for (String relationString: relationStrings) { — End diff – Thanks for your explanation.

ASF GitHub Bot added a comment - 12/Feb/15 04:34

Github user sirpkt commented on the pull request:

https://github.com/apache/tajo/pull/379#issuecomment-74016250

I updated the patch to reflect the comments from @jihoonson .

add semi and anti join cases
avoid duplicated JoinEdges

ASF GitHub Bot added a comment - 12/Feb/15 04:34 Github user sirpkt commented on the pull request: https://github.com/apache/tajo/pull/379#issuecomment-74016250 I updated the patch to reflect the comments from @jihoonson . add semi and anti join cases avoid duplicated JoinEdges

ASF GitHub Bot added a comment - 13/Feb/15 04:45

Github user jihoonson commented on the pull request:

https://github.com/apache/tajo/pull/379#issuecomment-74204669

+1
The latest patch looks good to me.
Thanks for your work!

ASF GitHub Bot added a comment - 13/Feb/15 04:45 Github user jihoonson commented on the pull request: https://github.com/apache/tajo/pull/379#issuecomment-74204669 +1 The latest patch looks good to me. Thanks for your work!

Hyunsik Choi added a comment - 13/Feb/15 09:47

Could you wait for a while? I just started reviewing it.

Hyunsik Choi added a comment - 13/Feb/15 09:47 Could you wait for a while? I just started reviewing it.

ASF GitHub Bot added a comment - 16/Feb/15 09:25

Github user hyunsik commented on the pull request:

https://github.com/apache/tajo/pull/379#issuecomment-74479771

+1
The patch contains a workaround code which can fix the bug right now.

I have some comments. Your workaround code seems to try to firstly find the order of joins whose join types is same. In my opinion, this approach may miss some of join order combinations. It may cause sub-optimal join orders in some cases. I think that the right way is to improve the findBestPair() to find only associative join orders. It would be great if we do it in another jira.

ASF GitHub Bot added a comment - 16/Feb/15 09:25 Github user hyunsik commented on the pull request: https://github.com/apache/tajo/pull/379#issuecomment-74479771 +1 The patch contains a workaround code which can fix the bug right now. I have some comments. Your workaround code seems to try to firstly find the order of joins whose join types is same. In my opinion, this approach may miss some of join order combinations. It may cause sub-optimal join orders in some cases. I think that the right way is to improve the findBestPair() to find only associative join orders. It would be great if we do it in another jira.

ASF GitHub Bot added a comment - 16/Feb/15 09:32

Github user jihoonson commented on the pull request:

https://github.com/apache/tajo/pull/379#issuecomment-74480604

@hyunsik, thanks for your review.
I created an issue to handle the potentially missed cases.
https://issues.apache.org/jira/browse/TAJO-1352

ASF GitHub Bot added a comment - 16/Feb/15 09:32 Github user jihoonson commented on the pull request: https://github.com/apache/tajo/pull/379#issuecomment-74480604 @hyunsik, thanks for your review. I created an issue to handle the potentially missed cases. https://issues.apache.org/jira/browse/TAJO-1352

Jihoon Son added a comment - 17/Feb/15 01:48

Since this is an urgent blocker issue, I'll commit the patch shortly.

Jihoon Son added a comment - 17/Feb/15 01:48 Since this is an urgent blocker issue, I'll commit the patch shortly.

ASF GitHub Bot added a comment - 17/Feb/15 01:52

Github user asfgit closed the pull request at:

https://github.com/apache/tajo/pull/379

ASF GitHub Bot added a comment - 17/Feb/15 01:52 Github user asfgit closed the pull request at: https://github.com/apache/tajo/pull/379

Jihoon Son added a comment - 17/Feb/15 01:53

I committed the patch.
sirpkt, thanks for your contribution!

Jihoon Son added a comment - 17/Feb/15 01:53 I committed the patch. sirpkt , thanks for your contribution!

Hudson added a comment - 17/Feb/15 02:10

SUCCESS: Integrated in Tajo-master-build #585 (See https://builds.apache.org/job/Tajo-master-build/585/)
~~TAJO-1277~~: GreedyHeuristicJoinOrderAlgorithm sometimes wrongly assumes associativity of joins. (Keuntae Park via jihoon) (jihoonson: rev 95d9cc455dd7d25da6dd0df709084d7346efb410)

tajo-core/src/test/resources/queries/TestJoinQuery/testJoinWithMultipleJoinTypes.sql
tajo-core/src/test/resources/results/TestJoinQuery/testJoinWithMultipleJoinTypes.result
tajo-core/src/test/java/org/apache/tajo/engine/query/TestJoinQuery.java
CHANGES
tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java

Hudson added a comment - 17/Feb/15 02:10 SUCCESS: Integrated in Tajo-master-build #585 (See https://builds.apache.org/job/Tajo-master-build/585/ ) TAJO-1277 : GreedyHeuristicJoinOrderAlgorithm sometimes wrongly assumes associativity of joins. (Keuntae Park via jihoon) (jihoonson: rev 95d9cc455dd7d25da6dd0df709084d7346efb410) tajo-core/src/test/resources/queries/TestJoinQuery/testJoinWithMultipleJoinTypes.sql tajo-core/src/test/resources/results/TestJoinQuery/testJoinWithMultipleJoinTypes.result tajo-core/src/test/java/org/apache/tajo/engine/query/TestJoinQuery.java CHANGES tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java

Hudson added a comment - 17/Feb/15 02:10

FAILURE: Integrated in Tajo-master-CODEGEN-build #224 (See https://builds.apache.org/job/Tajo-master-CODEGEN-build/224/)
~~TAJO-1277~~: GreedyHeuristicJoinOrderAlgorithm sometimes wrongly assumes associativity of joins. (Keuntae Park via jihoon) (jihoonson: rev 95d9cc455dd7d25da6dd0df709084d7346efb410)

tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java
tajo-core/src/test/resources/results/TestJoinQuery/testJoinWithMultipleJoinTypes.result
tajo-core/src/test/java/org/apache/tajo/engine/query/TestJoinQuery.java
tajo-core/src/test/resources/queries/TestJoinQuery/testJoinWithMultipleJoinTypes.sql
CHANGES

Hudson added a comment - 17/Feb/15 02:10 FAILURE: Integrated in Tajo-master-CODEGEN-build #224 (See https://builds.apache.org/job/Tajo-master-CODEGEN-build/224/ ) TAJO-1277 : GreedyHeuristicJoinOrderAlgorithm sometimes wrongly assumes associativity of joins. (Keuntae Park via jihoon) (jihoonson: rev 95d9cc455dd7d25da6dd0df709084d7346efb410) tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java tajo-core/src/test/resources/results/TestJoinQuery/testJoinWithMultipleJoinTypes.result tajo-core/src/test/java/org/apache/tajo/engine/query/TestJoinQuery.java tajo-core/src/test/resources/queries/TestJoinQuery/testJoinWithMultipleJoinTypes.sql CHANGES

Hyunsik Choi added a comment - 20/Feb/15 18:42

I just pushed the change to branch-0.10.0.

Hyunsik Choi added a comment - 20/Feb/15 18:42 I just pushed the change to branch-0.10.0.

Tajo (Retired)

GreedyHeuristicJoinOrderAlgorithm sometimes wrongly assumes associativity of joins

Details

Description

Attachments

Issue Links

Activity

People

Dates