Uploaded image for project: 'Tajo'
  1. Tajo
  2. TAJO-1368

Exceptions during processing nested union queries

    Details

      Description

      This error is reported from the Apache Tajo Korea user group (https://groups.google.com/forum/#!topic/tajo-user-kr/tipQj6cv-k0).

      You can reproduce the reported errors as follows:

      default> create table test_union_all (cd  int, nm text);
      default> insert into test_union_all select 1,'aaa';
      default> insert into test_union_all select 1,'bbb';
      

      Case 1
      Distributed query planner emits NPE as follows:

      default> select * from 
      (
      	 select * from test_union_all
      	 union all
      	 select * from test_union_all
      )a
      union all
      select * from 
      (
      	select * from test_union_all
      	union all
      	select * from test_union_all
      )a;
      
      2015-03-03 13:59:12,738 ERROR org.apache.tajo.querymaster.QueryMasterTask:
      java.lang.NullPointerException
              at org.apache.tajo.engine.planner.global.DataChannel.<init>(DataChannel.java:57)
              at org.apache.tajo.engine.planner.global.GlobalPlanner$DistributedPlannerVisitor.visitUnion(GlobalPlanner.java:1405)
              at org.apache.tajo.engine.planner.global.GlobalPlanner$DistributedPlannerVisitor.visitUnion(GlobalPlanner.java:1140)
              at org.apache.tajo.plan.visitor.BasicLogicalPlanVisitor.visit(BasicLogicalPlanVisitor.java:96)
              at org.apache.tajo.plan.visitor.BasicLogicalPlanVisitor.visitRoot(BasicLogicalPlanVisitor.java:151)
              at org.apache.tajo.engine.planner.global.GlobalPlanner$DistributedPlannerVisitor.visitRoot(GlobalPlanner.java:1145)
              at org.apache.tajo.engine.planner.global.GlobalPlanner$DistributedPlannerVisitor.visitRoot(GlobalPlanner.java:1140)
              at org.apache.tajo.plan.visitor.BasicLogicalPlanVisitor.visit(BasicLogicalPlanVisitor.java:60)
              at org.apache.tajo.engine.planner.global.GlobalPlanner.build(GlobalPlanner.java:146)
              at org.apache.tajo.querymaster.QueryMasterTask.startQuery(QueryMasterTask.java:357)
              at org.apache.tajo.querymaster.QueryMasterTask.start(QueryMasterTask.java:172)
              at org.apache.tajo.querymaster.QueryMaster$QueryStartEventHandler.handle(QueryMaster.java:410)
              at org.apache.tajo.querymaster.QueryMaster$QueryStartEventHandler.handle(QueryMaster.java:397)
              at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
              at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
              at java.lang.Thread.run(Thread.java:745)
      

      Case 2
      TaskRunner emits NPE as follows:

      default>  select * from (
      		  select cd, nm
      		  from 
      		  (
      				select cd, nm
      				from 
      				(
      				       select cd, nm
      					from test_union_all             
      				)a   
      				union all 
      				select cd, nm
      				from 
      				( 
      				     select cd, nm
      				     from test_union_all        
      				)a
      
      		) a  
       )a;
      
      2015-03-03 14:05:02,960 ERROR org.apache.tajo.worker.Task: java.lang.IllegalArgumentException: URI scheme is not "file"
      java.io.IOException: java.lang.IllegalArgumentException: URI scheme is not "file"
              at org.apache.tajo.storage.RawFile$RawFileAppender.init(RawFile.java:485)
              at org.apache.tajo.engine.planner.physical.StoreTableExec.openNewFile(StoreTableExec.java:110)
              at org.apache.tajo.engine.planner.physical.StoreTableExec.init(StoreTableExec.java:79)
              at org.apache.tajo.worker.Task.run(Task.java:405)
              at org.apache.tajo.worker.TaskRunner$1.run(TaskRunner.java:276)
              at java.lang.Thread.run(Thread.java:745)
      Caused by: java.lang.IllegalArgumentException: URI scheme is not "file"
              at java.io.File.<init>(File.java:421)
              at org.apache.tajo.storage.RawFile$RawFileAppender.init(RawFile.java:480)
              ... 5 more
      2015-03-03 14:05:02,963 ERROR org.apache.tajo.worker.TaskRunner:
      java.lang.NullPointerException
              at org.apache.tajo.storage.RawFile$RawFileAppender.close(RawFile.java:756)
              at org.apache.tajo.engine.planner.physical.StoreTableExec.close(StoreTableExec.java:143)
              at org.apache.tajo.worker.Task.run(Task.java:417)
              at org.apache.tajo.worker.TaskRunner$1.run(TaskRunner.java:276)
              at java.lang.Thread.run(Thread.java:745)
      

        Activity

        Hide
        hudson Hudson added a comment -

        SUCCESS: Integrated in Tajo-master-build #621 (See https://builds.apache.org/job/Tajo-master-build/621/)
        TAJO-1368: Exceptions during processing nested union queries (jihun: rev 725448c5249cd8691ea167f595237ed7bcc22293)

        • tajo-core/src/test/java/org/apache/tajo/engine/query/TestCTASQuery.java
        • tajo-core/src/test/resources/queries/TestCTASQuery/CtasWithMultipleUnions.sql
        • tajo-core/src/main/java/org/apache/tajo/querymaster/Stage.java
        • tajo-core/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java
        • tajo-core/src/test/resources/queries/TestCTASQuery/testCtasWithMultipleUnions.sql
        • CHANGES
        • tajo-core/src/test/java/org/apache/tajo/engine/query/TestUnionQuery.java
        Show
        hudson Hudson added a comment - SUCCESS: Integrated in Tajo-master-build #621 (See https://builds.apache.org/job/Tajo-master-build/621/ ) TAJO-1368 : Exceptions during processing nested union queries (jihun: rev 725448c5249cd8691ea167f595237ed7bcc22293) tajo-core/src/test/java/org/apache/tajo/engine/query/TestCTASQuery.java tajo-core/src/test/resources/queries/TestCTASQuery/CtasWithMultipleUnions.sql tajo-core/src/main/java/org/apache/tajo/querymaster/Stage.java tajo-core/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java tajo-core/src/test/resources/queries/TestCTASQuery/testCtasWithMultipleUnions.sql CHANGES tajo-core/src/test/java/org/apache/tajo/engine/query/TestUnionQuery.java
        Hide
        hudson Hudson added a comment -

        FAILURE: Integrated in Tajo-master-CODEGEN-build #258 (See https://builds.apache.org/job/Tajo-master-CODEGEN-build/258/)
        TAJO-1368: Exceptions during processing nested union queries (jihun: rev 725448c5249cd8691ea167f595237ed7bcc22293)

        • tajo-core/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java
        • tajo-core/src/main/java/org/apache/tajo/querymaster/Stage.java
        • CHANGES
        • tajo-core/src/test/resources/queries/TestCTASQuery/testCtasWithMultipleUnions.sql
        • tajo-core/src/test/java/org/apache/tajo/engine/query/TestUnionQuery.java
        • tajo-core/src/test/java/org/apache/tajo/engine/query/TestCTASQuery.java
        • tajo-core/src/test/resources/queries/TestCTASQuery/CtasWithMultipleUnions.sql
        Show
        hudson Hudson added a comment - FAILURE: Integrated in Tajo-master-CODEGEN-build #258 (See https://builds.apache.org/job/Tajo-master-CODEGEN-build/258/ ) TAJO-1368 : Exceptions during processing nested union queries (jihun: rev 725448c5249cd8691ea167f595237ed7bcc22293) tajo-core/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java tajo-core/src/main/java/org/apache/tajo/querymaster/Stage.java CHANGES tajo-core/src/test/resources/queries/TestCTASQuery/testCtasWithMultipleUnions.sql tajo-core/src/test/java/org/apache/tajo/engine/query/TestUnionQuery.java tajo-core/src/test/java/org/apache/tajo/engine/query/TestCTASQuery.java tajo-core/src/test/resources/queries/TestCTASQuery/CtasWithMultipleUnions.sql
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user asfgit closed the pull request at:

        https://github.com/apache/tajo/pull/402

        Show
        githubbot ASF GitHub Bot added a comment - Github user asfgit closed the pull request at: https://github.com/apache/tajo/pull/402
        Hide
        jihoonson Jihoon Son added a comment -

        +1
        ship it!

        Show
        jihoonson Jihoon Son added a comment - +1 ship it!
        Hide
        tajoqa Tajo QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12705355/TAJO-1368.patch
        against master revision release-0.9.0-rc0-206-g82d44af.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 4 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The applied patch does not increase the total number of javadoc warnings.

        +1 checkstyle. The patch generated 0 code style errors.

        -1 findbugs. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in tajo-core.

        Test results: https://builds.apache.org/job/PreCommit-TAJO-Build/619//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/619//artifact/incubator-tajo/patchprocess/newPatchFindbugsWarningstajo-core.html
        Console output: https://builds.apache.org/job/PreCommit-TAJO-Build/619//console

        This message is automatically generated.

        Show
        tajoqa Tajo QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12705355/TAJO-1368.patch against master revision release-0.9.0-rc0-206-g82d44af. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 4 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The applied patch does not increase the total number of javadoc warnings. +1 checkstyle. The patch generated 0 code style errors. -1 findbugs. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in tajo-core. Test results: https://builds.apache.org/job/PreCommit-TAJO-Build/619//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/619//artifact/incubator-tajo/patchprocess/newPatchFindbugsWarningstajo-core.html Console output: https://builds.apache.org/job/PreCommit-TAJO-Build/619//console This message is automatically generated.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user jihoonson commented on the pull request:

        https://github.com/apache/tajo/pull/402#issuecomment-82678777

        Thanks a lot!
        The patch looks great.
        I'll give +1 after CI test is finished.
        It will be good to put your latest patch on Jira because Travis is too slow.

        Show
        githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on the pull request: https://github.com/apache/tajo/pull/402#issuecomment-82678777 Thanks a lot! The patch looks great. I'll give +1 after CI test is finished. It will be good to put your latest patch on Jira because Travis is too slow.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user ykrips commented on the pull request:

        https://github.com/apache/tajo/pull/402#issuecomment-82218213

        Good finding!
        I also figure out this error.

        Show
        githubbot ASF GitHub Bot added a comment - Github user ykrips commented on the pull request: https://github.com/apache/tajo/pull/402#issuecomment-82218213 Good finding! I also figure out this error.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user jihoonson commented on the pull request:

        https://github.com/apache/tajo/pull/402#issuecomment-82199826

        Hi @ykrips thanks for your nice patch.
        It looks good, but I have another suggestion even though it is not directly related with this issue.

        I think that ```union``` queries are popularly used for ETLs. So, I tested the following query. (It is a CTAS statement of the case 1 of TAJO-1368.)
        ```
        create table test using parquet as
        select * from
        (select c_custkey, c_nationkey from customer where c_nationkey < 0
        union all
        select c_custkey, c_nationkey from customer where c_nationkey > 0
        ) a
        union all
        select * from
        (select c_custkey, c_nationkey from customer where c_nationkey < 0
        union all
        select c_custkey, c_nationkey from customer where c_nationkey > 0
        ) b
        ```
        And, I found the following error.
        ```
        2015-03-17 17:29:42,257 ERROR: org.apache.tajo.querymaster.QueryMasterTask (startQuery(368)) -
        java.lang.NullPointerException
        at org.apache.tajo.engine.planner.global.GlobalPlanner.buildNoPartitionedStorePlan(GlobalPlanner.java:1113)
        at org.apache.tajo.engine.planner.global.GlobalPlanner.buildStorePlan(GlobalPlanner.java:1037)
        at org.apache.tajo.engine.planner.global.GlobalPlanner.access$600(GlobalPlanner.java:68)
        at org.apache.tajo.engine.planner.global.GlobalPlanner$DistributedPlannerVisitor.visitCreateTable(GlobalPlanner.java:1570)
        at org.apache.tajo.engine.planner.global.GlobalPlanner$DistributedPlannerVisitor.visitCreateTable(GlobalPlanner.java:1141)
        at org.apache.tajo.plan.visitor.BasicLogicalPlanVisitor.visit(BasicLogicalPlanVisitor.java:126)
        at org.apache.tajo.plan.visitor.BasicLogicalPlanVisitor.visitRoot(BasicLogicalPlanVisitor.java:151)
        at org.apache.tajo.engine.planner.global.GlobalPlanner$DistributedPlannerVisitor.visitRoot(GlobalPlanner.java:1146)
        at org.apache.tajo.engine.planner.global.GlobalPlanner$DistributedPlannerVisitor.visitRoot(GlobalPlanner.java:1141)
        at org.apache.tajo.plan.visitor.BasicLogicalPlanVisitor.visit(BasicLogicalPlanVisitor.java:60)
        at org.apache.tajo.engine.planner.global.GlobalPlanner.build(GlobalPlanner.java:147)
        at org.apache.tajo.querymaster.QueryMasterTask.startQuery(QueryMasterTask.java:360)
        at org.apache.tajo.querymaster.QueryMasterTask.start(QueryMasterTask.java:175)
        at org.apache.tajo.querymaster.QueryMaster$QueryStartEventHandler.handle(QueryMaster.java:416)
        at org.apache.tajo.querymaster.QueryMaster$QueryStartEventHandler.handle(QueryMaster.java:403)
        at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
        at java.lang.Thread.run(Thread.java:745)
        ```
        This error seems to be caused by misfunctioning of ```GlobalPlanner.hasUnionChild()```.
        I think it would be great if you fix this bug, too.

        Show
        githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on the pull request: https://github.com/apache/tajo/pull/402#issuecomment-82199826 Hi @ykrips thanks for your nice patch. It looks good, but I have another suggestion even though it is not directly related with this issue. I think that ```union``` queries are popularly used for ETLs. So, I tested the following query. (It is a CTAS statement of the case 1 of TAJO-1368 .) ``` create table test using parquet as select * from (select c_custkey, c_nationkey from customer where c_nationkey < 0 union all select c_custkey, c_nationkey from customer where c_nationkey > 0 ) a union all select * from (select c_custkey, c_nationkey from customer where c_nationkey < 0 union all select c_custkey, c_nationkey from customer where c_nationkey > 0 ) b ``` And, I found the following error. ``` 2015-03-17 17:29:42,257 ERROR: org.apache.tajo.querymaster.QueryMasterTask (startQuery(368)) - java.lang.NullPointerException at org.apache.tajo.engine.planner.global.GlobalPlanner.buildNoPartitionedStorePlan(GlobalPlanner.java:1113) at org.apache.tajo.engine.planner.global.GlobalPlanner.buildStorePlan(GlobalPlanner.java:1037) at org.apache.tajo.engine.planner.global.GlobalPlanner.access$600(GlobalPlanner.java:68) at org.apache.tajo.engine.planner.global.GlobalPlanner$DistributedPlannerVisitor.visitCreateTable(GlobalPlanner.java:1570) at org.apache.tajo.engine.planner.global.GlobalPlanner$DistributedPlannerVisitor.visitCreateTable(GlobalPlanner.java:1141) at org.apache.tajo.plan.visitor.BasicLogicalPlanVisitor.visit(BasicLogicalPlanVisitor.java:126) at org.apache.tajo.plan.visitor.BasicLogicalPlanVisitor.visitRoot(BasicLogicalPlanVisitor.java:151) at org.apache.tajo.engine.planner.global.GlobalPlanner$DistributedPlannerVisitor.visitRoot(GlobalPlanner.java:1146) at org.apache.tajo.engine.planner.global.GlobalPlanner$DistributedPlannerVisitor.visitRoot(GlobalPlanner.java:1141) at org.apache.tajo.plan.visitor.BasicLogicalPlanVisitor.visit(BasicLogicalPlanVisitor.java:60) at org.apache.tajo.engine.planner.global.GlobalPlanner.build(GlobalPlanner.java:147) at org.apache.tajo.querymaster.QueryMasterTask.startQuery(QueryMasterTask.java:360) at org.apache.tajo.querymaster.QueryMasterTask.start(QueryMasterTask.java:175) at org.apache.tajo.querymaster.QueryMaster$QueryStartEventHandler.handle(QueryMaster.java:416) at org.apache.tajo.querymaster.QueryMaster$QueryStartEventHandler.handle(QueryMaster.java:403) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745) ``` This error seems to be caused by misfunctioning of ```GlobalPlanner.hasUnionChild()```. I think it would be great if you fix this bug, too.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user jihoonson commented on the pull request:

        https://github.com/apache/tajo/pull/402#issuecomment-81316946

        Sorry for delayed review.
        I'll review tonight.

        Show
        githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on the pull request: https://github.com/apache/tajo/pull/402#issuecomment-81316946 Sorry for delayed review. I'll review tonight.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user ykrips commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/402#discussion_r26313458

        — Diff: tajo-core/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java —
        @@ -1371,32 +1373,57 @@ public LogicalNode visitUnion(GlobalPlanContext context, LogicalPlan plan, Logic
        LogicalPlan.QueryBlock rightQueryBlock = plan.getBlock(node.getRightChild());
        LogicalNode rightChild = visit(context, plan, rightQueryBlock, rightQueryBlock.getRoot(), stack);
        stack.pop();
        +
        + MasterPlan masterPlan = context.getPlan();

        List<ExecutionBlock> unionBlocks = Lists.newArrayList();
        List<ExecutionBlock> queryBlockBlocks = Lists.newArrayList();

        ExecutionBlock leftBlock = context.execBlockMap.remove(leftChild.getPID());
        ExecutionBlock rightBlock = context.execBlockMap.remove(rightChild.getPID());
        +
        + boolean leftUnion = (leftChild.getType() == NodeType.TABLE_SUBQUERY) &&
        + (((TableSubQueryNode)leftChild).getSubQuery().getType() == NodeType.UNION);
        + boolean rightUnion = (rightChild.getType() == NodeType.TABLE_SUBQUERY) &&
        + (((TableSubQueryNode)rightChild).getSubQuery().getType() == NodeType.UNION);
        if (leftChild.getType() == NodeType.UNION)

        { unionBlocks.add(leftBlock); }

        else {

        • queryBlockBlocks.add(leftBlock);
          + if (leftUnion) { + node.setLeftChild( + createScanNodeWithTableSubQuery(masterPlan, + (TableSubQueryNode)leftChild, leftBlock, leftBlock, + rightUnion? null: rightBlock, true)); + }

          else

          { + queryBlockBlocks.add(leftBlock); + }

          }
          if (rightChild.getType() == NodeType.UNION)

          { unionBlocks.add(rightBlock); }

          else {

        • queryBlockBlocks.add(rightBlock);
          + if (rightUnion) { + node.setRightChild( + createScanNodeWithTableSubQuery(masterPlan, + (TableSubQueryNode)rightChild, rightBlock, leftUnion? leftBlock: rightBlock, + leftUnion? null: leftBlock, false)); + }

          else

          { + queryBlockBlocks.add(rightBlock); + }

          }

        ExecutionBlock execBlock;
        if (unionBlocks.size() == 0) {

        • execBlock = context.plan.newExecutionBlock();
          + if (leftUnion || rightUnion) {
          + execBlock = (!leftUnion && rightUnion)? rightBlock: leftBlock;
            • End diff –

        This line is no more necessary. I will upload changes for this patch.

        Show
        githubbot ASF GitHub Bot added a comment - Github user ykrips commented on a diff in the pull request: https://github.com/apache/tajo/pull/402#discussion_r26313458 — Diff: tajo-core/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java — @@ -1371,32 +1373,57 @@ public LogicalNode visitUnion(GlobalPlanContext context, LogicalPlan plan, Logic LogicalPlan.QueryBlock rightQueryBlock = plan.getBlock(node.getRightChild()); LogicalNode rightChild = visit(context, plan, rightQueryBlock, rightQueryBlock.getRoot(), stack); stack.pop(); + + MasterPlan masterPlan = context.getPlan(); List<ExecutionBlock> unionBlocks = Lists.newArrayList(); List<ExecutionBlock> queryBlockBlocks = Lists.newArrayList(); ExecutionBlock leftBlock = context.execBlockMap.remove(leftChild.getPID()); ExecutionBlock rightBlock = context.execBlockMap.remove(rightChild.getPID()); + + boolean leftUnion = (leftChild.getType() == NodeType.TABLE_SUBQUERY) && + (((TableSubQueryNode)leftChild).getSubQuery().getType() == NodeType.UNION); + boolean rightUnion = (rightChild.getType() == NodeType.TABLE_SUBQUERY) && + (((TableSubQueryNode)rightChild).getSubQuery().getType() == NodeType.UNION); if (leftChild.getType() == NodeType.UNION) { unionBlocks.add(leftBlock); } else { queryBlockBlocks.add(leftBlock); + if (leftUnion) { + node.setLeftChild( + createScanNodeWithTableSubQuery(masterPlan, + (TableSubQueryNode)leftChild, leftBlock, leftBlock, + rightUnion? null: rightBlock, true)); + } else { + queryBlockBlocks.add(leftBlock); + } } if (rightChild.getType() == NodeType.UNION) { unionBlocks.add(rightBlock); } else { queryBlockBlocks.add(rightBlock); + if (rightUnion) { + node.setRightChild( + createScanNodeWithTableSubQuery(masterPlan, + (TableSubQueryNode)rightChild, rightBlock, leftUnion? leftBlock: rightBlock, + leftUnion? null: leftBlock, false)); + } else { + queryBlockBlocks.add(rightBlock); + } } ExecutionBlock execBlock; if (unionBlocks.size() == 0) { execBlock = context.plan.newExecutionBlock(); + if (leftUnion || rightUnion) { + execBlock = (!leftUnion && rightUnion)? rightBlock: leftBlock; End diff – This line is no more necessary. I will upload changes for this patch.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user jihoonson commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/402#discussion_r26296284

        — Diff: tajo-core/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java —
        @@ -1371,32 +1373,57 @@ public LogicalNode visitUnion(GlobalPlanContext context, LogicalPlan plan, Logic
        LogicalPlan.QueryBlock rightQueryBlock = plan.getBlock(node.getRightChild());
        LogicalNode rightChild = visit(context, plan, rightQueryBlock, rightQueryBlock.getRoot(), stack);
        stack.pop();
        +
        + MasterPlan masterPlan = context.getPlan();

        List<ExecutionBlock> unionBlocks = Lists.newArrayList();
        List<ExecutionBlock> queryBlockBlocks = Lists.newArrayList();

        ExecutionBlock leftBlock = context.execBlockMap.remove(leftChild.getPID());
        ExecutionBlock rightBlock = context.execBlockMap.remove(rightChild.getPID());
        +
        + boolean leftUnion = (leftChild.getType() == NodeType.TABLE_SUBQUERY) &&
        + (((TableSubQueryNode)leftChild).getSubQuery().getType() == NodeType.UNION);
        + boolean rightUnion = (rightChild.getType() == NodeType.TABLE_SUBQUERY) &&
        + (((TableSubQueryNode)rightChild).getSubQuery().getType() == NodeType.UNION);
        if (leftChild.getType() == NodeType.UNION)

        { unionBlocks.add(leftBlock); }

        else {

        • queryBlockBlocks.add(leftBlock);
          + if (leftUnion) { + node.setLeftChild( + createScanNodeWithTableSubQuery(masterPlan, + (TableSubQueryNode)leftChild, leftBlock, leftBlock, + rightUnion? null: rightBlock, true)); + }

          else

          { + queryBlockBlocks.add(leftBlock); + }

          }
          if (rightChild.getType() == NodeType.UNION)

          { unionBlocks.add(rightBlock); }

          else {

        • queryBlockBlocks.add(rightBlock);
          + if (rightUnion) { + node.setRightChild( + createScanNodeWithTableSubQuery(masterPlan, + (TableSubQueryNode)rightChild, rightBlock, leftUnion? leftBlock: rightBlock, + leftUnion? null: leftBlock, false)); + }

          else

          { + queryBlockBlocks.add(rightBlock); + }

          }

        ExecutionBlock execBlock;
        if (unionBlocks.size() == 0) {

        • execBlock = context.plan.newExecutionBlock();
          + if (leftUnion || rightUnion) {
          + execBlock = (!leftUnion && rightUnion)? rightBlock: leftBlock;
            • End diff –

        Would you add some descriptions for this line?

        Show
        githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/402#discussion_r26296284 — Diff: tajo-core/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java — @@ -1371,32 +1373,57 @@ public LogicalNode visitUnion(GlobalPlanContext context, LogicalPlan plan, Logic LogicalPlan.QueryBlock rightQueryBlock = plan.getBlock(node.getRightChild()); LogicalNode rightChild = visit(context, plan, rightQueryBlock, rightQueryBlock.getRoot(), stack); stack.pop(); + + MasterPlan masterPlan = context.getPlan(); List<ExecutionBlock> unionBlocks = Lists.newArrayList(); List<ExecutionBlock> queryBlockBlocks = Lists.newArrayList(); ExecutionBlock leftBlock = context.execBlockMap.remove(leftChild.getPID()); ExecutionBlock rightBlock = context.execBlockMap.remove(rightChild.getPID()); + + boolean leftUnion = (leftChild.getType() == NodeType.TABLE_SUBQUERY) && + (((TableSubQueryNode)leftChild).getSubQuery().getType() == NodeType.UNION); + boolean rightUnion = (rightChild.getType() == NodeType.TABLE_SUBQUERY) && + (((TableSubQueryNode)rightChild).getSubQuery().getType() == NodeType.UNION); if (leftChild.getType() == NodeType.UNION) { unionBlocks.add(leftBlock); } else { queryBlockBlocks.add(leftBlock); + if (leftUnion) { + node.setLeftChild( + createScanNodeWithTableSubQuery(masterPlan, + (TableSubQueryNode)leftChild, leftBlock, leftBlock, + rightUnion? null: rightBlock, true)); + } else { + queryBlockBlocks.add(leftBlock); + } } if (rightChild.getType() == NodeType.UNION) { unionBlocks.add(rightBlock); } else { queryBlockBlocks.add(rightBlock); + if (rightUnion) { + node.setRightChild( + createScanNodeWithTableSubQuery(masterPlan, + (TableSubQueryNode)rightChild, rightBlock, leftUnion? leftBlock: rightBlock, + leftUnion? null: leftBlock, false)); + } else { + queryBlockBlocks.add(rightBlock); + } } ExecutionBlock execBlock; if (unionBlocks.size() == 0) { execBlock = context.plan.newExecutionBlock(); + if (leftUnion || rightUnion) { + execBlock = (!leftUnion && rightUnion)? rightBlock: leftBlock; End diff – Would you add some descriptions for this line?
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user ykrips commented on the pull request:

        https://github.com/apache/tajo/pull/402#issuecomment-78410182

        @jihoonson , Thank you for review on this patch. I will add some comments on this patch, and will upload patches in a short time.

        Show
        githubbot ASF GitHub Bot added a comment - Github user ykrips commented on the pull request: https://github.com/apache/tajo/pull/402#issuecomment-78410182 @jihoonson , Thank you for review on this patch. I will add some comments on this patch, and will upload patches in a short time.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user ykrips commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/402#discussion_r26272910

        — Diff: tajo-core/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java —
        @@ -1371,32 +1373,57 @@ public LogicalNode visitUnion(GlobalPlanContext context, LogicalPlan plan, Logic
        LogicalPlan.QueryBlock rightQueryBlock = plan.getBlock(node.getRightChild());
        LogicalNode rightChild = visit(context, plan, rightQueryBlock, rightQueryBlock.getRoot(), stack);
        stack.pop();
        +
        + MasterPlan masterPlan = context.getPlan();

        List<ExecutionBlock> unionBlocks = Lists.newArrayList();
        List<ExecutionBlock> queryBlockBlocks = Lists.newArrayList();

        ExecutionBlock leftBlock = context.execBlockMap.remove(leftChild.getPID());
        ExecutionBlock rightBlock = context.execBlockMap.remove(rightChild.getPID());
        +
        + boolean leftUnion = (leftChild.getType() == NodeType.TABLE_SUBQUERY) &&
        + (((TableSubQueryNode)leftChild).getSubQuery().getType() == NodeType.UNION);
        + boolean rightUnion = (rightChild.getType() == NodeType.TABLE_SUBQUERY) &&
        + (((TableSubQueryNode)rightChild).getSubQuery().getType() == NodeType.UNION);
        if (leftChild.getType() == NodeType.UNION)

        { unionBlocks.add(leftBlock); }

        else {

        • queryBlockBlocks.add(leftBlock);
          + if (leftUnion) {
          + node.setLeftChild(
          + createScanNodeWithTableSubQuery(masterPlan,
            • End diff –

        You're right. First of all, I eliminated these child subqueries, but plan threw exceptions that plan does not have sufficient data channels. I feel that it would be better to change the implementation. therefore, I have created these even unnecessary data operations exist.
        I agree with your thought, and I will try again for this optimization work.

        Show
        githubbot ASF GitHub Bot added a comment - Github user ykrips commented on a diff in the pull request: https://github.com/apache/tajo/pull/402#discussion_r26272910 — Diff: tajo-core/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java — @@ -1371,32 +1373,57 @@ public LogicalNode visitUnion(GlobalPlanContext context, LogicalPlan plan, Logic LogicalPlan.QueryBlock rightQueryBlock = plan.getBlock(node.getRightChild()); LogicalNode rightChild = visit(context, plan, rightQueryBlock, rightQueryBlock.getRoot(), stack); stack.pop(); + + MasterPlan masterPlan = context.getPlan(); List<ExecutionBlock> unionBlocks = Lists.newArrayList(); List<ExecutionBlock> queryBlockBlocks = Lists.newArrayList(); ExecutionBlock leftBlock = context.execBlockMap.remove(leftChild.getPID()); ExecutionBlock rightBlock = context.execBlockMap.remove(rightChild.getPID()); + + boolean leftUnion = (leftChild.getType() == NodeType.TABLE_SUBQUERY) && + (((TableSubQueryNode)leftChild).getSubQuery().getType() == NodeType.UNION); + boolean rightUnion = (rightChild.getType() == NodeType.TABLE_SUBQUERY) && + (((TableSubQueryNode)rightChild).getSubQuery().getType() == NodeType.UNION); if (leftChild.getType() == NodeType.UNION) { unionBlocks.add(leftBlock); } else { queryBlockBlocks.add(leftBlock); + if (leftUnion) { + node.setLeftChild( + createScanNodeWithTableSubQuery(masterPlan, End diff – You're right. First of all, I eliminated these child subqueries, but plan threw exceptions that plan does not have sufficient data channels. I feel that it would be better to change the implementation. therefore, I have created these even unnecessary data operations exist. I agree with your thought, and I will try again for this optimization work.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user jihoonson commented on the pull request:

        https://github.com/apache/tajo/pull/402#issuecomment-78277194

        @ykrips thanks for your patch. I left a comment.
        In addition, it is difficult to understand details of some changes.
        Would you mind telling me a brief summarization of the changes?
        Also, it would be great if you add some comments.

        Show
        githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on the pull request: https://github.com/apache/tajo/pull/402#issuecomment-78277194 @ykrips thanks for your patch. I left a comment. In addition, it is difficult to understand details of some changes. Would you mind telling me a brief summarization of the changes? Also, it would be great if you add some comments.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user jihoonson commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/402#discussion_r26217412

        — Diff: tajo-core/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java —
        @@ -1371,32 +1373,57 @@ public LogicalNode visitUnion(GlobalPlanContext context, LogicalPlan plan, Logic
        LogicalPlan.QueryBlock rightQueryBlock = plan.getBlock(node.getRightChild());
        LogicalNode rightChild = visit(context, plan, rightQueryBlock, rightQueryBlock.getRoot(), stack);
        stack.pop();
        +
        + MasterPlan masterPlan = context.getPlan();

        List<ExecutionBlock> unionBlocks = Lists.newArrayList();
        List<ExecutionBlock> queryBlockBlocks = Lists.newArrayList();

        ExecutionBlock leftBlock = context.execBlockMap.remove(leftChild.getPID());
        ExecutionBlock rightBlock = context.execBlockMap.remove(rightChild.getPID());
        +
        + boolean leftUnion = (leftChild.getType() == NodeType.TABLE_SUBQUERY) &&
        + (((TableSubQueryNode)leftChild).getSubQuery().getType() == NodeType.UNION);
        + boolean rightUnion = (rightChild.getType() == NodeType.TABLE_SUBQUERY) &&
        + (((TableSubQueryNode)rightChild).getSubQuery().getType() == NodeType.UNION);
        if (leftChild.getType() == NodeType.UNION)

        { unionBlocks.add(leftBlock); }

        else {

        • queryBlockBlocks.add(leftBlock);
          + if (leftUnion) {
          + node.setLeftChild(
          + createScanNodeWithTableSubQuery(masterPlan,
            • End diff –

        Please check whether I understand your patch well.
        When an union has a child of subquery that has another union child, the subquery and its child union should be removed. Let me suppose an example union ```u1```. It has a child subquery ```s1``` which has another union child, ```u2```. Finally, ```u2``` has a scan child, ```scan1```. In this case, ```u1``` should be directly connected with ```scan```.

        However, according to your patch, the example query forms (```scan1``` - ```u2``` - ```s1```) - (```scan2``` - ```u1```). Parentheses mean query blocks, and ```scan2``` is created at this line. In this case, I think that it is difficult to avoid unnecessary materialization of intermediate data between query blocks.

        If I misunderstand something, please tell me.

        Show
        githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/402#discussion_r26217412 — Diff: tajo-core/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java — @@ -1371,32 +1373,57 @@ public LogicalNode visitUnion(GlobalPlanContext context, LogicalPlan plan, Logic LogicalPlan.QueryBlock rightQueryBlock = plan.getBlock(node.getRightChild()); LogicalNode rightChild = visit(context, plan, rightQueryBlock, rightQueryBlock.getRoot(), stack); stack.pop(); + + MasterPlan masterPlan = context.getPlan(); List<ExecutionBlock> unionBlocks = Lists.newArrayList(); List<ExecutionBlock> queryBlockBlocks = Lists.newArrayList(); ExecutionBlock leftBlock = context.execBlockMap.remove(leftChild.getPID()); ExecutionBlock rightBlock = context.execBlockMap.remove(rightChild.getPID()); + + boolean leftUnion = (leftChild.getType() == NodeType.TABLE_SUBQUERY) && + (((TableSubQueryNode)leftChild).getSubQuery().getType() == NodeType.UNION); + boolean rightUnion = (rightChild.getType() == NodeType.TABLE_SUBQUERY) && + (((TableSubQueryNode)rightChild).getSubQuery().getType() == NodeType.UNION); if (leftChild.getType() == NodeType.UNION) { unionBlocks.add(leftBlock); } else { queryBlockBlocks.add(leftBlock); + if (leftUnion) { + node.setLeftChild( + createScanNodeWithTableSubQuery(masterPlan, End diff – Please check whether I understand your patch well. When an union has a child of subquery that has another union child, the subquery and its child union should be removed. Let me suppose an example union ```u1```. It has a child subquery ```s1``` which has another union child, ```u2```. Finally, ```u2``` has a scan child, ```scan1```. In this case, ```u1``` should be directly connected with ```scan```. However, according to your patch, the example query forms (```scan1``` - ```u2``` - ```s1```) - (```scan2``` - ```u1```). Parentheses mean query blocks, and ```scan2``` is created at this line. In this case, I think that it is difficult to avoid unnecessary materialization of intermediate data between query blocks. If I misunderstand something, please tell me.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user jihoonson commented on the pull request:

        https://github.com/apache/tajo/pull/402#issuecomment-77857664

        I'll review tomorrow.

        Show
        githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on the pull request: https://github.com/apache/tajo/pull/402#issuecomment-77857664 I'll review tomorrow.
        Hide
        githubbot ASF GitHub Bot added a comment -

        GitHub user ykrips opened a pull request:

        https://github.com/apache/tajo/pull/402

        TAJO-1368: Exceptions during processing nested union queries

        Please feel free to leave any comments on this patch.

        You can merge this pull request into a Git repository by running:

        $ git pull https://github.com/ykrips/tajo TAJO-1368

        Alternatively you can review and apply these changes as the patch at:

        https://github.com/apache/tajo/pull/402.patch

        To close this pull request, make a commit to your master/trunk branch
        with (at least) the following in the commit message:

        This closes #402


        commit dd4d1bbcb7048ad9c2b50daddcf01eeb6b4f647b
        Author: Jihun Kang <jihun@apache.org>
        Date: 2015-03-09T07:32:09Z

        Initial work

        commit 20a2e6b7a0bc97c9be3568aef82cc23aa6d3ecb1
        Author: Jihun Kang <jihun@apache.org>
        Date: 2015-03-09T07:34:19Z

        Fixed unused import

        commit 182983e2af9a2d1c1381193432c0b0e880f8eefd
        Author: Jihun Kang <jihun@apache.org>
        Date: 2015-03-09T07:47:12Z

        Fixed invalid code


        Show
        githubbot ASF GitHub Bot added a comment - GitHub user ykrips opened a pull request: https://github.com/apache/tajo/pull/402 TAJO-1368 : Exceptions during processing nested union queries Please feel free to leave any comments on this patch. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ykrips/tajo TAJO-1368 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/tajo/pull/402.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #402 commit dd4d1bbcb7048ad9c2b50daddcf01eeb6b4f647b Author: Jihun Kang <jihun@apache.org> Date: 2015-03-09T07:32:09Z Initial work commit 20a2e6b7a0bc97c9be3568aef82cc23aa6d3ecb1 Author: Jihun Kang <jihun@apache.org> Date: 2015-03-09T07:34:19Z Fixed unused import commit 182983e2af9a2d1c1381193432c0b0e880f8eefd Author: Jihun Kang <jihun@apache.org> Date: 2015-03-09T07:47:12Z Fixed invalid code
        Hide
        ykrips Jihun Kang added a comment -

        Jihoon Son, thanks for clarifying this issue.

        Show
        ykrips Jihun Kang added a comment - Jihoon Son , thanks for clarifying this issue.
        Hide
        jihoonson Jihoon Son added a comment -

        I changed the title because it can make readers confused.
        Thanks Jihun Kang!

        Show
        jihoonson Jihoon Son added a comment - I changed the title because it can make readers confused. Thanks Jihun Kang !
        Hide
        jihoonson Jihoon Son added a comment -

        You are right. In the second case, NPE is caused by the previous IllegalArgumentException.

        On both cases, I suspect that the problem is in Tajo's global planning.
        That is, even though union operators don't have to be shuffled, it seems that the global plan involves some shuffle operators, which incurs a NPE while creating shuffle information, i.e., data channel (case 1), and an IAE due to the invalid file system (case 2).

        Show
        jihoonson Jihoon Son added a comment - You are right. In the second case, NPE is caused by the previous IllegalArgumentException. On both cases, I suspect that the problem is in Tajo's global planning. That is, even though union operators don't have to be shuffled, it seems that the global plan involves some shuffle operators, which incurs a NPE while creating shuffle information, i.e., data channel (case 1), and an IAE due to the invalid file system (case 2).
        Hide
        ykrips Jihun Kang added a comment -

        Before we start this issue, we need to discuss on case 2. Default shuffle file format is Raw file and raw files are only stored on local file system, but worker tries to store raw files on hdfs file system. These things could be resolved.

        Show
        ykrips Jihun Kang added a comment - Before we start this issue, we need to discuss on case 2. Default shuffle file format is Raw file and raw files are only stored on local file system, but worker tries to store raw files on hdfs file system. These things could be resolved.

          People

          • Assignee:
            ykrips Jihun Kang
            Reporter:
            jihoonson Jihoon Son
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development