Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-786

Implement CROSS JOIN

Attach filesAttach ScreenshotVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    Description

      For documentation:

      Due to it's nature cross joins can produce extremely large results, and we don't recommend to use the feature if you don't know that results won't cause out of memory errors. That's why cross joins are disabled by default, to allow explicit cross join syntax you'll have to enable it by setting planner.enable_nljoin_for_scalar_only option to false. There is also another limitation related to usage of aggregation function over cross join relation. When input row count for aggregate function is bigger than value of planner.slice_target option then query can't be planned (because 2 phase aggregation can't be created in such case), as a workaround you should set planner.enable_multiphase_agg to false. This limitation will be active until fix of https://issues.apache.org/jira/browse/DRILL-6839.

      ---------------------------------------------------------------------------------------------------------------------
      git.commit.id.abbrev=5d7e3d3

      0: jdbc:drill:schema=dfs> select student.name, student.age, student.studentnum from student cross join voter where student.age = 20 and voter.age = 20;
      Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while running query.[error_id: "af90e65a-c4d7-4635-a436-bbc1444c8db2"

      Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]
      Original rel:
      AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): rowcount = 22500.0, cumulative cost =

      {inf}, id = 320
      DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = 22500.0, cumulative cost = {2250.0 rows, 2250.0 cpu, 0.0 io, 0.0 network}, id = 316
      DrillProjectRel(subset=[rel#315:Subset#27.LOGICAL.ANY([]).[]], name=[$2], age=[$1], studentnum=[$3]): rowcount = 22500.0, cumulative cost = {22500.0 rows, 12.0 cpu, 0.0 io, 0.0 network}, id = 314
      DrillJoinRel(subset=[rel#313:Subset#26.LOGICAL.ANY([]).[]], condition=[true], joinType=[inner]): rowcount = 22500.0, cumulative cost = {22500.0 rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 312
      DrillFilterRel(subset=[rel#308:Subset#23.LOGICAL.ANY([]).[]], condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 307
      DrillScanRel(subset=[rel#306:Subset#22.LOGICAL.ANY([]).[]], table=[[dfs, student]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 129
      DrillFilterRel(subset=[rel#311:Subset#25.LOGICAL.ANY([]).[]], condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 310
      DrillScanRel(subset=[rel#309:Subset#24.LOGICAL.ANY([]).[]], table=[[dfs, voter]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 2000.0 cpu, 0.0 io, 0.0 network}, id = 140

      Stack trace:
      org.eigenbase.relopt.RelOptPlanner$CannotPlanException: Node [rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]] could not be implemented; planner state:

      Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]
      Original rel:
      AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): rowcount = 22500.0, cumulative cost = {inf}

      , id = 320
      DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = 22500.0, cumulative cost =

      {2250.0 rows, 2250.0 cpu, 0.0 io, 0.0 network}

      , id = 316
      DrillProjectRel(subset=[rel#315:Subset#27.LOGICAL.ANY([]).[]], name=[$2], age=[$1], studentnum=[$3]): rowcount = 22500.0, cumulative cost =

      {22500.0 rows, 12.0 cpu, 0.0 io, 0.0 network}

      , id = 314
      DrillJoinRel(subset=[rel#313:Subset#26.LOGICAL.ANY([]).[]], condition=[true], joinType=[inner]): rowcount = 22500.0, cumulative cost =

      {22500.0 rows, 0.0 cpu, 0.0 io, 0.0 network}

      , id = 312
      DrillFilterRel(subset=[rel#308:Subset#23.LOGICAL.ANY([]).[]], condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost =

      {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 307
      DrillScanRel(subset=[rel#306:Subset#22.LOGICAL.ANY([]).[]], table=[[dfs, student]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}

      , id = 129
      DrillFilterRel(subset=[rel#311:Subset#25.LOGICAL.ANY([]).[]], condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost =

      {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 310
      DrillScanRel(subset=[rel#309:Subset#24.LOGICAL.ANY([]).[]], table=[[dfs, voter]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 2000.0 cpu, 0.0 io, 0.0 network}, id = 140

      Sets:
      Set#22, type: (DrillRecordRow[*, age, name, studentnum])
      rel#306:Subset#22.LOGICAL.ANY([]).[], best=rel#129, importance=0.5904900000000001
      rel#129:DrillScanRel.LOGICAL.ANY([]).[](table=[dfs, student]), rowcount=1000.0, cumulative cost={1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}

      rel#333:AbstractConverter.LOGICAL.ANY([]).[](child=rel#332:Subset#22.PHYSICAL.ANY([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]), rowcount=1000.0, cumulative cost=

      {inf}
      rel#337:AbstractConverter.LOGICAL.ANY([]).[](child=rel#336:Subset#22.PHYSICAL.SINGLETON([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]), rowcount=1000.0, cumulative cost={inf}

      rel#332:Subset#22.PHYSICAL.ANY([]).[], best=rel#335, importance=0.531441
      rel#334:AbstractConverter.PHYSICAL.ANY([]).[](child=rel#306:Subset#22.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=ANY([]),sort=[]), rowcount=1000.0, cumulative cost=

      {inf}
      rel#338:AbstractConverter.PHYSICAL.ANY([]).[](child=rel#336:Subset#22.PHYSICAL.SINGLETON([]).[],convention=PHYSICAL,DrillDistributionTraitDef=ANY([]),sort=[]), rowcount=1000.0, cumulative cost={inf}

      rel#339:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#306:Subset#22.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]), rowcount=1000.0, cumulative cost=

      {inf}
      rel#340:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#332:Subset#22.PHYSICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]), rowcount=1000.0, cumulative cost={inf}

      rel#335:ScanPrel.PHYSICAL.SINGLETON([]).[](groupscan=ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/p1tests/student]], selectionRoot=/drill/testdata/p1tests/student, columns=[SchemaPath [`age`], SchemaPath [`name`], SchemaPath [`studentnum`]]]), rowcount=1000.0, cumulative cost=

      {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}
      rel#336:Subset#22.PHYSICAL.SINGLETON([]).[], best=rel#335, importance=0.4782969000000001
      rel#339:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#306:Subset#22.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]), rowcount=1000.0, cumulative cost={inf}
      rel#340:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#332:Subset#22.PHYSICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]), rowcount=1000.0, cumulative cost={inf}
      rel#335:ScanPrel.PHYSICAL.SINGLETON([]).[](groupscan=ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/p1tests/student]], selectionRoot=/drill/testdata/p1tests/student, columns=[SchemaPath [`age`], SchemaPath [`name`], SchemaPath [`studentnum`]]]), rowcount=1000.0, cumulative cost={1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}

      Set#23, type: (DrillRecordRow[*, age, name, studentnum])
      rel#308:Subset#23.LOGICAL.ANY([]).[], best=rel#307, importance=0.6561
      rel#307:DrillFilterRel.LOGICAL.ANY([]).[](child=rel#306:Subset#22.LOGICAL.ANY([]).[],condition==(CAST($1):INTEGER, 20)), rowcount=150.0, cumulative cost=

      {2000.0 rows, 8000.0 cpu, 0.0 io, 0.0 network}
      rel#343:AbstractConverter.LOGICAL.ANY([]).[](child=rel#342:Subset#23.PHYSICAL.SINGLETON([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]), rowcount=150.0, cumulative cost={inf}
      rel#342:Subset#23.PHYSICAL.SINGLETON([]).[], best=rel#341, importance=0.5904900000000001
      rel#344:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#308:Subset#23.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]), rowcount=150.0, cumulative cost={inf}
      rel#341:FilterPrel.PHYSICAL.SINGLETON([]).[](child=rel#332:Subset#22.PHYSICAL.ANY([]).[],condition==(CAST($1):INTEGER, 20)), rowcount=150.0, cumulative cost={2000.0 rows, 8000.0 cpu, 0.0 io, 0.0 network}

      Set#24, type: (DrillRecordRow[*, age])
      rel#309:Subset#24.LOGICAL.ANY([]).[], best=rel#140, importance=0.5904900000000001
      rel#140:DrillScanRel.LOGICAL.ANY([]).[](table=[dfs, voter]), rowcount=1000.0, cumulative cost=

      {1000.0 rows, 2000.0 cpu, 0.0 io, 0.0 network}
      rel#330:AbstractConverter.LOGICAL.ANY([]).[](child=rel#329:Subset#24.PHYSICAL.ANY([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]), rowcount=1000.0, cumulative cost={inf}
      rel#349:AbstractConverter.LOGICAL.ANY([]).[](child=rel#348:Subset#24.PHYSICAL.SINGLETON([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]), rowcount=1000.0, cumulative cost={inf}
      rel#329:Subset#24.PHYSICAL.ANY([]).[], best=rel#347, importance=0.531441
      rel#331:AbstractConverter.PHYSICAL.ANY([]).[](child=rel#309:Subset#24.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=ANY([]),sort=[]), rowcount=1000.0, cumulative cost={inf}
      rel#350:AbstractConverter.PHYSICAL.ANY([]).[](child=rel#348:Subset#24.PHYSICAL.SINGLETON([]).[],convention=PHYSICAL,DrillDistributionTraitDef=ANY([]),sort=[]), rowcount=1000.0, cumulative cost={inf}
      rel#351:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#309:Subset#24.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]), rowcount=1000.0, cumulative cost={inf}
      rel#352:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#329:Subset#24.PHYSICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]), rowcount=1000.0, cumulative cost={inf}
      rel#347:ScanPrel.PHYSICAL.SINGLETON([]).[](groupscan=ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/p1tests/voter]], selectionRoot=/drill/testdata/p1tests/voter, columns=[SchemaPath [`age`]]]), rowcount=1000.0, cumulative cost={1000.0 rows, 2000.0 cpu, 0.0 io, 0.0 network}

      rel#348:Subset#24.PHYSICAL.SINGLETON([]).[], best=rel#347, importance=0.4782969000000001
      rel#351:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#309:Subset#24.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]), rowcount=1000.0, cumulative cost=

      {inf}
      rel#352:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#329:Subset#24.PHYSICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]), rowcount=1000.0, cumulative cost={inf}

      rel#347:ScanPrel.PHYSICAL.SINGLETON([]).[](groupscan=ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/p1tests/voter]], selectionRoot=/drill/testdata/p1tests/voter, columns=[SchemaPath [`age`]]]), rowcount=1000.0, cumulative cost=

      {1000.0 rows, 2000.0 cpu, 0.0 io, 0.0 network}

      Set#25, type: (DrillRecordRow[*, age])
      rel#311:Subset#25.LOGICAL.ANY([]).[], best=rel#310, importance=0.6561
      rel#310:DrillFilterRel.LOGICAL.ANY([]).[](child=rel#309:Subset#24.LOGICAL.ANY([]).[],condition==(CAST($1):INTEGER, 20)), rowcount=150.0, cumulative cost=

      {2000.0 rows, 6000.0 cpu, 0.0 io, 0.0 network}
      rel#355:AbstractConverter.LOGICAL.ANY([]).[](child=rel#354:Subset#25.PHYSICAL.SINGLETON([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]), rowcount=150.0, cumulative cost={inf}
      rel#354:Subset#25.PHYSICAL.SINGLETON([]).[], best=rel#353, importance=0.5904900000000001
      rel#356:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#311:Subset#25.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]), rowcount=150.0, cumulative cost={inf}
      rel#353:FilterPrel.PHYSICAL.SINGLETON([]).[](child=rel#329:Subset#24.PHYSICAL.ANY([]).[],condition==(CAST($1):INTEGER, 20)), rowcount=150.0, cumulative cost={2000.0 rows, 6000.0 cpu, 0.0 io, 0.0 network}

      Set#26, type: RecordType(ANY *, ANY age, ANY name, ANY studentnum, ANY *0, ANY age0)
      rel#313:Subset#26.LOGICAL.ANY([]).[], best=rel#312, importance=0.7290000000000001
      rel#312:DrillJoinRel.LOGICAL.ANY([]).[](left=rel#308:Subset#23.LOGICAL.ANY([]).[],right=rel#311:Subset#25.LOGICAL.ANY([]).[],condition=true,joinType=inner), rowcount=22500.0, cumulative cost=

      {4001.0 rows, 14001.0 cpu, 0.0 io, 0.0 network}

      rel#327:AbstractConverter.LOGICAL.ANY([]).[](child=rel#326:Subset#26.PHYSICAL.ANY([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]), rowcount=1.7976931348623157E308, cumulative cost=

      {inf}
      rel#326:Subset#26.PHYSICAL.ANY([]).[], best=null, importance=0.6561
      rel#328:AbstractConverter.PHYSICAL.ANY([]).[](child=rel#313:Subset#26.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=ANY([]),sort=[]), rowcount=22500.0, cumulative cost={inf}

      Set#27, type: RecordType(ANY name, ANY age, ANY studentnum)
      rel#315:Subset#27.LOGICAL.ANY([]).[], best=rel#314, importance=0.81
      rel#314:DrillProjectRel.LOGICAL.ANY([]).[](child=rel#313:Subset#26.LOGICAL.ANY([]).[],name=$2,age=$1,studentnum=$3), rowcount=22500.0, cumulative cost=

      {26501.0 rows, 14013.0 cpu, 0.0 io, 0.0 network}

      rel#322:AbstractConverter.LOGICAL.ANY([]).[](child=rel#321:Subset#27.PHYSICAL.SINGLETON([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]), rowcount=1.7976931348623157E308, cumulative cost=

      {inf}
      rel#321:Subset#27.PHYSICAL.SINGLETON([]).[], best=null, importance=0.7290000000000001
      rel#323:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#315:Subset#27.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]), rowcount=22500.0, cumulative cost={inf}

      Set#28, type: RecordType(ANY name, ANY age, ANY studentnum)
      rel#317:Subset#28.LOGICAL.ANY([]).[], best=rel#316, importance=0.9
      rel#316:DrillScreenRel.LOGICAL.ANY([]).[](child=rel#315:Subset#27.LOGICAL.ANY([]).[]), rowcount=22500.0, cumulative cost=

      {28751.0 rows, 16263.0 cpu, 0.0 io, 0.0 network}

      rel#319:AbstractConverter.LOGICAL.ANY([]).[](child=rel#318:Subset#28.PHYSICAL.SINGLETON([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]), rowcount=1.7976931348623157E308, cumulative cost=

      {inf}
      rel#318:Subset#28.PHYSICAL.SINGLETON([]).[], best=null, importance=1.0
      rel#320:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#317:Subset#28.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]), rowcount=22500.0, cumulative cost={inf}

      rel#324:ScreenPrel.PHYSICAL.SINGLETON([]).[](child=rel#321:Subset#27.PHYSICAL.SINGLETON([]).[]), rowcount=1.7976931348623157E308, cumulative cost=

      {inf}

      org.eigenbase.relopt.volcano.RelSubset$CheapestPlanReplacer.visit(RelSubset.java:445) ~[optiq-core-0.7-20140513.013236-5.jar:na]
      org.eigenbase.relopt.volcano.RelSubset.buildCheapestPlan(RelSubset.java:287) ~[optiq-core-0.7-20140513.013236-5.jar:na]
      org.eigenbase.relopt.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:669) ~[optiq-core-0.7-20140513.013236-5.jar:na]
      net.hydromatic.optiq.prepare.PlannerImpl.transform(PlannerImpl.java:271) ~[optiq-core-0.7-20140513.013236-5.jar:na]
      org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToPrel(DefaultSqlHandler.java:119) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
      org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:89) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
      org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:134) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
      org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:338) [drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
      org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:186) [drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
      java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_45]
      java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_45]
      java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]

      Attachments

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            ihuzenko Igor Guzenko
            knguyen Krystal
            Vova Vysotskyi Vova Vysotskyi
            Votes:
            1 Vote for this issue
            Watchers:
            12 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment