Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Cannot Reproduce
-
1.6.0
-
None
-
None
Description
I am running a cross join with a distinct select on both sides. Table is tiny (32 X 16). Running query through the thriftserver. Data is here (https://vincentarelbundock.github.io/Rdatasets/csv/datasets/mtcars.csv). Query never terminates before 200s, mostly fails (TTransportException) while all cores are being used (single machine).
Query is
SELECT `gwtlyrpywf`.`gear`,`gwtlyrpywf`.`cyl`,`vs` FROM ( SELECT DISTINCT * FROM ( SELECT `gear` AS `gear`, `cyl` AS `cyl`FROM `mtcars`) AS `zzz1`) AS `gwtlyrpywf` CROSS JOIN ( SELECT DISTINCT * FROM ( SELECT `vs` AS `vs` FROM `mtcars`) AS `zzz3`) AS `arytvfispy`
I know it can be simplified, but it comes from a generator and the generator counts on the optimizer to do the right thing. EXPLAIN shows the following
plan 1 == Physical Plan == 2 Project [gear#21,cyl#22,vs#23] 3 +- CartesianProduct 4 :- ConvertToSafe 5 : +- TungstenAggregate(key=[gear#21,cyl#22], functions=[], output=[gear#21,cyl#22]) 6 : +- TungstenExchange hashpartitioning(gear#21,cyl#22,200), None 7 : +- TungstenAggregate(key=[gear#21,cyl#22], functions=[], output=[gear#21,cyl#22]) 8 : +- Project [gear#17 AS gear#21,cyl#9 AS cyl#22] 9 : +- Scan CsvRelation(<function0>,Some(/var/folders/_p/1gx4vy311_x4syn2xq6f2xtc0000gr/T//RtmpeDwNvS/file168c154ef10e),true,,,",null,#,FAILFAST,commons,false,false,false,StructType(StructField(mpg,DoubleType,true), StructField(cyl,DoubleType,true), StructField(disp,DoubleType,true), StructField(hp,DoubleType,true), StructField(drat,DoubleType,true), StructField(wt,DoubleType,true), StructField(qsec,DoubleType,true), StructField(vs,DoubleType,true), StructField(am,DoubleType,true), StructField(gear,DoubleType,true), StructField(carb,DoubleType,true)),true,null)[gear#17,cyl#9] 10 +- ConvertToSafe 11 +- TungstenAggregate(key=[vs#23], functions=[], output=[vs#23]) 12 +- TungstenExchange hashpartitioning(vs#23,200), None 13 +- TungstenAggregate(key=[vs#23], functions=[], output=[vs#23]) 14 +- Project [vs#15 AS vs#23] 15 +- Scan CsvRelation(<function0>,Some(/var/folders/_p/1gx4vy311_x4syn2xq6f2xtc0000gr/T//RtmpeDwNvS/file168c154ef10e),true,,,",null,#,FAILFAST,commons,false,false,false,StructType(StructField(mpg,DoubleType,true), StructField(cyl,DoubleType,true), StructField(disp,DoubleType,true), StructField(hp,DoubleType,true), StructField(drat,DoubleType,true), StructField(wt,DoubleType,true), StructField(qsec,DoubleType,true), StructField(vs,DoubleType,true), StructField(am,DoubleType,true), StructField(gear,DoubleType,true), StructField(carb,DoubleType,true)),true,null)[vs#15]
Thanks