Spark / SPARK-13169

CROSS JOIN slow or fails on tiny table

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 1.6.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      I am running a cross join with a SELECT DISTINCT on both sides. The table is tiny (32 X 16). I am running the query through the Thrift server. The data is here (https://vincentarelbundock.github.io/Rdatasets/csv/datasets/mtcars.csv). The query never finishes in under 200 s and mostly fails (TTransportException) while all cores are in use (single machine).
      The query is:

      SELECT `gwtlyrpywf`.`gear`, `gwtlyrpywf`.`cyl`, `vs`
      FROM (
        SELECT DISTINCT * FROM (
          SELECT `gear` AS `gear`, `cyl` AS `cyl` FROM `mtcars`
        ) AS `zzz1`
      ) AS `gwtlyrpywf`
      CROSS JOIN (
        SELECT DISTINCT * FROM (
          SELECT `vs` AS `vs` FROM `mtcars`
        ) AS `zzz3`
      ) AS `arytvfispy`
      

      I know it can be simplified, but it comes from a generator, and the generator counts on the optimizer to do the right thing. EXPLAIN shows the following:

      == Physical Plan ==
      Project [gear#21,cyl#22,vs#23]
      +- CartesianProduct
         :- ConvertToSafe
         :  +- TungstenAggregate(key=[gear#21,cyl#22], functions=[], output=[gear#21,cyl#22])
         :     +- TungstenExchange hashpartitioning(gear#21,cyl#22,200), None
         :        +- TungstenAggregate(key=[gear#21,cyl#22], functions=[], output=[gear#21,cyl#22])
         :           +- Project [gear#17 AS gear#21,cyl#9 AS cyl#22]
         :              +- Scan CsvRelation(<function0>,Some(/var/folders/_p/1gx4vy311_x4syn2xq6f2xtc0000gr/T//RtmpeDwNvS/file168c154ef10e),true,,,",null,#,FAILFAST,commons,false,false,false,StructType(StructField(mpg,DoubleType,true), StructField(cyl,DoubleType,true), StructField(disp,DoubleType,true), StructField(hp,DoubleType,true), StructField(drat,DoubleType,true), StructField(wt,DoubleType,true), StructField(qsec,DoubleType,true), StructField(vs,DoubleType,true), StructField(am,DoubleType,true), StructField(gear,DoubleType,true), StructField(carb,DoubleType,true)),true,null)[gear#17,cyl#9]
         +- ConvertToSafe
            +- TungstenAggregate(key=[vs#23], functions=[], output=[vs#23])
               +- TungstenExchange hashpartitioning(vs#23,200), None
                  +- TungstenAggregate(key=[vs#23], functions=[], output=[vs#23])
                     +- Project [vs#15 AS vs#23]
                        +- Scan CsvRelation(<function0>,Some(/var/folders/_p/1gx4vy311_x4syn2xq6f2xtc0000gr/T//RtmpeDwNvS/file168c154ef10e),true,,,",null,#,FAILFAST,commons,false,false,false,StructType(StructField(mpg,DoubleType,true), StructField(cyl,DoubleType,true), StructField(disp,DoubleType,true), StructField(hp,DoubleType,true), StructField(drat,DoubleType,true), StructField(wt,DoubleType,true), StructField(qsec,DoubleType,true), StructField(vs,DoubleType,true), StructField(am,DoubleType,true), StructField(gear,DoubleType,true), StructField(carb,DoubleType,true)),true,null)[vs#15]
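
      One possible reading of this plan, offered as an assumption and not a confirmed diagnosis (the issue was closed as Cannot Reproduce): both DISTINCT sides are shuffled into 200 partitions (the `200` in each `hashpartitioning`), and a Cartesian product of two partitioned inputs pairs every partition on one side with every partition on the other. The sketch below only does that arithmetic. The function name `cartesian_partitions` is hypothetical and is not part of Spark.

```python
# Rough model of the partition pairs in the plan above.
# Assumption: the Cartesian product visits every (left, right) partition pair.
LEFT_PARTITIONS = 200   # from hashpartitioning(gear#21,cyl#22,200)
RIGHT_PARTITIONS = 200  # from hashpartitioning(vs#23,200)


def cartesian_partitions(left: int, right: int) -> int:
    """Return the number of partition pairs for a Cartesian product."""
    return left * right


pairs = cartesian_partitions(LEFT_PARTITIONS, RIGHT_PARTITIONS)
print(pairs)  # 40000
```

      If this reading is right, most of those 40000 pairs would be empty for a 32-row table, and the time would go to scheduling and not to data. Lowering `spark.sql.shuffle.partitions` (default 200) before running the query would shrink the count; with 4 partitions per side it would be 4 x 4 = 16.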
      

      Thanks

          People

            Assignee: Unassigned
            Reporter: Antonio Piccolboni (piccolbo)