  1. Spark
  2. SPARK-18390

Optimized plan tried to use Cartesian join when it is not enabled

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.0.1
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

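      import org.apache.spark.sql.functions.lit
      // df2 carries a constant column "one" (always 1); the join compares it with df3's id.
      val df2 = spark.range(1e9.toInt).withColumn("one", lit(1))
      val df3 = spark.range(1e9.toInt)
      df3.join(df2, df3("id") === df2("one")).count()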
      

      throws

      org.apache.spark.sql.AnalysisException: Cartesian joins could be prohibitively expensive and are disabled by default. To explicitly enable them, please set spark.sql.crossJoin.enabled = true;

      This is probably not the right behavior: the user wrote a join with an explicit condition and never asked for a Cartesian product. The optimizer chose a Cartesian join on its own, even though it knows Cartesian joins are not enabled.
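A minimal sketch of the workaround that the error message itself suggests: set spark.sql.crossJoin.enabled to true before running the same join. The local session, the app name, and the reduced range size of 10 are assumptions made only to keep the example cheap to run; they are not part of the report. One plausible reading of the failure, not confirmed in the report, is that because "one" is a constant the optimizer can turn the join condition into a filter on "id", which leaves the join itself without a condition.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

// Assumption: a local session for illustration. In spark-shell, `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("SPARK-18390-sketch") // hypothetical app name
  .getOrCreate()

// Explicitly allow Cartesian joins, as the AnalysisException message recommends.
spark.conf.set("spark.sql.crossJoin.enabled", "true")

// Same shape as the query in the report, with small ranges (assumption: 10 rows each).
val df2 = spark.range(10).withColumn("one", lit(1))
val df3 = spark.range(10)

// Only the df3 row with id == 1 satisfies id === one, and it pairs with every row of df2.
val matches = df3.join(df2, df3("id") === df2("one")).count()
println(matches)
```

This only sidesteps the error; it does not address the complaint that the planner picks a Cartesian join for a query the user wrote as a conditional join.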

            People

              Assignee: Srinath (vssrinath)
              Reporter: Xiangrui Meng (mengxr)
              Votes: 0
              Watchers: 5
