Spark / SPARK-15426 Spark 2.0 SQL API audit / SPARK-15425

Disallow cartesian joins by default

Details

    Description

      It is fairly easy for users to shoot themselves in the foot by running cartesian joins. Often they are not even aware of which join method was chosen. This has happened to me a few times in the last few weeks.

      It would be a good idea to disable cartesian joins by default and require users to enable them explicitly, via a "crossJoin" method or "cross join" in SQL. However, this might be too large a scope for 2.0 given the timing. As a small, quick fix, we can add a single config option (spark.sql.join.enableCartesian) that controls this behavior. In the future we can implement the fine-grained control.

      Note that the error message should be friendly and say "Set spark.sql.join.enableCartesian to true to turn on cartesian joins."
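
      To make the proposal concrete, here is a minimal sketch of the intended behavior. It is a toy model in plain Python, not Spark's planner code: the `Join` class, `check_join` function, and `CartesianJoinError` type are hypothetical names introduced for illustration, and "cartesian" is simplified to mean "no join condition at all". Only the config key and the error message text come from this ticket.

```python
from dataclasses import dataclass
from typing import Optional

# Config key proposed in this ticket.
CONF_KEY = "spark.sql.join.enableCartesian"


class CartesianJoinError(Exception):
    """Hypothetical error type used only in this sketch."""


@dataclass
class Join:
    """Toy stand-in for a join node in a query plan (assumption, not a Spark class)."""
    left: str
    right: str
    condition: Optional[str] = None  # None models a join with no predicate
    explicit_cross: bool = False     # models a future "crossJoin" / "cross join"


def check_join(join: Join, conf: dict) -> Join:
    """Reject an implicit cartesian join unless the config option enables it."""
    is_cartesian = join.condition is None
    if not is_cartesian or join.explicit_cross:
        # Ordinary joins and explicitly requested cross joins pass through.
        return join
    # The option defaults to disabled when it is not set.
    enabled = str(conf.get(CONF_KEY, "false")).lower() == "true"
    if not enabled:
        raise CartesianJoinError(
            f"Cartesian join between {join.left} and {join.right} is disabled. "
            f"Set {CONF_KEY} to true to turn on cartesian joins."
        )
    return join
```

      With an empty config, `check_join(Join("a", "b"), {})` raises the friendly error, while the same call with `{"spark.sql.join.enableCartesian": "true"}` returns the join unchanged. The `explicit_cross` flag stands in for the fine-grained control that is deferred to later work.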

          People

            sameerag Sameer Agarwal
            rxin Reynold Xin
