SPARK-15425 (parent: SPARK-15426 Spark 2.0 SQL API audit)

Disallow cartesian joins by default

      Description

      It is fairly easy for users to shoot themselves in the foot by running cartesian joins. Often they might not even be aware of which join method was chosen. This has happened to me a few times in the last few weeks.

      It would be a good idea to disable cartesian joins by default and require users to enable them explicitly, either through a "crossJoin" method or with "cross join" in SQL. However, this might be too large a scope for 2.0 given the timing. As a small and quick fix, we can add a single config option (spark.sql.join.enableCartesian) that controls this behavior. In the future we can implement the fine-grained control.

      Note that the error message should be friendly and say "Set spark.sql.join.enableCartesian to true to turn on cartesian joins."
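
      To make the proposed behavior concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark's planner code: the `join` function, the `CartesianJoinError` class, and the dict-based `conf` are hypothetical names used only for illustration. The config key and the error text are the ones proposed above.

```python
from itertools import product

# Config key proposed in this issue.
CONF_KEY = "spark.sql.join.enableCartesian"


class CartesianJoinError(Exception):
    """Hypothetical error raised when a cartesian join is attempted while disabled."""


def join(left, right, condition=None, conf=None):
    """Toy join over two lists of rows.

    A join without a condition is a cartesian product and is rejected
    unless the config flag is explicitly set to true.
    """
    conf = conf or {}
    if condition is None:
        # Disabled by default: only an explicit "true" turns it on.
        if str(conf.get(CONF_KEY, "false")).lower() != "true":
            raise CartesianJoinError(
                f"Set {CONF_KEY} to true to turn on cartesian joins."
            )
        return list(product(left, right))
    # With a condition, keep only the matching pairs.
    return [(l, r) for l, r in product(left, right) if condition(l, r)]
```

      In this sketch, `join([1, 2], [1, 2, 3])` raises the friendly error, while passing `conf={CONF_KEY: "true"}` returns all six pairs. A real implementation would perform this check during query planning rather than at execution time.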

            People

            • Assignee:
              Sameer Agarwal (sameerag)
            • Reporter:
              Reynold Xin (rxin)