Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-19609

Broadcast joins should pushdown join constraints as Filter to the larger relation

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Incomplete
    • 2.1.0
    • None
    • SQL

    Description

      For broadcast inner-joins, where the smaller relation is known to be small enough to materialize on a worker, the set of values for all join columns is known and fits in memory. Spark should translate these values into a Filter pushed down to the datasource. The common join condition of equality, i.e. lhs.a == rhs.a, can be written as an a in ... clause. An example of pushing such filters is already present in the form of IsNotNull filters via sameerag's work on SPARK-12957 subtasks.

      This optimization could even work when the smaller relation does not fit entirely in memory. This could be done by partitioning the smaller relation into N pieces, applying this predicate pushdown for each piece, and unioning the results.

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              ndimiduk Nick Dimiduk
              Votes:
              11 Vote for this issue
              Watchers:
              25 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: