SPARK-44307

Bloom filter is not added for left outer join if the left side table is smaller than broadcast threshold.

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.4.1
    • Fix Version/s: None
    • Component/s: Optimizer

    Description

      In a left outer join, a shuffle join is used even when the left-side table is small enough to be broadcast. This follows from the semantics of the left outer join: broadcasting the left side would produce wrong results, so only the right side can be broadcast. The bloom filter injection logic does not account for this. When injecting the bloom filter, if the left side is smaller than the broadcast threshold, no bloom filter is added, on the assumption that the left side will be broadcast and a bloom filter is therefore unnecessary. As a result, the bloom filter optimization is missed for left outer joins with a small left side and a huge right-side table.
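
      The decision described above can be illustrated with a small model. This is a simplified sketch, not Spark's optimizer code: the function names, the `BROADCASTABLE_SIDES` table, and the join-aware variant are hypothetical and only show one way the join type could be taken into account. The 10 MB value is the default of `spark.sql.autoBroadcastJoinThreshold`.

```python
MB = 1024 * 1024
BROADCAST_THRESHOLD = 10 * MB  # default of spark.sql.autoBroadcastJoinThreshold

# Hypothetical table: which side may be the broadcast (build) side per join type.
# A left outer join can only broadcast its right side, and vice versa.
BROADCASTABLE_SIDES = {
    "inner": {"left", "right"},
    "left_outer": {"right"},
    "right_outer": {"left"},
}


def size_only_expects_shuffle(left_bytes, right_bytes):
    """Model of the reported behavior: if either side is under the
    threshold, assume a broadcast join and skip the bloom filter."""
    return left_bytes > BROADCAST_THRESHOLD and right_bytes > BROADCAST_THRESHOLD


def join_aware_expects_shuffle(join_type, left_bytes, right_bytes):
    """Hypothetical variant: a small side only counts if the join type
    actually allows broadcasting that side."""
    sizes = {"left": left_bytes, "right": right_bytes}
    can_broadcast = any(
        sizes[side] <= BROADCAST_THRESHOLD
        for side in BROADCASTABLE_SIDES[join_type]
    )
    return not can_broadcast


# Scenario from the report: small left side, huge right side, left outer join.
small_left = 1 * MB
huge_right = 500 * 1024 * MB

# Size-only check: shuffle not expected, so the bloom filter is skipped.
print(size_only_expects_shuffle(small_left, huge_right))
# Join-aware check: shuffle expected, so a bloom filter would be considered.
print(join_aware_expects_shuffle("left_outer", small_left, huge_right))
```

      In this model the size-only check treats the join as a broadcast join and skips the bloom filter, while the join-aware check recognizes that a left outer join cannot broadcast its left side and keeps the join eligible for a bloom filter. For an inner join with the same sizes, both checks agree that the left side can be broadcast.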

          People

            Assignee: Unassigned
            Reporter: mahesh kumar behera (maheshk114)
            Votes: 0
            Watchers: 2
