Spark / SPARK-9258

Remove all semi join physical operators

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      We have 4 semi join operators, and they are not strictly necessary. We can use an equi-join operator to do the join and simply not output any values from the other side of the join.

      We waste a little space by building a hash map rather than a hash set, but unless we spend a lot of time optimizing a hash set, our Tungsten hash map will be much more efficient than the hash set anyway. This way, semi join automatically benefits from all the work we do in Tungsten.
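
To illustrate the idea, here is a minimal sketch in plain Python, not Spark code. It assumes rows are tuples and that keys are extracted by caller-supplied functions; the function names are hypothetical, and SQL NULL-key semantics are ignored. It shows that a left semi join can be answered from the same kind of hash map an equi-join builds, at the cost of storing build-side rows that are never read.

```python
from collections import defaultdict


def semi_join_with_hash_set(left, right, left_key, right_key):
    """Left semi join using a hash set of build-side keys only."""
    build_keys = {right_key(r) for r in right}
    return [l for l in left if left_key(l) in build_keys]


def semi_join_with_hash_map(left, right, left_key, right_key):
    """Left semi join that reuses an equi-join style hash map.

    The map keeps the build-side rows (the extra space mentioned above),
    but the probe only tests key presence and emits each left row once.
    """
    build_map = defaultdict(list)
    for r in right:
        build_map[right_key(r)].append(r)
    # Membership tests on a defaultdict do not create new entries.
    return [l for l in left if left_key(l) in build_map]


# Hypothetical sample data: (key, payload) tuples.
left_rows = [(1, "a"), (2, "b"), (3, "c"), (2, "d")]
right_rows = [(2, "x"), (2, "y"), (4, "z")]


def key(row):
    return row[0]


via_set = semi_join_with_hash_set(left_rows, right_rows, key, key)
via_map = semi_join_with_hash_map(left_rows, right_rows, key, key)
```

Both variants return the same left rows, and a left row with several matches on the right is still emitted only once, which is what distinguishes a semi join from an inner equi-join.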

          People

            Assignee: Unassigned
            Reporter: Reynold Xin (rxin)
            Votes: 0
            Watchers: 1