  1. Spark
  2. SPARK-2393

Simple cost estimation and auto selection of broadcast join

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.1.0
    • Component/s: SQL
    • Labels:
      None
    • Target Version/s:

      Description

      Spark SQL should support cost estimation, a common foundation for query optimization. For example, each logical operator should be able to estimate its cardinality from the estimates of its children.

      As a first step, a framework for cost estimation should be added. It might include an interface for the cardinality estimation described above, and/or for other metrics.
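To make the idea of bottom-up estimation concrete, here is a minimal sketch in Python rather than Spark's own code. Every name in it (`Statistics`, `LogicalPlan`, `Relation`, `Filter`, `Join`, and the `selectivity` parameter) is hypothetical and chosen for illustration; the issue does not prescribe an interface. It only shows one way each operator could derive its cardinality from its children's estimates.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Statistics:
    """Hypothetical container for the estimated metrics of a plan node."""
    row_count: int


class LogicalPlan:
    """Base class: each operator derives its estimate from its children."""

    def __init__(self, children: List["LogicalPlan"]):
        self.children = children

    def statistics(self) -> Statistics:
        raise NotImplementedError


class Relation(LogicalPlan):
    """Leaf node whose row count is assumed to be known up front."""

    def __init__(self, name: str, row_count: int):
        super().__init__([])
        self.name = name
        self._row_count = row_count

    def statistics(self) -> Statistics:
        return Statistics(self._row_count)


class Filter(LogicalPlan):
    """Scales the child's estimate by an assumed selectivity in [0, 1]."""

    def __init__(self, child: LogicalPlan, selectivity: float):
        super().__init__([child])
        self.selectivity = selectivity

    def statistics(self) -> Statistics:
        child_rows = self.children[0].statistics().row_count
        return Statistics(int(child_rows * self.selectivity))


class Join(LogicalPlan):
    """Uses the product of both children as a pessimistic upper bound."""

    def __init__(self, left: LogicalPlan, right: LogicalPlan):
        super().__init__([left, right])

    def statistics(self) -> Statistics:
        left_rows = self.children[0].statistics().row_count
        right_rows = self.children[1].statistics().row_count
        return Statistics(left_rows * right_rows)


# Usage: estimates propagate from the leaves up to the root.
plan = Join(Filter(Relation("orders", 1000), 0.5), Relation("regions", 20))
root_estimate = plan.statistics().row_count
```

The product used in `Join` is deliberately crude. A real estimator would refine it using join keys or column statistics, but the recursive shape of the interface would stay the same.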

      As a first proof-of-concept use of this framework, a simple optimization strategy for certain equi-joins should be added: if one side of a qualifying join has a small estimated physical size (below some threshold), the planner should execute the join with a broadcast join physical plan, broadcasting the small side and streaming through the bigger side. A similar concept exists in Hive and is explained here.
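The selection rule described above can be sketched as follows. This is an illustrative Python model, not Spark's planner: the `JoinSide` type, the `choose_join_strategy` function, the strategy names, and the 10 MiB default threshold are all assumptions introduced here, since the issue only says "some threshold".

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical default; the issue does not specify a threshold value.
BROADCAST_THRESHOLD_BYTES = 10 * 1024 * 1024


@dataclass(frozen=True)
class JoinSide:
    """One input of a join, with its estimated physical size in bytes."""
    name: str
    estimated_size_bytes: int


def choose_join_strategy(
    left: JoinSide,
    right: JoinSide,
    is_equi_join: bool,
    threshold: int = BROADCAST_THRESHOLD_BYTES,
) -> Tuple[str, Optional[str]]:
    """Return (strategy, broadcast_side_name).

    broadcast_side_name is None unless a broadcast join is selected.
    """
    if not is_equi_join:
        # The rule only applies to qualifying equi-joins.
        return ("non_equi_join", None)

    # Pick the side with the smaller estimate as the broadcast candidate.
    if left.estimated_size_bytes <= right.estimated_size_bytes:
        smaller = left
    else:
        smaller = right

    if smaller.estimated_size_bytes < threshold:
        # Broadcast the small side; the bigger side is streamed through.
        return ("broadcast_join", smaller.name)

    # Neither side is small enough, so fall back to a non-broadcast plan.
    return ("shuffled_join", None)


# Usage: a 1 MiB dimension table joined with a 5 GiB fact table.
small = JoinSide("dim", 1 * 1024 * 1024)
big = JoinSide("fact", 5 * 1024 * 1024 * 1024)
decision = choose_join_strategy(big, small, is_equi_join=True)
```

Keeping the threshold as a parameter reflects the issue's wording: the cutoff is a tunable setting, and the decision depends only on the estimated sizes that the cost framework supplies.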

              People

              • Assignee:
                ConcreteVitamin Zongheng Yang
                Reporter:
                marmbrus Michael Armbrust
