Spark / SPARK-16475

Broadcast Hint for SQL Queries


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.2.0
    • Component/s: None

      Description

      A broadcast hint lets users manually annotate a query to suggest a join method to the query optimizer. It is very useful when the optimizer cannot choose the optimal join method, either because it is conservative or because it lacks proper statistics.
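
      To make this trade-off concrete, here is a toy Python model of join selection. It is not Spark's planner. It assumes a size threshold equal to the 10 MB default of Spark's `spark.sql.autoBroadcastJoinThreshold`; the `Relation` type and the `choose_join` function are hypothetical names for illustration. A relation with unknown size is treated as too large to broadcast (loosely mirroring the conservative behavior described above), and an explicit hint overrides the statistics.

```python
from dataclasses import dataclass
from typing import Optional, Set

# Assumption: mirrors the 10 MB default of spark.sql.autoBroadcastJoinThreshold.
DEFAULT_BROADCAST_THRESHOLD = 10 * 1024 * 1024  # bytes


@dataclass
class Relation:
    """Hypothetical stand-in for a table in a join."""
    name: str
    size_in_bytes: Optional[int]  # None models missing statistics


def choose_join(left: Relation, right: Relation, hinted: Set[str],
                threshold: int = DEFAULT_BROADCAST_THRESHOLD) -> str:
    """Return a hypothetical join-strategy label for `left JOIN right`.

    The right side is checked first; that ordering is an arbitrary
    choice of this sketch, not a claim about Spark.
    """
    # 1. An explicit user hint wins over whatever the statistics say.
    for side in (right, left):
        if side.name in hinted:
            return f"broadcast({side.name})"
    # 2. Without a hint, broadcast only a side that is *known* to be small.
    for side in (right, left):
        if side.size_in_bytes is not None and side.size_in_bytes <= threshold:
            return f"broadcast({side.name})"
    # 3. When in doubt, fall back to a shuffle-based join.
    return "shuffle_join"


# Example: a large fact table joined with a dimension table lacking statistics.
facts = Relation("facts", 50 * 1024**3)
dim = Relation("dim", None)
```

      Here `choose_join(facts, dim, set())` falls back to `"shuffle_join"` because nothing is known about `dim`, while `choose_join(facts, dim, {"dim"})` returns `"broadcast(dim)"`, which is the situation a hint is meant to fix.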

      The DataFrame API has had a broadcast hint since Spark 1.5, but SQL queries have no equivalent functionality. We propose adding Hive-style broadcast hints to Spark SQL.

      For more information, please see the attached document. One note about the doc: in addition to "MAPJOIN", we should also accept "BROADCASTJOIN" and "BROADCAST" as hint names in the comment. For example, all of the following should be accepted:

      SELECT /*+ MAPJOIN(b) */ ...
      
      SELECT /*+ BROADCASTJOIN(b) */ ...
      
      SELECT /*+ BROADCAST(b) */ ...
      
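      The accepted syntax can be illustrated with a deliberately small Python parser that extracts table names from a `/*+ ... */` comment and treats the three hint names as synonyms. This is a hypothetical sketch (`parse_broadcast_hints` and its regular expressions are assumptions), not Spark's SQL grammar: it is case-insensitive, looks only at the first hint comment that directly follows a `SELECT`, and silently ignores hint names it does not recognize.

```python
import re
from typing import List

# Hint names treated as synonyms for a broadcast join, per the issue text.
BROADCAST_HINT_NAMES = {"MAPJOIN", "BROADCASTJOIN", "BROADCAST"}

# Matches the /*+ ... */ block that immediately follows SELECT.
_HINT_BLOCK = re.compile(r"SELECT\s*/\*\+(.*?)\*/", re.IGNORECASE | re.DOTALL)
# Matches one NAME(arg, arg, ...) entry inside that block.
_HINT_ENTRY = re.compile(r"(\w+)\s*\(([^)]*)\)")


def parse_broadcast_hints(sql: str) -> List[str]:
    """Return the relation names listed in broadcast-style hints (toy parser)."""
    match = _HINT_BLOCK.search(sql)
    if match is None:
        return []
    tables: List[str] = []
    for name, args in _HINT_ENTRY.findall(match.group(1)):
        # Unknown hint names are skipped rather than rejected.
        if name.upper() in BROADCAST_HINT_NAMES:
            tables.extend(a.strip() for a in args.split(",") if a.strip())
    return tables
```

      For example, `parse_broadcast_hints("SELECT /*+ MAPJOIN(b) */ * FROM a JOIN b ON a.id = b.id")` returns `["b"]`, and the `BROADCASTJOIN` and `BROADCAST` spellings yield the same result.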

        Attachments

        1. BroadcastHintinSparkSQL.pdf
          111 kB
          Reynold Xin


              People

              • Assignee: Reynold Xin (rxin)
                Reporter: Reynold Xin (rxin)
              • Votes: 0
                Watchers: 6
