Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-9850

Adaptive execution in Spark

Attach filesAttach ScreenshotAdd voteVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

    Details

    • Type: Epic
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Spark Core, SQL
    • Labels:
      None

      Description

      Query planning is one of the main factors in high performance, but the current Spark engine requires the execution DAG for a job to be set in advance. Even with cost­-based optimization, it is hard to know the behavior of data and user-defined functions well enough to always get great execution plans. This JIRA proposes to add adaptive query execution, so that the engine can change the plan for each query as it sees what data earlier stages produced.

      We propose adding this to Spark SQL / DataFrames first, using a new API in the Spark engine that lets libraries run DAGs adaptively. In future JIRAs, the functionality could be extended to other libraries or the RDD API, but that is more difficult than adding it in SQL.

      I've attached a design doc by Yin Huai and myself explaining how it would work in more detail.

        Attachments

        Issue Links

        There are no Sub-Tasks for this issue.
        There are no issues in this epic.

          Activity

            People

            • Assignee:
              yhuai Yin Huai
              Reporter:
              matei Matei Alexandru Zaharia

              Dates

              • Created:
                Updated:

                Issue deployment