  1. Spark
  2. SPARK-37063 SQL Adaptive Query Execution QA: Phase 2
  3. SPARK-38401

Unify get preferred locations for shuffle in AQE


Details

    • Type: Sub-task
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.3.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      The method `ShuffledRowRDD#getPreferredLocations` has several issues:

      • It does not respect the config `spark.shuffle.reduceLocality.enabled`, so reduce-side locality cannot be disabled for it.
      • It does not respect `REDUCER_PREF_LOCS_FRACTION`, so it can report locations that hold only a small share of the data. If the DAG scheduler places a task on such an executor, the locality preference brings no benefit. Worse, the driver uses more memory to store these useless locations.
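      The intended behavior can be illustrated with a minimal, self-contained sketch. It is not Spark's implementation: the object `PreferredLocationsSketch`, the function `preferredLocations`, and its parameters are hypothetical names, with `localityEnabled` standing in for `spark.shuffle.reduceLocality.enabled` and `fraction` standing in for `REDUCER_PREF_LOCS_FRACTION`.

      ```scala
      // Hypothetical sketch; names and signature are assumptions, not Spark APIs.
      object PreferredLocationsSketch {

        /**
         * @param bytesByHost     shuffle bytes one reduce partition would read, keyed by host
         * @param localityEnabled stand-in for spark.shuffle.reduceLocality.enabled
         * @param fraction        stand-in for REDUCER_PREF_LOCS_FRACTION
         * @return hosts holding at least `fraction` of the partition's input, sorted by name
         */
        def preferredLocations(
            bytesByHost: Map[String, Long],
            localityEnabled: Boolean,
            fraction: Double): Seq[String] = {
          val totalBytes = bytesByHost.values.sum
          if (!localityEnabled || totalBytes <= 0L) {
            // Locality disabled (or nothing to read): report no preference at all.
            Seq.empty
          } else {
            // Keep only hosts whose share of the input reaches the threshold,
            // so the driver does not store locations that would not help.
            bytesByHost.collect {
              case (host, bytes) if bytes.toDouble / totalBytes >= fraction => host
            }.toSeq.sorted
          }
        }
      }
      ```

      With an illustrative input of `Map("host-a" -> 700L, "host-b" -> 250L, "host-c" -> 50L)` and a fraction of 0.2, only `host-a` and `host-b` are returned; with `localityEnabled = false` the result is empty.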

       


          People

            Assignee: Unassigned
            Reporter: XiDuo You (ulysses)
