Spark / SPARK-44109

Remove duplicate preferred locations of each RDD partition


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.5.0
    • Fix Version/s: 3.5.0
    • Component/s: Spark Core
    • Labels: None

Description

    The DAGScheduler computes the preferred locations for each RDD partition and tries to schedule the corresponding task on one of those locations.

    We can deduplicate the preferred locations to save driver memory.

    For example, if reduce 0 needs to fetch the output of map 0 and the output of map 1, and both are on host-A, the preferred locations can be stored as Array("host-A") instead of Array("host-A", "host-A"). A sketch of the idea follows.
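
    A minimal Scala sketch of the deduplication, written as a free-standing helper rather than the actual DAGScheduler change (the preferredLocations name and its Seq[String] input are illustrative assumptions, not Spark's API):

{code:scala}
// Hypothetical helper: collapses the hosts contributed by each map
// output that a reduce partition depends on. Without deduplication,
// a host serving several map outputs appears once per output.
def preferredLocations(mapOutputHosts: Seq[String]): Array[String] =
  // distinct keeps the first occurrence of each host, so
  // Seq("host-A", "host-A") collapses to Array("host-A").
  mapOutputHosts.distinct.toArray

// The example from the description: reduce 0 reads map 0 and map 1,
// both located on host-A.
assert(preferredLocations(Seq("host-A", "host-A")).sameElements(Array("host-A")))
{code}

    Since the scheduler holds one such array per partition, dropping the duplicates shrinks these arrays for every stage whose map outputs cluster on a few hosts.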

Attachments

Activity

People

    Assignee: Wan Kun (wankun)
    Reporter: Wan Kun (wankun)
    Votes: 0
    Watchers: 2

Dates

    Created:
    Updated:
    Resolved: