Spark / SPARK-6166

Limit number of in flight outbound requests for shuffle fetch

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.4.0
    • Fix Version/s: 2.0.0
    • Component/s: Spark Core
    • Labels: None

    Description

      spark.reducer.maxMbInFlight bounds the amount of in-flight shuffle data in terms of size.
      But this is not always sufficient: as the number of hosts in the cluster increases, it can lead to a very large number of inbound connections to one or more nodes, causing workers to fail under the load.

      I propose we also add spark.reducer.maxReqsInFlight, which puts a bound on the number of outstanding outbound requests.
      This might still cause hotspots in the cluster, but in our tests it has significantly reduced the occurrence of worker failures.
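
      To illustrate the proposal, here is a minimal sketch of a fetch throttle that enforces both bounds at once: a cap on bytes in flight and a cap on requests in flight. It is not Spark's implementation. The names FetchRequest, FetchThrottle, send_ready, and on_complete are hypothetical and exist only for this example, and the sketch assumes max_reqs_in_flight is at least 1.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class FetchRequest:
    """Hypothetical stand-in for one outbound shuffle fetch request."""
    host: str        # remote node serving the shuffle blocks
    size_bytes: int  # total size of the blocks requested


class FetchThrottle:
    """Hypothetical throttle bounding in-flight bytes AND in-flight requests."""

    def __init__(self, max_bytes_in_flight: int, max_reqs_in_flight: int):
        self.max_bytes_in_flight = max_bytes_in_flight
        self.max_reqs_in_flight = max_reqs_in_flight  # assumed >= 1
        self.bytes_in_flight = 0
        self.reqs_in_flight = 0
        self.pending = deque()

    def submit(self, request: FetchRequest) -> None:
        """Queue a request; it is not sent until send_ready() allows it."""
        self.pending.append(request)

    def _can_send(self, request: FetchRequest) -> bool:
        # When nothing is in flight, let one request through even if it is
        # larger than the byte budget, so it cannot stall the queue forever.
        under_bytes = (
            self.bytes_in_flight == 0
            or self.bytes_in_flight + request.size_bytes <= self.max_bytes_in_flight
        )
        under_reqs = self.reqs_in_flight + 1 <= self.max_reqs_in_flight
        return under_bytes and under_reqs

    def send_ready(self) -> list:
        """Dequeue, in FIFO order, every request both limits currently allow."""
        sent = []
        while self.pending and self._can_send(self.pending[0]):
            request = self.pending.popleft()
            self.bytes_in_flight += request.size_bytes
            self.reqs_in_flight += 1
            sent.append(request)
        return sent

    def on_complete(self, request: FetchRequest) -> None:
        """Release the budget held by a finished (or failed) request."""
        self.bytes_in_flight -= request.size_bytes
        self.reqs_in_flight -= 1


# Many small requests: the byte limit alone would let all five go out at once,
# but the request limit holds the fan-out to two.
throttle = FetchThrottle(max_bytes_in_flight=48 * 1024 * 1024, max_reqs_in_flight=2)
for i in range(5):
    throttle.submit(FetchRequest(host=f"host-{i}", size_bytes=1024))

first_wave = throttle.send_ready()
throttle.on_complete(first_wave[0])
second_wave = throttle.send_ready()
```

      With small blocks spread over many hosts, the size limit never triggers, so the request limit is what caps the number of simultaneous connections. As the description notes, this bounds what each reducer sends out but does not by itself prevent many reducers from targeting the same node.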

          People

            Assignee: Sanket Reddy (sanket991)
            Reporter: Mridul Muralidharan (mridulm80)