Spark / SPARK-2638

Improve concurrency of fetching Map outputs

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Not A Problem
    • Affects Version/s: 1.0.0
    • Fix Version/s: None
    • Component/s: Spark Core
    • Environment: All

    • Target Version/s:

      Description

      This issue was noticed while reading the MapOutputTracker source code. The synchronization is on the enclosing "fetching" collection, which makes ALL fetches wait whenever any fetch is in progress.

      The fix is to synchronize on the shuffleId instead (interned as a string so that every thread in the JVM locks on the same object for a given shuffleId).

      def getServerStatuses(shuffleId: Int, reduceId: Int): Array[(BlockManagerId, Long)] = {
        val statuses = mapStatuses.get(shuffleId).orNull
        if (statuses == null) {
          logInfo("Don't have map outputs for shuffle " + shuffleId + ", fetching them")
          var fetchedStatuses: Array[MapStatus] = null
          fetching.synchronized { // This is existing code
          // shuffleId.toString.intern.synchronized { // New Code
            if (fetching.contains(shuffleId)) {
              // Someone else is fetching it; wait for them to be done
              while (fetching.contains(shuffleId)) {
                try {
                  fetching.wait()
                } catch {
                  case e: InterruptedException =>
                }
              }
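
      The per-shuffle locking idea can also be sketched in isolation. The following is a minimal, self-contained illustration and not Spark code: `PerKeyFetch`, `lockFor`, `getOrFetch`, and the `String` result type are hypothetical names chosen for this example, and it uses a map of dedicated lock objects rather than interned strings. Unlike the excerpt above, it holds the per-key lock for the duration of the fetch instead of using wait(). One detail to keep in mind for any variant: Object.wait() must be called on the object whose monitor the calling thread holds, so if the lock object changes, the wait()/notifyAll() calls would need to target that same per-shuffle object.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical helper for illustration only; not part of MapOutputTracker.
object PerKeyFetch {
  private val locks = new ConcurrentHashMap[Int, AnyRef]()
  private val cache = new ConcurrentHashMap[Int, String]()

  // Returns the same lock object for a given shuffleId on every call.
  def lockFor(shuffleId: Int): AnyRef = {
    val fresh = new Object
    val existing = locks.putIfAbsent(shuffleId, fresh)
    if (existing == null) fresh else existing
  }

  // Runs `fetch` at most once per shuffleId. Callers asking for a
  // different shuffleId take a different lock and are not blocked.
  def getOrFetch(shuffleId: Int)(fetch: Int => String): String = {
    val cached = cache.get(shuffleId)
    if (cached != null) cached
    else lockFor(shuffleId).synchronized {
      // Re-check under the lock: another thread may have finished first.
      val again = cache.get(shuffleId)
      if (again != null) again
      else {
        val value = fetch(shuffleId)
        cache.put(shuffleId, value)
        value
      }
    }
  }
}
```

      A second call such as `PerKeyFetch.getOrFetch(7)(id => "statuses-" + id)` returns the cached value without invoking the fetch function again.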

      This is only a small code change, but the test cases needed to prove (a) correct functionality and (b) an actual performance improvement are not trivial.

      For (b) it is not worthwhile to add a test case to the codebase. Instead I have created a git project that demonstrates the concurrency/performance improvement of the fine-grained approach. The GitHub project is at https://github.com/javadba/scalatesting.git . Simply run "sbt test". Note: it is unclear how or where to include this ancillary testing/verification information, since it will not be part of the git PR. I am open to any suggestions, even simply removing the references to it.
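
      The contents of that project are not reproduced here. As an independent sketch of how lock granularity could be checked without relying on wall-clock timings, the example below uses a latch to detect whether two threads can be inside their synchronized blocks at the same time. `LockGranularityDemo` and `overlap` are hypothetical names introduced for this illustration and are not taken from the linked project or from Spark.

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}

// Hypothetical demo; assumes nothing about the linked test project.
object LockGranularityDemo {
  // Returns true only if both threads were inside their critical
  // sections simultaneously, i.e. the two locks did not exclude each other.
  def overlap(lockA: AnyRef, lockB: AnyRef, timeoutMs: Long): Boolean = {
    val bothInside = new CountDownLatch(2)
    val results = new Array[Boolean](2)

    def worker(idx: Int, lock: AnyRef): Thread = new Thread(new Runnable {
      def run(): Unit = lock.synchronized {
        bothInside.countDown()
        // True only if the other thread also entered before the timeout.
        results(idx) = bothInside.await(timeoutMs, TimeUnit.MILLISECONDS)
      }
    })

    val threads = Seq(worker(0, lockA), worker(1, lockB))
    threads.foreach(_.start())
    threads.foreach(_.join()) // join makes the writes to `results` visible
    results(0) && results(1)
  }
}
```

      Passing the same object for both locks models the coarse-grained case: the first thread times out while the second is still blocked, so `overlap` returns false. Passing two distinct objects models per-shuffle locks, where both threads should enter and `overlap` should return true, given a timeout generous enough for thread startup.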


            People

            • Assignee: Josh Rosen (joshrosen)
            • Reporter: Stephen Boesch (javadba)
            • Votes: 0
            • Watchers: 5
