Spark / SPARK-1946

Submit stage after executors have been registered

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.1.0
    • Component/s: Spark Core
    • Labels: None

    Description

      Because creating the TaskSetManager and registering executors happen asynchronously, a job can start running before enough executors have registered, which leads to several issues:

      • Early stages' tasks run without their preferred locality.
      • The default parallelism in YARN is based on the number of executors, so it can be computed from too few of them.
      • The number of intermediate shuffle files per node increases (this can bring the node down).
      • The amount of memory consumed on a node for RDD data persisted in memory increases (making the job fail if disk is not specified, as in some of the MLlib algorithms?).
      • And so on.
        (Thanks to mridulm80 for these comments.)

      A simple workaround is to sleep for a few seconds in the application, so that executors have enough time to register.
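      The workaround could look roughly like the sketch below. It is only an illustration: `submit_after_warmup`, `submit_job`, and `warmup_seconds` are hypothetical names, not part of Spark, and a fixed pause gives no guarantee that any executor has actually registered.

```python
import time


def submit_after_warmup(submit_job, warmup_seconds=5.0, sleep=time.sleep):
    """Pause for a fixed time, then run the first job.

    submit_job is a hypothetical zero-argument callable that triggers the
    first Spark action. The sleep function is injectable so the pause can
    be replaced in tests.
    """
    # Fixed pause: gives executors time to register, but checks nothing.
    sleep(warmup_seconds)
    return submit_job()
```

      The weakness is visible in the sketch: the pause is either too short on a slow cluster or wasted time on a fast one.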

      A better way is to make the DAGScheduler submit a stage only after enough executors have registered, controlled by configuration properties:

      # Submit a stage only after the ratio of successfully registered executors reaches this value.
      # Default: 0 in Standalone mode, 0.9 in YARN mode.
      spark.scheduler.minRegisteredRatio = 0.8

      # Regardless of how many executors have registered, submit the stage once this much time (in milliseconds) has passed.
      # Default: 10000.
      spark.scheduler.maxRegisteredWaitingTime = 5000
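
      How the two properties would interact can be sketched as a single readiness check. This is a minimal illustration of the proposal as described here, not Spark's actual implementation; the function name, its parameters, and the treatment of a non-positive expected count are assumptions.

```python
def should_submit_stage(registered, expected, min_registered_ratio,
                        elapsed_ms, max_waiting_ms):
    """Decide whether a stage may be submitted (hypothetical helper).

    registered / expected: executors registered so far vs. requested.
    min_registered_ratio:  value of spark.scheduler.minRegisteredRatio.
    elapsed_ms:            time waited so far, in milliseconds.
    max_waiting_ms:        value of spark.scheduler.maxRegisteredWaitingTime.
    """
    # The timeout wins: stop waiting no matter how many have registered.
    if elapsed_ms >= max_waiting_ms:
        return True
    # Assumption: with no known expected count there is nothing to wait for.
    if expected <= 0:
        return True
    # Otherwise wait until the registered ratio reaches the threshold.
    return registered / expected >= min_registered_ratio
```

      With the values above (ratio 0.8, timeout 5000 ms) and 10 requested executors, the check passes as soon as 8 have registered, or after 5 seconds regardless of the count. A ratio of 0 never waits at all.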

            People

              Assignee: Unassigned
              Reporter: Zhihui (li-zhihui)
              Votes: 0
              Watchers: 2
