Spark / SPARK-14091

Improve performance of SparkContext.getCallSite()

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0
    • Component/s: Spark Core
    • Labels: None

    Description

      Currently SparkContext.getCallSite() makes a call to Utils.getCallSite().

        private[spark] def getCallSite(): CallSite = {
          val callSite = Utils.getCallSite()
          CallSite(
            Option(getLocalProperty(CallSite.SHORT_FORM)).getOrElse(callSite.shortForm),
            Option(getLocalProperty(CallSite.LONG_FORM)).getOrElse(callSite.longForm)
          )
        }
      

      However, some code paths invoke Utils.withDummyCallSite(sc) specifically to avoid the expensive thread dumps inside getCallSite(). Because Utils.getCallSite() is evaluated eagerly, before the local properties are consulted, the thread dump is computed anyway. The cost is noticeable when many RDDs are created (e.g. close to 3-7 seconds when 1000+ RDDs are present), which is significant when the entire query runtime is on the order of 10-20 seconds.

      Creating this JIRA to consider evaluating Utils.getCallSite() only when it is actually needed.
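
      One possible way to defer the evaluation is a Scala lazy val combined with the by-name default of Option.getOrElse. The sketch below is self-contained and only illustrates the idea: CallSite, expensiveCallSite, the stackWalks counter, and the Map-based property lookup are hypothetical stand-ins rather than Spark's real classes, and the property keys are assumed for illustration.

```scala
// Hypothetical stand-in for Spark's CallSite; not the real class.
final case class CallSite(shortForm: String, longForm: String)

object LazyCallSiteSketch {
  // Counts how often the expensive stack walk runs (illustration only).
  var stackWalks = 0

  // Stand-in for Utils.getCallSite(): captures the current stack trace.
  def expensiveCallSite(): CallSite = {
    stackWalks += 1
    val frames = Thread.currentThread.getStackTrace
    CallSite(
      frames.headOption.map(_.toString).getOrElse("<unknown>"),
      frames.mkString("\n")
    )
  }

  // localProps stands in for the thread-local properties read by getLocalProperty.
  def getCallSite(localProps: Map[String, String]): CallSite = {
    // lazy val: the stack walk runs at most once, and only if a default is needed.
    lazy val callSite = expensiveCallSite()
    CallSite(
      // getOrElse takes its default by name, so callSite is untouched when the key is set.
      localProps.get("callSite.short").getOrElse(callSite.shortForm),
      localProps.get("callSite.long").getOrElse(callSite.longForm)
    )
  }
}
```

      In this sketch, when both properties are already set (as a dummy call site presumably would arrange), stackWalks stays at 0; when neither is set, the stack is walked exactly once and reused for both fields.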

          People

            Assignee: Rajesh Balamohan (rajesh.balamohan)
            Reporter: Rajesh Balamohan (rajesh.balamohan)