Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
Currently, SparkContext.getCallSite() calls Utils.getCallSite() unconditionally:
    private[spark] def getCallSite(): CallSite = {
      val callSite = Utils.getCallSite()
      CallSite(
        Option(getLocalProperty(CallSite.SHORT_FORM)).getOrElse(callSite.shortForm),
        Option(getLocalProperty(CallSite.LONG_FORM)).getOrElse(callSite.longForm)
      )
    }
However, in some places Utils.withDummyCallSite(sc) is used, which sets the short and long call-site local properties, precisely to avoid the expensive thread stack dumps inside getCallSite(). Because Utils.getCallSite() is evaluated before those local properties are checked, the stack dump is computed anyway. This matters when many RDDs are created: for example, close to 3-7 seconds are spent when 1000+ RDDs are present, which is significant when the entire query runtime is on the order of 10-20 seconds.
Creating this JIRA to consider evaluating Utils.getCallSite() only when it is needed.
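A minimal, self-contained sketch of evaluating the call site only when needed, assuming Scala 2.13 or 3 with no Spark dependency. ToyContext, computeCallSite, getCallSiteEager, getCallSiteLazy, the stackWalks counter and the property key strings are hypothetical stand-ins, not Spark's API; computeCallSite only approximates Utils.getCallSite() by capturing the current thread's stack trace. The approach relies on Option.getOrElse taking its default by name, so a lazy val is forced only when a local property is missing. This is one possible approach, not necessarily the committed fix.

```scala
import scala.collection.mutable

// Hypothetical stand-ins for Spark's CallSite and SparkContext, for illustration only.
final case class CallSite(shortForm: String, longForm: String)

object CallSite {
  val SHORT_FORM = "callSite.short" // hypothetical key
  val LONG_FORM = "callSite.long"   // hypothetical key
}

class ToyContext {
  private val localProperties = mutable.Map.empty[String, String]
  // Counts how many times the expensive stack walk actually ran.
  var stackWalks = 0

  def getLocalProperty(key: String): String = localProperties.getOrElse(key, null)

  def setLocalProperty(key: String, value: String): Unit = {
    if (value == null) localProperties.remove(key)
    else localProperties.update(key, value)
    ()
  }

  // Stand-in for Utils.getCallSite(): walks the current thread's stack.
  private def computeCallSite(): CallSite = {
    stackWalks += 1
    val frames = Thread.currentThread.getStackTrace
    val top = frames.drop(1).headOption.map(_.toString).getOrElse("<unknown>")
    CallSite(top, frames.mkString("\n"))
  }

  // Eager (mirrors the code quoted above): the stack walk always runs.
  def getCallSiteEager(): CallSite = {
    val callSite = computeCallSite()
    CallSite(
      Option(getLocalProperty(CallSite.SHORT_FORM)).getOrElse(callSite.shortForm),
      Option(getLocalProperty(CallSite.LONG_FORM)).getOrElse(callSite.longForm))
  }

  // Lazy: the by-name default of getOrElse forces the lazy val
  // only if a property is missing, and at most once per call.
  def getCallSiteLazy(): CallSite = {
    lazy val callSite = computeCallSite()
    CallSite(
      Option(getLocalProperty(CallSite.SHORT_FORM)).getOrElse(callSite.shortForm),
      Option(getLocalProperty(CallSite.LONG_FORM)).getOrElse(callSite.longForm))
  }

  // Analogue of Utils.withDummyCallSite: set both forms to "" while body runs.
  def withDummyCallSite[T](body: => T): T = {
    val oldShort = getLocalProperty(CallSite.SHORT_FORM)
    val oldLong = getLocalProperty(CallSite.LONG_FORM)
    try {
      setLocalProperty(CallSite.SHORT_FORM, "")
      setLocalProperty(CallSite.LONG_FORM, "")
      body
    } finally {
      setLocalProperty(CallSite.SHORT_FORM, oldShort)
      setLocalProperty(CallSite.LONG_FORM, oldLong)
    }
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val sc = new ToyContext
    sc.withDummyCallSite { (1 to 1000).foreach(_ => sc.getCallSiteEager()) }
    println(s"eager walks: ${sc.stackWalks}") // eager walks: 1000
    sc.stackWalks = 0
    sc.withDummyCallSite { (1 to 1000).foreach(_ => sc.getCallSiteLazy()) }
    println(s"lazy walks: ${sc.stackWalks}") // lazy walks: 0
  }
}
```

Under the dummy call site both properties are "", and Option("") is defined, so the lazy version never walks the stack, while outside withDummyCallSite it still walks the stack once per call to fill in the missing forms.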