Details
- Type: Bug
- Status: Closed
- Priority: Critical
- Resolution: Fixed
- Fix Version/s: 1.3.1, 1.4.0
- Labels: None
Description
The org.apache.spark.util.ClosureCleaner#clean method contains logic to determine whether Spark is running in interpreter mode: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala#L120
While this check is valuable in certain situations, it also causes concurrent submitter threads to block on a native call to java.lang.Class#forName0, since it appears that only one thread at a time can make the call.
This becomes a major issue when multiple threads concurrently submit short-lived jobs. This is one of the patterns in which we use Spark in production, and the number of parallel requests is expected to be quite high, up to a couple of thousand at a time.
A typical stack trace of a blocked thread looks like:
http-bio-8091-exec-14 [BLOCKED] [DAEMON]
java.lang.Class.forName0(String, boolean, ClassLoader, Class) Class.java (native)
java.lang.Class.forName(String) Class.java:260
org.apache.spark.util.ClosureCleaner$.clean(Object, boolean) ClosureCleaner.scala:122
org.apache.spark.SparkContext.clean(Object, boolean) SparkContext.scala:1623
org.apache.spark.rdd.RDD.reduce(Function2) RDD.scala:883
org.apache.spark.rdd.RDD.takeOrdered(int, Ordering) RDD.scala:1240
org.apache.spark.api.java.JavaRDDLike$class.takeOrdered(JavaRDDLike, int, Comparator) JavaRDDLike.scala:586
org.apache.spark.api.java.AbstractJavaRDDLike.takeOrdered(int, Comparator) JavaRDDLike.scala:46
...
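Since the interpreter-mode answer cannot change during the lifetime of a JVM, one way to avoid repeated contention on the native forName0 call is to perform the Class.forName lookup only once and memoize the result. The sketch below is illustrative only (it is not Spark's actual patch) and uses the standard lazy-holder idiom; the class name being probed mirrors the REPL marker class that ClosureCleaner's check looks for.

```java
// Sketch: memoize an expensive Class.forName-based check so that concurrent
// callers read a cached boolean instead of repeatedly entering the native
// forName0 call. Hypothetical illustration, not Spark's actual fix.
public class InterpreterModeCheck {

    // Lazy-holder idiom: Holder is initialized on first access, exactly once,
    // under the JVM's class-initialization lock; subsequent reads are lock-free.
    private static class Holder {
        static final boolean IN_INTERPRETER = detect();

        private static boolean detect() {
            try {
                // Probe for the REPL marker class a single time per JVM.
                Class.forName("org.apache.spark.repl.Main");
                return true;
            } catch (ClassNotFoundException e) {
                return false;
            }
        }
    }

    public static boolean inInterpreter() {
        // No Class.forName here: every call after the first is a plain field read.
        return Holder.IN_INTERPRETER;
    }

    public static void main(String[] args) {
        // Prints false when spark-repl is not on the classpath.
        System.out.println(inInterpreter());
    }
}
```

With this shape, thousands of concurrent submitter threads would contend at most once, during the holder's initialization, rather than on every clean() call.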
Attachments