SPARK-30586: NPE in LiveRDDDistribution (AppStatusListener)


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.4.4
    • Fix Version/s: None
    • Component/s: Spark Core
    • Environment: A Hadoop cluster consisting of CentOS 7.4 machines.

    Description

      We've been noticing a large number of NullPointerExceptions in the driver logs of our long-running Spark jobs:

      20/01/17 23:40:12 ERROR AsyncEventQueue: Listener AppStatusListener threw an exception
      java.lang.NullPointerException
              at org.spark_project.guava.base.Preconditions.checkNotNull(Preconditions.java:191)
              at org.spark_project.guava.collect.MapMakerInternalMap.putIfAbsent(MapMakerInternalMap.java:3507)
              at org.spark_project.guava.collect.Interners$WeakInterner.intern(Interners.java:85)
              at org.apache.spark.status.LiveEntityHelpers$.weakIntern(LiveEntity.scala:603)
              at org.apache.spark.status.LiveRDDDistribution.toApi(LiveEntity.scala:486)
              at org.apache.spark.status.LiveRDD$$anonfun$2.apply(LiveEntity.scala:548)
              at org.apache.spark.status.LiveRDD$$anonfun$2.apply(LiveEntity.scala:548)
              at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
              at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
              at scala.collection.mutable.HashMap$$anon$2$$anonfun$foreach$3.apply(HashMap.scala:139)
              at scala.collection.mutable.HashMap$$anon$2$$anonfun$foreach$3.apply(HashMap.scala:139)
              at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236)
              at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
              at scala.collection.mutable.HashMap$$anon$2.foreach(HashMap.scala:139)
              at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
              at scala.collection.AbstractTraversable.map(Traversable.scala:104)
              at org.apache.spark.status.LiveRDD.doUpdate(LiveEntity.scala:548)
              at org.apache.spark.status.LiveEntity.write(LiveEntity.scala:49)
              at org.apache.spark.status.AppStatusListener.org$apache$spark$status$AppStatusListener$$update(AppStatusListener.scala:991)
              at org.apache.spark.status.AppStatusListener.org$apache$spark$status$AppStatusListener$$maybeUpdate(AppStatusListener.scala:997)
              at org.apache.spark.status.AppStatusListener$$anonfun$onExecutorMetricsUpdate$2.apply(AppStatusListener.scala:764)
              at org.apache.spark.status.AppStatusListener$$anonfun$onExecutorMetricsUpdate$2.apply(AppStatusListener.scala:764)
              at scala.collection.mutable.HashMap$$anon$2$$anonfun$foreach$3.apply(HashMap.scala:139)
              at scala.collection.mutable.HashMap$$anon$2$$anonfun$foreach$3.apply(HashMap.scala:139)
              at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236)
              at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
              at scala.collection.mutable.HashMap$$anon$2.foreach(HashMap.scala:139)
              at org.apache.spark.status.AppStatusListener.org$apache$spark$status$AppStatusListener$$flush(AppStatusListener.scala:788)
              at org.apache.spark.status.AppStatusListener.onExecutorMetricsUpdate(AppStatusListener.scala:764)
              at org.apache.spark.scheduler.SparkListenerBus$class.doPostEvent(SparkListenerBus.scala:59)
              at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
              at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
              at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:91)
              at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$super$postToAll(AsyncEventQueue.scala:92)
              at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply$mcJ$sp(AsyncEventQueue.scala:92)
              at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply(AsyncEventQueue.scala:87)
              at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply(AsyncEventQueue.scala:87)
              at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
              at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:87)
              at org.apache.spark.scheduler.AsyncEventQueue$$anon$1$$anonfun$run$1.apply$mcV$sp(AsyncEventQueue.scala:83)
              at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1302)
              at org.apache.spark.scheduler.AsyncEventQueue$$anon$1.run(AsyncEventQueue.scala:82)
      
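      Reading the trace top-down: LiveEntityHelpers.weakIntern hands a string to a weak Guava Interner (Spark shades Guava as org.spark_project.guava), and Preconditions.checkNotNull rejects it, so a null string is reaching the intern call from LiveRDDDistribution.toApi; judging by the call site, a plausible candidate is an executor host/port string that was never populated. A minimal sketch of that failure mode against stock Guava (InternNpeRepro is our own illustrative name, not Spark code):

        import com.google.common.collect.Interners

        object InternNpeRepro {
          def main(args: Array[String]): Unit = {
            val interner = Interners.newWeakInterner[String]()
            // Interning a real string works and returns a canonical instance.
            println(interner.intern("executor-host:7337"))
            // Interning null trips Preconditions.checkNotNull inside the
            // interner's backing MapMakerInternalMap, as in the trace above.
            interner.intern(null)
          }
        }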

      The symptoms that made us investigate the logs in the first place include:

      • slower execution of submitted jobs
      • jobs remaining under "Active Jobs" in the Spark UI even though they should have completed days earlier
      • these jobs could not be killed from the Spark UI (the page refreshed, but the jobs remained)
      • stages of these jobs could not be examined in the Spark UI, because the UI returned an error instead.
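
      Until the root cause is fixed, a null guard at the intern site would let the toApi update complete instead of aborting mid-write; the sketch below is our own illustrative workaround, not Spark's actual code:

        import com.google.common.collect.{Interner, Interners}

        object SafeIntern {
          private val interner: Interner[String] = Interners.newWeakInterner[String]()

          // Hypothetical null-safe variant of weakIntern: pass nulls through
          // instead of letting Preconditions.checkNotNull throw.
          def weakIntern(s: String): String =
            if (s == null) null else interner.intern(s)
        }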


          People

            Assignee: Unassigned
            Reporter: Jan Van den bosch (bossie)
