[SPARK-30586] NPE in LiveRDDDistribution (AppStatusListener)


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.4.4
    • Fix Version/s: None
    • Component/s: Spark Core
    • Environment: a Hadoop cluster consisting of CentOS 7.4 machines

    Description

      We've been noticing a large number of NullPointerExceptions in the driver logs of our long-running Spark jobs:

      20/01/17 23:40:12 ERROR AsyncEventQueue: Listener AppStatusListener threw an exception
      java.lang.NullPointerException
              at org.spark_project.guava.base.Preconditions.checkNotNull(Preconditions.java:191)
              at org.spark_project.guava.collect.MapMakerInternalMap.putIfAbsent(MapMakerInternalMap.java:3507)
              at org.spark_project.guava.collect.Interners$WeakInterner.intern(Interners.java:85)
              at org.apache.spark.status.LiveEntityHelpers$.weakIntern(LiveEntity.scala:603)
              at org.apache.spark.status.LiveRDDDistribution.toApi(LiveEntity.scala:486)
              at org.apache.spark.status.LiveRDD$$anonfun$2.apply(LiveEntity.scala:548)
              at org.apache.spark.status.LiveRDD$$anonfun$2.apply(LiveEntity.scala:548)
              at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
              at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
              at scala.collection.mutable.HashMap$$anon$2$$anonfun$foreach$3.apply(HashMap.scala:139)
              at scala.collection.mutable.HashMap$$anon$2$$anonfun$foreach$3.apply(HashMap.scala:139)
              at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236)
              at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
              at scala.collection.mutable.HashMap$$anon$2.foreach(HashMap.scala:139)
              at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
              at scala.collection.AbstractTraversable.map(Traversable.scala:104)
              at org.apache.spark.status.LiveRDD.doUpdate(LiveEntity.scala:548)
              at org.apache.spark.status.LiveEntity.write(LiveEntity.scala:49)
              at org.apache.spark.status.AppStatusListener.org$apache$spark$status$AppStatusListener$$update(AppStatusListener.scala:991)
              at org.apache.spark.status.AppStatusListener.org$apache$spark$status$AppStatusListener$$maybeUpdate(AppStatusListener.scala:997)
              at org.apache.spark.status.AppStatusListener$$anonfun$onExecutorMetricsUpdate$2.apply(AppStatusListener.scala:764)
              at org.apache.spark.status.AppStatusListener$$anonfun$onExecutorMetricsUpdate$2.apply(AppStatusListener.scala:764)
              at scala.collection.mutable.HashMap$$anon$2$$anonfun$foreach$3.apply(HashMap.scala:139)
              at scala.collection.mutable.HashMap$$anon$2$$anonfun$foreach$3.apply(HashMap.scala:139)
              at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236)
              at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
              at scala.collection.mutable.HashMap$$anon$2.foreach(HashMap.scala:139)
              at org.apache.spark.status.AppStatusListener.org$apache$spark$status$AppStatusListener$$flush(AppStatusListener.scala:788)
              at org.apache.spark.status.AppStatusListener.onExecutorMetricsUpdate(AppStatusListener.scala:764)
              at org.apache.spark.scheduler.SparkListenerBus$class.doPostEvent(SparkListenerBus.scala:59)
              at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
              at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
              at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:91)
              at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$super$postToAll(AsyncEventQueue.scala:92)
              at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply$mcJ$sp(AsyncEventQueue.scala:92)
              at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply(AsyncEventQueue.scala:87)
              at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply(AsyncEventQueue.scala:87)
              at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
              at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:87)
              at org.apache.spark.scheduler.AsyncEventQueue$$anon$1$$anonfun$run$1.apply$mcV$sp(AsyncEventQueue.scala:83)
              at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1302)
              at org.apache.spark.scheduler.AsyncEventQueue$$anon$1.run(AsyncEventQueue.scala:82)
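
      The top frames point at string interning: LiveEntityHelpers.weakIntern hands each string to a (shaded) Guava weak Interner, and Guava's intern() rejects null via Preconditions.checkNotNull. So whatever string LiveRDDDistribution.toApi passes to weakIntern at LiveEntity.scala:486 must be null (possibly an executor's hostPort, though the trace alone doesn't say which field). A minimal sketch against plain, unshaded Guava that reproduces the top three frames:

      import com.google.common.collect.Interners

      object InternNullRepro {
        // Same pattern as LiveEntityHelpers: a weak interner that deduplicates strings.
        private val interner = Interners.newWeakInterner[String]()

        def weakIntern(s: String): String = interner.intern(s)

        def main(args: Array[String]): Unit = {
          println(weakIntern("host-1:7337")) // fine: returns the canonical instance
          weakIntern(null)                   // java.lang.NullPointerException from
                                             // Preconditions.checkNotNull, as in the trace
        }
      }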
      

      Symptoms that prompted us to investigate the logs in the first place include:

      • slower execution of submitted jobs
      • jobs remaining under "Active Jobs" in the Spark UI even though they should have completed days ago
      • these jobs could not be killed from the Spark UI (the page refreshed, but the jobs remained)
      • stages for these jobs could not be examined in the Spark UI because it returned an error instead.
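
      Each failed write leaves the app status store stale for the affected entities, which would plausibly explain the jobs stuck under "Active Jobs" and the erroring stage pages. A null guard before interning would at least let the update complete; a hypothetical sketch (not the actual Spark fix, whose shape we don't know):

      import com.google.common.collect.Interners

      object NullSafeIntern {
        private val interner = Interners.newWeakInterner[String]()

        // Hypothetical guard: pass null through rather than letting
        // Guava's checkNotNull throw inside the listener's update path.
        def weakIntern(s: String): String =
          if (s == null) null else interner.intern(s)
      }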



          People

             Assignee: Unassigned
             Reporter: Jan Van den bosch (bossie)
             Votes: 0
             Watchers: 7

