SPARK-33906: SPARK UI Executors page stuck when ExecutorSummary.peakMemoryMetrics is unset


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.2.0
    • Fix Version/s: 3.1.0
    • Component/s: Web UI
    • Labels: None

    Description

      How to reproduce it?

      On macOS, in standalone mode, open a spark-shell and run:

      $SPARK_HOME/bin/spark-shell --master spark://localhost:7077

      val x = sc.makeRDD(1 to 100000, 5)
      x.count()
      

      Then open the application UI in the browser and click the Executors tab. The page gets stuck while loading.

      Also, the JSON returned by the REST API endpoint http://localhost:4040/api/v1/applications/app-20201224134418-0003/executors is missing "peakMemoryMetrics" for the executors. In the response below, the driver entry has the field but executor "0" does not.

      [ {
        "id" : "driver",
        "hostPort" : "192.168.1.241:50042",
        "isActive" : true,
        "rddBlocks" : 0,
        "memoryUsed" : 0,
        "diskUsed" : 0,
        "totalCores" : 0,
        "maxTasks" : 0,
        "activeTasks" : 0,
        "failedTasks" : 0,
        "completedTasks" : 0,
        "totalTasks" : 0,
        "totalDuration" : 0,
        "totalGCTime" : 0,
        "totalInputBytes" : 0,
        "totalShuffleRead" : 0,
        "totalShuffleWrite" : 0,
        "isBlacklisted" : false,
        "maxMemory" : 455501414,
        "addTime" : "2020-12-24T19:44:18.033GMT",
        "executorLogs" : { },
        "memoryMetrics" : {
          "usedOnHeapStorageMemory" : 0,
          "usedOffHeapStorageMemory" : 0,
          "totalOnHeapStorageMemory" : 455501414,
          "totalOffHeapStorageMemory" : 0
        },
        "blacklistedInStages" : [ ],
        "peakMemoryMetrics" : {
          "JVMHeapMemory" : 135021152,
          "JVMOffHeapMemory" : 149558576,
          "OnHeapExecutionMemory" : 0,
          "OffHeapExecutionMemory" : 0,
          "OnHeapStorageMemory" : 3301,
          "OffHeapStorageMemory" : 0,
          "OnHeapUnifiedMemory" : 3301,
          "OffHeapUnifiedMemory" : 0,
          "DirectPoolMemory" : 67963178,
          "MappedPoolMemory" : 0,
          "ProcessTreeJVMVMemory" : 0,
          "ProcessTreeJVMRSSMemory" : 0,
          "ProcessTreePythonVMemory" : 0,
          "ProcessTreePythonRSSMemory" : 0,
          "ProcessTreeOtherVMemory" : 0,
          "ProcessTreeOtherRSSMemory" : 0,
          "MinorGCCount" : 15,
          "MinorGCTime" : 101,
          "MajorGCCount" : 0,
          "MajorGCTime" : 0
        },
        "attributes" : { },
        "resources" : { },
        "resourceProfileId" : 0,
        "isExcluded" : false,
        "excludedInStages" : [ ]
      }, {
        "id" : "0",
        "hostPort" : "192.168.1.241:50054",
        "isActive" : true,
        "rddBlocks" : 0,
        "memoryUsed" : 0,
        "diskUsed" : 0,
        "totalCores" : 12,
        "maxTasks" : 12,
        "activeTasks" : 0,
        "failedTasks" : 0,
        "completedTasks" : 5,
        "totalTasks" : 5,
        "totalDuration" : 2107,
        "totalGCTime" : 25,
        "totalInputBytes" : 0,
        "totalShuffleRead" : 0,
        "totalShuffleWrite" : 0,
        "isBlacklisted" : false,
        "maxMemory" : 455501414,
        "addTime" : "2020-12-24T19:44:20.335GMT",
        "executorLogs" : {
          "stdout" : "http://192.168.1.241:8081/logPage/?appId=app-20201224134418-0003&executorId=0&logType=stdout",
          "stderr" : "http://192.168.1.241:8081/logPage/?appId=app-20201224134418-0003&executorId=0&logType=stderr"
        },
        "memoryMetrics" : {
          "usedOnHeapStorageMemory" : 0,
          "usedOffHeapStorageMemory" : 0,
          "totalOnHeapStorageMemory" : 455501414,
          "totalOffHeapStorageMemory" : 0
        },
        "blacklistedInStages" : [ ],
        "attributes" : { },
        "resources" : { },
        "resourceProfileId" : 0,
        "isExcluded" : false,
        "excludedInStages" : [ ]
      } ]
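
A quick way to confirm the symptom is to scan the parsed response for entries without the field. The following is a minimal sketch, not Spark code: the helper name `executorsMissingPeakMetrics` is hypothetical, and `sample` is a trimmed-down copy of the response above that keeps only the fields needed for the check.

```javascript
// Hypothetical helper (not part of Spark): given the parsed JSON array from the
// executors REST endpoint, return the ids of entries with no peakMemoryMetrics.
function executorsMissingPeakMetrics(executors) {
  return executors
    .filter((e) => e.peakMemoryMetrics === undefined || e.peakMemoryMetrics === null)
    .map((e) => e.id);
}

// Trimmed-down copy of the response above; all other fields are omitted.
const sample = [
  { id: "driver", peakMemoryMetrics: { JVMHeapMemory: 135021152, JVMOffHeapMemory: 149558576 } },
  { id: "0", memoryMetrics: { usedOnHeapStorageMemory: 0 } },
];

console.log(executorsMissingPeakMetrics(sample)); // [ '0' ]
```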
      

      I debugged it and observed that ExecutorMetricsPoller.getExecutorUpdates returns an empty map, which causes peakExecutorMetrics to be None in https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/status/LiveEntity.scala#L345. The likely reason for the empty map is that the stage completes in less time than the heartbeat interval, so the stage entry in stageTCMP has already been removed by the time reportHeartbeat is called.
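
The suspected timing issue can be illustrated with a toy model. This is only a sketch of the reasoning in the paragraph above, not Spark's implementation: `ToyMetricsPoller`, `peakFromHeartbeat`, and the single `peakHeapBytes` value are hypothetical, and the model assumes that a stage entry is dropped as soon as its last task completes.

```javascript
// Toy model only. Names and behavior are assumptions made for illustration.
class ToyMetricsPoller {
  constructor() {
    // stageId -> { activeTasks, peakHeapBytes }
    this.stageEntries = new Map();
  }

  onTaskStart(stageId) {
    const entry = this.stageEntries.get(stageId) || { activeTasks: 0, peakHeapBytes: 0 };
    entry.activeTasks += 1;
    this.stageEntries.set(stageId, entry);
  }

  // Record the current heap usage against every stage that still has an entry.
  poll(currentHeapBytes) {
    for (const entry of this.stageEntries.values()) {
      entry.peakHeapBytes = Math.max(entry.peakHeapBytes, currentHeapBytes);
    }
  }

  // Assumption: the entry is removed when the last task of the stage finishes.
  onTaskCompletion(stageId) {
    const entry = this.stageEntries.get(stageId);
    if (entry === undefined) return;
    entry.activeTasks -= 1;
    if (entry.activeTasks === 0) this.stageEntries.delete(stageId);
  }

  // What a heartbeat would see: one peak per stage that still has an entry.
  getExecutorUpdates() {
    const updates = new Map();
    for (const [stageId, entry] of this.stageEntries) {
      updates.set(stageId, entry.peakHeapBytes);
    }
    return updates;
  }
}

// An empty update map yields no peak value (undefined stands in for None).
function peakFromHeartbeat(updates) {
  if (updates.size === 0) return undefined;
  return Math.max(...updates.values());
}

// Stage finishes before the heartbeat: the entry is gone, so no peak is reported.
const shortStage = new ToyMetricsPoller();
shortStage.onTaskStart(0);
shortStage.poll(100);
shortStage.onTaskCompletion(0);
const shortPeak = peakFromHeartbeat(shortStage.getExecutorUpdates());

// Stage is still running at the heartbeat: the peak is reported.
const longStage = new ToyMetricsPoller();
longStage.onTaskStart(0);
longStage.poll(100);
const longPeak = peakFromHeartbeat(longStage.getExecutorUpdates());

console.log(shortPeak, longPeak); // undefined 100
```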

      How to fix it?

      Check if the peakMemoryMetrics is undefined in executorspage.js.
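
A guard of this kind might look like the sketch below. It is not the actual code in executorspage.js: `formatPeakMetric` and `formatBytes` are hypothetical helpers, and falling back to "0.0 B" when the field is absent is an assumption about what the page should display.

```javascript
// Hypothetical byte formatter, included only to make the sketch self-contained.
function formatBytes(bytes) {
  const units = ["B", "KiB", "MiB", "GiB", "TiB"];
  let value = bytes;
  let i = 0;
  while (value >= 1024 && i < units.length - 1) {
    value /= 1024;
    i += 1;
  }
  return value.toFixed(1) + " " + units[i];
}

// Hypothetical cell renderer: reads one peak metric from an executor summary
// and falls back to a placeholder when peakMemoryMetrics is absent.
function formatPeakMetric(executor, metricName) {
  const peaks = executor.peakMemoryMetrics;
  if (peaks === undefined || peaks === null || peaks[metricName] === undefined) {
    return "0.0 B"; // assumed placeholder; avoids dereferencing undefined
  }
  return formatBytes(peaks[metricName]);
}

const driver = { id: "driver", peakMemoryMetrics: { JVMHeapMemory: 135021152 } };
const executor0 = { id: "0" }; // no peakMemoryMetrics, as in the report

console.log(formatPeakMetric(driver, "JVMHeapMemory"));
console.log(formatPeakMetric(executor0, "JVMHeapMemory"));
```

Without the guard, reading `executor0.peakMemoryMetrics.JVMHeapMemory` throws a TypeError, which is consistent with the page never finishing its rendering.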

          People

            Assignee: Baohe Zhang
            Reporter: Baohe Zhang
            Votes: 0
            Watchers: 3