Spark / SPARK-23237

Add UI / endpoint for thread dumps for executors with active tasks


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.3.0
    • Fix Version/s: None
    • Component/s: Web UI

    Description

      Frequently, when there are a handful of straggler tasks, users want to know what is going on in the executors running those stragglers. Currently, that is a bit of a pain: you have to go to the page for your active stage, find the task, figure out which executor it's on, then go to the executors page and get the thread dump. Or you might go straight to the executors page, find the executor with an active task, and click on that, but that doesn't work if you've got multiple stages running.

      Users could figure this out by extracting the info from the stage REST endpoint, but it's such a common thing to do that we should make it easy.
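
As a rough illustration of that REST-based workaround, the following is a minimal sketch that assumes the task records for one stage attempt have already been fetched and parsed into dicts, each with "status" and "executorId" keys. Those field names are assumed to mirror the task data returned by Spark's REST API; check them against the monitoring docs for your Spark version. Fetching, paging, and authentication are left out; the function only filters running tasks and counts them per executor.

```python
from collections import Counter
from typing import Iterable, Mapping


def executors_with_running_tasks(tasks: Iterable[Mapping]) -> Counter:
    """Count RUNNING tasks per executor for one stage attempt.

    Assumes each record has "status" and "executorId" keys (assumed
    field names, modeled on the REST API's task data).
    """
    return Counter(t["executorId"] for t in tasks if t.get("status") == "RUNNING")


# Hypothetical parsed task records for a single stage attempt.
tasks = [
    {"taskId": 0, "executorId": "1", "status": "SUCCESS"},
    {"taskId": 1, "executorId": "2", "status": "RUNNING"},
    {"taskId": 2, "executorId": "2", "status": "RUNNING"},
    {"taskId": 3, "executorId": "3", "status": "RUNNING"},
]
print(executors_with_running_tasks(tasks))  # Counter({'2': 2, '3': 1})
```

With multiple stages running, this per-stage view is what the executors page alone cannot give you.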

      I realize that figuring out a good way to do this is a little tricky. We don't want to make it easy to end up pulling thread dumps from 1000 executors back to the driver. So we've got to come up with a reasonable heuristic for choosing which executors to poll. And we've also got to find a suitable place to put this.
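
One way to keep the fan-out bounded is a hard cap: rank executors by how long their longest active task has been running and poll only the top few. The sketch below assumes a hypothetical input mapping each executor id to the elapsed milliseconds of its longest-running active task; the function name and the default cap of 3 are illustrative choices, not anything Spark defines.

```python
import heapq
from typing import Dict, List


def choose_executors_to_poll(longest_running_ms: Dict[str, int],
                             max_executors: int = 3) -> List[str]:
    """Return at most `max_executors` executor ids, longest-running first.

    `longest_running_ms` (hypothetical input) maps executor id -> elapsed
    milliseconds of that executor's longest-running active task.
    """
    if max_executors <= 0:
        return []
    # nlargest avoids sorting all executors when only a few are needed.
    top = heapq.nlargest(max_executors, longest_running_ms.items(),
                         key=lambda kv: kv[1])
    return [executor_id for executor_id, _ in top]


# 1000 executors with active tasks, but only the 3 worst stragglers are polled.
running = {str(i): i * 10 for i in range(1000)}
print(choose_executors_to_poll(running))  # ['999', '998', '997']
```

Setting the cap to 1 gives the single-executor behavior suggested next.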

      My suggestion is that the stage page always has a link to the thread dumps for the one executor with the longest running task. And there would be a corresponding endpoint in the rest api with the same info, maybe at /applications/[app-id]/stages/[stage-id]/[stage-attempt-id]/slowestTaskThreadDump.
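
A minimal sketch of the "longest running task" selection and the proposed path follows. It assumes parsed task records with "taskId", "executorId", "status", and "launchTime" keys, and that "launchTime" uses a format like "2018-01-26T18:00:00.000GMT"; both are assumptions modeled on Spark's REST JSON, so verify them for your version. The slowestTaskThreadDump path is only the endpoint proposed in this issue, not an existing Spark API, and the helper names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Iterable, Mapping, Optional, Tuple
from urllib.parse import quote

# Assumed timestamp format of "launchTime", e.g. "2018-01-26T18:00:00.000GMT".
_SPARK_DATE_FORMAT = "%Y-%m-%dT%H:%M:%S.%fGMT"


def parse_launch_time(value: str) -> datetime:
    """Parse an assumed Spark REST timestamp as an aware UTC datetime."""
    return datetime.strptime(value, _SPARK_DATE_FORMAT).replace(tzinfo=timezone.utc)


def slowest_running_task(tasks: Iterable[Mapping],
                         now: datetime) -> Optional[Tuple[int, str, float]]:
    """Return (taskId, executorId, seconds_running) for the RUNNING task
    that was launched earliest, or None if nothing is running."""
    running = [t for t in tasks if t.get("status") == "RUNNING"]
    if not running:
        return None
    oldest = min(running, key=lambda t: parse_launch_time(t["launchTime"]))
    elapsed = (now - parse_launch_time(oldest["launchTime"])).total_seconds()
    return oldest["taskId"], oldest["executorId"], elapsed


def slowest_task_thread_dump_path(app_id: str, stage_id: int, attempt_id: int) -> str:
    """Path of the endpoint proposed in this issue (hypothetical; relative
    to the REST API root)."""
    return (f"/applications/{quote(app_id, safe='')}/stages/"
            f"{stage_id}/{attempt_id}/slowestTaskThreadDump")


tasks = [
    {"taskId": 7, "executorId": "4", "status": "RUNNING",
     "launchTime": "2018-01-26T18:00:00.000GMT"},
    {"taskId": 8, "executorId": "5", "status": "RUNNING",
     "launchTime": "2018-01-26T18:05:30.500GMT"},
    {"taskId": 9, "executorId": "6", "status": "SUCCESS",
     "launchTime": "2018-01-26T17:59:00.000GMT"},
]
now = datetime(2018, 1, 26, 18, 10, tzinfo=timezone.utc)
print(slowest_running_task(tasks, now))  # (7, '4', 600.0)
print(slowest_task_thread_dump_path("app-20180126180000-0001", 3, 0))
```

Only running tasks are considered, so a finished task that started earlier (task 9 here) never causes a thread dump to be pulled from an idle executor.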


    People

      Assignee: Unassigned
      Reporter: Imran Rashid (irashid)
      Votes: 0
      Watchers: 3
