Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Duplicate
- Affects Version/s: 2.3.1
- None
- None
Description
I noticed that task metrics for completed tasks in a failed stage attempt do not show up in the new history server. I suspect this is because all of these tasks succeeded after the stage had already been marked as failed, i.e., they were completions from a "zombie" task set. The task metrics (e.g., shuffle read size and shuffle write size) do not appear anywhere: not in the task table, not in the executor table, and not in the overall stage summary metrics. They might be missing from the job summary page as well, but I can't tell from my event logs: there is another, successful attempt of this stage afterward, and that is the only attempt that shows up on the jobs page. If you fetch the task details from the REST API endpoint (e.g., http://[host]:[port]/api/v1/applications/[app-id]/stages/[stage-id]/[stage-attempt]), you can see the successful tasks and all of their metrics.
Unfortunately, my event logs are huge and I don't have a small repro handy, but I hope this description is enough to go on.
When I load the same event logs in the Spark 2.2 history server (SHS), they appear fine.
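To compare what the REST API reports with what the UI tables show, the metrics for one stage attempt can be totaled directly from the endpoint's JSON. This is a minimal sketch that assumes the JSON shape of Spark's v1 REST API: a `tasks` map whose entries carry a `status` string and an optional `taskMetrics` object containing `shuffleReadMetrics` and `shuffleWriteMetrics`. The helper names `fetch_stage_attempt` and `summarize_shuffle` are hypothetical.

```python
import json
import urllib.request


def fetch_stage_attempt(base_url, app_id, stage_id, attempt_id):
    """Fetch one stage attempt, including per-task data, from the v1 REST API."""
    url = f"{base_url}/api/v1/applications/{app_id}/stages/{stage_id}/{attempt_id}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def summarize_shuffle(stage):
    """Total shuffle read/write bytes over tasks whose status is SUCCESS.

    Assumes (hypothetically, per the v1 API shape) that stage["tasks"] maps
    task ids to objects with "status" and an optional "taskMetrics".
    """
    totals = {"tasks": 0, "shuffle_read_bytes": 0, "shuffle_write_bytes": 0}
    for task in stage.get("tasks", {}).values():
        # Only count tasks that finished successfully, including ones from a
        # "zombie" task set that completed after the attempt was marked failed.
        if task.get("status") != "SUCCESS":
            continue
        metrics = task.get("taskMetrics") or {}
        read = metrics.get("shuffleReadMetrics") or {}
        write = metrics.get("shuffleWriteMetrics") or {}
        totals["tasks"] += 1
        # Total shuffle read is remote bytes plus local bytes.
        totals["shuffle_read_bytes"] += read.get("remoteBytesRead", 0) + read.get("localBytesRead", 0)
        totals["shuffle_write_bytes"] += write.get("bytesWritten", 0)
    return totals
```

For example, `summarize_shuffle(fetch_stage_attempt("http://[host]:[port]", "[app-id]", 3, 0))` (with the placeholders filled in) gives totals that can be checked against the stage summary metrics in the history server; nonzero totals for a failed attempt whose UI tables are empty would match the behavior described above.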
Issue Links
- duplicates
  - SPARK-24415 Stage page aggregated executor metrics wrong when failures (Resolved)