[SPARK-21303] Web UI shows some jobs getting stuck randomly and staying stuck; they cannot be killed either - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Not A Problem
Affects Version/s: 2.1.0, 2.1.1
Fix Version/s: None
Component/s: DStreams
Labels: None
Environment:

Kubernetes 1.4.12 on AWS
OS Ubuntu
Spark 2.1.1
Cassandra 3.9

Description

We are running a streaming application that had been running without issues for a long time. For the last few days, some jobs have been randomly getting stuck in the web UI. This does not stop the application, because the jobs that follow complete successfully. The stuck jobs remain in the web UI with no progress. These are the observations we have made. At the time the first job is shown as stuck in the UI, the driver log contains:

2017-07-04 05:33:20,189 ERROR [dag-scheduler-event-loop] org.apache.spark.scheduler.LiveListenerBus: Dropping SparkListenerEvent because no remaining room in event queue. This likely means one of the SparkListeners is too slow and cannot keep up with the rate at which tasks are being started by the scheduler.

For each of the other randomly stuck jobs, the driver log contains entries like the following at the time the job gets stuck:

2017-07-04 05:33:20,194 WARN [dispatcher-event-loop-0] org.apache.spark.scheduler.LiveListenerBus: Dropped 1 SparkListenerEvents since Thu Jan 01 00:00:00 UTC 1970

2017-07-04 05:49:31,571 WARN [dag-scheduler-event-loop] org.apache.spark.scheduler.LiveListenerBus: Dropped 1 SparkListenerEvents since Tue Jul 04 05:34:20 UTC 2017
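To see how often events are being dropped, the warnings can be tallied from the driver log. The following is a minimal sketch, assuming the log layout matches the lines quoted above; the function name `dropped_events` and the regular expression are illustrative and are not part of Spark.

```python
import re
from datetime import datetime

# Matches the "Dropped N SparkListenerEvents since <date>" warnings quoted above.
# Assumption: every driver log line uses the same "date,millis LEVEL [thread] class: message" layout.
DROPPED = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} WARN .*"
    r"LiveListenerBus: Dropped (?P<count>\d+) SparkListenerEvents since (?P<since>.+)$"
)


def dropped_events(lines):
    """Return (log timestamp, dropped count, 'since' text) for each matching line."""
    found = []
    for line in lines:
        match = DROPPED.match(line.strip())
        if match:
            logged_at = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S")
            found.append((logged_at, int(match.group("count")), match.group("since")))
    return found


# The three driver log lines from this report.
sample = [
    "2017-07-04 05:33:20,189 ERROR [dag-scheduler-event-loop] org.apache.spark.scheduler.LiveListenerBus: Dropping SparkListenerEvent because no remaining room in event queue. This likely means one of the SparkListeners is too slow and cannot keep up with the rate at which tasks are being started by the scheduler.",
    "2017-07-04 05:33:20,194 WARN [dispatcher-event-loop-0] org.apache.spark.scheduler.LiveListenerBus: Dropped 1 SparkListenerEvents since Thu Jan 01 00:00:00 UTC 1970",
    "2017-07-04 05:49:31,571 WARN [dag-scheduler-event-loop] org.apache.spark.scheduler.LiveListenerBus: Dropped 1 SparkListenerEvents since Tue Jul 04 05:34:20 UTC 2017",
]

report = dropped_events(sample)
total_dropped = sum(count for _, count, _ in report)
```

Comparing the timestamps in `report` with the submission times of the stuck jobs in the web UI would show whether each stuck job lines up with a dropped event.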

After jobs start getting stuck, we also see performance drops and scheduling delays within the application. We could not find any other significant errors in the driver logs.
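The ERROR message above describes a bounded event queue whose consumer is too slow. The toy model below illustrates that behavior only; it is not Spark's `LiveListenerBus` implementation, and `BoundedEventQueue`, `simulate`, and all parameter values are hypothetical names and numbers chosen for the sketch.

```python
from collections import deque


class BoundedEventQueue:
    """Toy bounded queue that drops new events once it is full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.events = deque()
        self.dropped = 0

    def post(self, event):
        # Producer side: never blocks; a full queue means the event is lost.
        if len(self.events) >= self.capacity:
            self.dropped += 1
            return False
        self.events.append(event)
        return True

    def drain(self, max_events):
        # Consumer side: a listener handles at most max_events per tick.
        handled = []
        while self.events and len(handled) < max_events:
            handled.append(self.events.popleft())
        return handled


def simulate(capacity, ticks, posted_per_tick, handled_per_tick):
    """Return (handled, dropped, backlog) after running the model for some ticks."""
    queue = BoundedEventQueue(capacity)
    handled = 0
    for tick in range(ticks):
        for i in range(posted_per_tick):
            queue.post((tick, i))
        handled += len(queue.drain(handled_per_tick))
    return handled, queue.dropped, len(queue.events)


# A listener that handles 2 events per tick while 4 are posted per tick.
slow = simulate(capacity=10, ticks=5, posted_per_tick=4, handled_per_tick=2)
# A listener that keeps up with the posting rate.
fast = simulate(capacity=10, ticks=5, posted_per_tick=4, handled_per_tick=4)
```

In this model the slow listener builds up a backlog until the queue is full and later events are lost, while the fast listener loses none. If a lost event happened to be the one marking a job as finished, a UI that relies on those events would plausibly keep showing the job as running, which would be consistent with the symptoms described here.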

Attachments

- Streaming-2017-07-11 at 6.51.14 PM.png (12/Jul/17 03:19, 177 kB, Arun Achuthan)
- Persist Incoming Event Streams - Thread dump for executor 4.html (12/Jul/17 03:19, 44 kB, Arun Achuthan)
- Persist Incoming Event Streams - Thread dump for executor 3.html (12/Jul/17 03:19, 37 kB, Arun Achuthan)
- Executors-2017-07-11 at 6.44.12 PM.png (12/Jul/17 03:19, 222 kB, Arun Achuthan)
- Persist Incoming Event Streams - Thread dump for executor 2.html (12/Jul/17 03:19, 49 kB, Arun Achuthan)
- Persist Incoming Event Streams - Thread dump for executor 1.html (12/Jul/17 03:19, 65 kB, Arun Achuthan)
- Persist Incoming Event Streams - Thread dump for executor 0.html (12/Jul/17 03:19, 55 kB, Arun Achuthan)

Issue Links

is related to

SPARK-18838 High latency of event processing for large jobs (Resolved)

People

Assignee: Unassigned

Reporter: Arun Achuthan

Votes: 0

Watchers: 4

Dates

Created: 04/Jul/17 11:59

Updated: 12/Jul/17 15:37

Resolved: 10/Jul/17 06:50