[STORM-2853] Deactivated topologies cause high cpu utilization - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.1.0
Fix Version/s: 2.0.0, 1.2.0, 1.1.2, 1.0.6
Component/s: storm-core
Labels:
- pull-request-available

Description

The issue is there is high cpu usage for deactivated apache storm topologies. I can reliably re-create the issue using the steps below but I haven't identified the exact cause or a solution yet.

The environment is a storm cluster on which 1 topology is running (The topology is extremely simple, I used the exclamation example). It is INACTIVE. Initially there is normal CPU usage. However, when I kill all topology JVM processes on all supervisors and let Storm restart them again, I find that some time later (~9 hours) the CPU usage per JVM process rises to nearly 100%. I have tested an ACTIVE topology and this does not happen with it. I have also tested more than one topology and observe the same results when they're in the INACTIVE state.

**Steps to re-create:**

1. Run 1 topology on an Apache Storm cluster
2. Deactivate it
3. Kill *all* topology JVM processes on all supervisors (Storm will restart them)
4. Observe the CPU usage on Supervisors rise to nearly 100% for all *INACTIVE* topology JVM processes.

**Environment**

Apache Storm 1.1.0 running on 3 VMs (1 nimbus and 2 supervisors).

Cluster Summary:

Supervisors: 2
Used Slots: 2
Available Slots: 38
Total Slots: 40
Executors: 50
Tasks: 50

the topology has 2 workers and 50 executors/tasks (threads).

**Investigation so far:**

Apart from being able to reliably re-create the issue, I have identified, for the affected topology JVM process, the threads using the most CPU. There are 102 threads total in the process, 97 blocked, 5 IN_NATIVE. The threads using the most CPU are identical and there are 23 of them (all in BLOCKED state):

Thread 28558: (state = BLOCKED)

sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
java.util.concurrent.locks.LockSupport.parkNanos(long) @bci=11, line=338 (Compiled frame)
com.lmax.disruptor.MultiProducerSequencer.next(int) @bci=82, line=136 (Compiled frame)
com.lmax.disruptor.RingBuffer.next(int) @bci=5, line=260 (Interpreted frame)
org.apache.storm.utils.DisruptorQueue.publishDirect(java.util.ArrayList, boolean) @bci=18, line=517 (Interpreted frame)
org.apache.storm.utils.DisruptorQueue.access$1000(org.apache.storm.utils.DisruptorQueue, java.util.ArrayList, boolean) @bci=3, line=61 (Interpreted frame)
org.apache.storm.utils.DisruptorQueue$ThreadLocalBatcher.flush(boolean) @bci=50, line=280 (Interpreted frame)
org.apache.storm.utils.DisruptorQueue$Flusher.run() @bci=55, line=303 (Interpreted frame)
java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 (Compiled frame)
java.util.concurrent.FutureTask.run() @bci=42, line=266 (Compiled frame)
java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1142 (Compiled frame)
java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=617 (Interpreted frame)
java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)

I identified this thread by using `jstack` to get a thread dump for the process:

jstack -F <pid> > jstack<pid>.txt

and `top` to identify the threads within the process using the most CPU:

top -H -p <pid>

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending

exclamation.zip
14/Dec/17 10:49
5 kB
Stuart

Issue Links

links to

GitHub Pull Request #2542

GitHub Pull Request #2543

Activity

People

Assignee:: Jungtaek Lim

Reporter:: Stuart

Votes:: 1 Vote for this issue

Watchers:: 3 Start watching this issue

Dates

Created:: 13/Dec/17 12:17

Updated:: 12/Feb/18 10:46

Resolved:: 02/Feb/18 04:21

Time Tracking

Estimated:

Not Specified

Remaining:

Logged:

1h 20m