Details
- Type: Sub-task
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
I monitored the Vortex Driver process using the Vortex AddOne example, with DriverMessageHandler#onNext in VortexWorker commented out so the job never finishes. I ran it on my MacBook using the local runtime.
In the attached JConsole screenshot, we start with 437 threads and more than 6% CPU utilization. After about 1 minute (the Java ThreadPool keepAliveTime), the count shrinks to about 30 threads, which should be close to the number of threads the REEF Driver uses by default. Then I killed two running Evaluators with bash `kill`. The thread count spikes to around 300 and drops back to about 30 threads a minute later.
Who is responsible for creating all these extra threads? The extra threads were named pool-1-thread-x (x being a number), so they should not come from Wake stages, since stages always prefix their thread names with the StageName. The only thread pool in Vortex that is not inside a Wake stage is the one in VortexRequestor. This thread pool serializes tasklets and sends them to Evaluators via RunningTask#send.
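The pool-1-thread-x naming and the 1-minute shrink are both consistent with a pool built on the JDK default thread factory with a 60-second keepAliveTime, such as `Executors.newCachedThreadPool()`, which spawns a new thread whenever no idle one is available. A minimal sketch of that unbounded growth under a burst of blocking submits (the assumption that VortexRequestor uses a cached pool is mine, not confirmed from the source):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;

public class CachedPoolDemo {
    // Submits `n` tasks that all block, then reports how many threads
    // the cached pool created to serve them.
    static int burstPoolSize(final int n) {
        final ThreadPoolExecutor pool =
            (ThreadPoolExecutor) Executors.newCachedThreadPool();
        final CountDownLatch block = new CountDownLatch(1);
        for (int i = 0; i < n; i++) {
            pool.submit(() -> {
                try { block.await(); } catch (InterruptedException e) { }
            });
        }
        // Every submit found no idle thread, so the pool grew to n threads,
        // each named "pool-N-thread-M" by the default thread factory.
        final int size = pool.getPoolSize();
        block.countDown();
        pool.shutdown();
        return size;
    }

    public static void main(final String[] args) {
        System.out.println(burstPoolSize(100)); // one thread per pending task
    }
}
```

Idle threads in such a pool die after the default 60-second keepAliveTime, which would explain the thread count dropping back to ~30 a minute after each burst.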
In conclusion, whenever we need to send, or re-send after failures, 1,000 tasklets in AddOne, VortexRequestor creates hundreds of threads. The situation can only get worse with more tasklets and bigger input data. We should fix this either by setting maximumPoolSize or by improving serialization performance, as discussed in REEF-504.
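A sketch of the maximumPoolSize option: with a bounded pool, a burst of sends queues up instead of spawning one thread per pending tasklet. The bound of 10 threads here is a hypothetical value for illustration, not one taken from the Vortex code:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;

public class BoundedPoolDemo {
    // Submits `n` blocking tasks to a fixed-size pool and returns
    // { threads created, tasks still waiting in the queue }.
    static int[] burstStats(final int n) {
        final ThreadPoolExecutor pool =
            (ThreadPoolExecutor) Executors.newFixedThreadPool(10);
        final CountDownLatch block = new CountDownLatch(1);
        for (int i = 0; i < n; i++) {
            pool.submit(() -> {
                try { block.await(); } catch (InterruptedException e) { }
            });
        }
        // The pool never grows past its bound; the rest of the work queues.
        final int[] stats = { pool.getPoolSize(), pool.getQueue().size() };
        block.countDown();
        pool.shutdown();
        return stats;
    }

    public static void main(final String[] args) {
        final int[] s = burstStats(1000);
        System.out.println(s[0] + " threads, " + s[1] + " queued");
    }
}
```

Note that with `ThreadPoolExecutor` directly, maximumPoolSize only takes effect once the work queue is full, so an unbounded queue with a small core size effectively caps the pool at corePoolSize; a fixed-size pool (core == max) sidesteps that gotcha.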