Affects Version/s: 2.0.2
Fix Version/s: None
yarn, deploy-mode = client
We have a Spark application that continuously processes a large number of incoming jobs. Several jobs are processed in parallel, on multiple threads.
During intensive workloads, at some point we start getting hundreds of warnings like this:
From that point on, the performance of the app plummets, and most stages and jobs never finish. In the Spark UI, I can see figures such as 13000 pending jobs.
I can't clearly identify another related exception happening before this. Maybe this one, but it concerns another listener:
This is very problematic for us, since it is hard to detect and requires an app restart.
I confirm the sequence:
1- ERROR LiveListenerBus: Dropping SparkListenerEvent because no remaining room in event queue
2- JobProgressListener loses track of jobs and stages.
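
To illustrate why step 1 would lead to step 2, here is a minimal toy model of the sequence. It is not Spark's actual code: `JobTracker`, `post` and `simulate` are hypothetical names, and the bounded `queue.Queue` only stands in for the listener bus event queue. The assumption is that the producer posts without blocking and silently drops events once the queue is full, so a listener that never receives a "job end" event keeps counting that job as active.

```python
import queue


class JobTracker:
    """Toy stand-in for a progress listener (hypothetical, not Spark's class)."""

    def __init__(self):
        self.active = set()

    def on_event(self, event):
        kind, job_id = event
        if kind == "start":
            self.active.add(job_id)
        elif kind == "end":
            self.active.discard(job_id)


def post(bus, event):
    """Non-blocking post: returns False when the event is dropped (queue full)."""
    try:
        bus.put_nowait(event)
        return True
    except queue.Full:
        return False


def drain(bus, tracker):
    """Deliver every queued event to the listener."""
    while not bus.empty():
        tracker.on_event(bus.get_nowait())


def simulate(capacity, num_jobs):
    """Return (dropped_events, job ids the listener still believes are active)."""
    bus = queue.Queue(maxsize=capacity)
    tracker = JobTracker()
    dropped = 0

    # Normal load: the listener keeps up, so every start event is delivered.
    for job_id in range(num_jobs):
        if not post(bus, ("start", job_id)):
            dropped += 1
        drain(bus, tracker)

    # Burst: all end events arrive before the listener gets to drain the queue.
    for job_id in range(num_jobs):
        if not post(bus, ("end", job_id)):
            dropped += 1
    drain(bus, tracker)

    return dropped, sorted(tracker.active)


if __name__ == "__main__":
    dropped, stuck = simulate(capacity=3, num_jobs=10)
    print(f"dropped={dropped}, still marked active={stuck}")
```

In this sketch, a queue of capacity 3 and a burst of 10 end events leaves 7 jobs marked active forever, which matches the symptom of a pending-job count that only grows until the app is restarted.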