Description
SQLListener may grow very large when Spark runs complex multi-stage requests. The listener tracks metrics for all stages in the _stageIdToStageMetrics hash map. SQLListener has some means to clean up this hash map regularly, but they are not enough. Specifically, the method trimExecutionsIfNecessary ensures that _stageIdToStageMetrics does not hold metrics for very old data; this method runs each time an execution completes.
However, while an execution with many stages is still running, SQLListener keeps adding new entries to _stageIdToStageMetrics without calling trimExecutionsIfNecessary, so the hash map may grow to an enormous size.
Strictly speaking, this is not a memory leak, because trimExecutionsIfNecessary eventually cleans the hash map. However, the driver program is very likely to crash with OutOfMemoryError before that happens (and in practice it does).
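The growth pattern described above can be illustrated with a small toy model. This is only a sketch in Python, not Spark's actual SQLListener code: the class ToyListener, its method names, and the retained_executions parameter are hypothetical stand-ins. The only behaviour taken from the description is that entries are added per stage while trimming happens only when an execution completes.

```python
class ToyListener:
    """Toy model of the described behaviour; NOT Spark's real SQLListener."""

    def __init__(self, retained_executions):
        # Hypothetical limit on how many completed executions are kept.
        self.retained_executions = retained_executions
        # Stands in for _stageIdToStageMetrics: stage id -> metrics.
        self.stage_metrics = {}
        # Running executions: execution id -> list of its stage ids.
        self.running = {}
        # Completed executions, oldest first: (execution id, stage ids).
        self.completed = []

    def on_execution_start(self, execution_id):
        self.running[execution_id] = []

    def on_stage_submitted(self, execution_id, stage_id):
        # A new entry is added for every stage, and nothing is trimmed here.
        self.running[execution_id].append(stage_id)
        self.stage_metrics[stage_id] = {"accumulators": {}}

    def on_execution_end(self, execution_id):
        stage_ids = self.running.pop(execution_id)
        self.completed.append((execution_id, stage_ids))
        # Trimming runs only at this point, once per completed execution.
        self._trim_executions_if_necessary()

    def _trim_executions_if_necessary(self):
        # Drop the oldest completed executions together with their stage metrics.
        while len(self.completed) > self.retained_executions:
            _, stage_ids = self.completed.pop(0)
            for stage_id in stage_ids:
                self.stage_metrics.pop(stage_id, None)


listener = ToyListener(retained_executions=0)
listener.on_execution_start(1)
for stage_id in range(100_000):
    listener.on_stage_submitted(1, stage_id)

peak = len(listener.stage_metrics)   # map size while the execution is running
listener.on_execution_end(1)
after = len(listener.stage_metrics)  # map size once trimming has run

print(peak, after)  # prints: 100000 0
```

In this model the map holds one entry per stage until the single long execution finishes, and only then drops back to zero. That matches the description: the memory is reclaimed eventually, but the peak size is bounded only by the number of stages in the running execution.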