Inspecting the heap with VisualVM shows that the JMX MBean server's memory usage grows linearly over time.
Further investigation shows that the JMX MBean server fills up with thousands of KafkaMbean instances holding metrics for client ids matching consumer-\d+, where the numeric suffix grows into the thousands (equal to the number of tasks created on the executor).
Running the Kafka consumer with DEBUG logging on the executor shows that it registers thousands of metrics sensors but often removes only some of them, or none at all.
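One way to confirm the accumulation without a full heap dump is to query the platform MBean server for the `kafka.consumer` domain directly; a small self-contained sketch (the domain pattern is the only assumption, and on a leaking executor the count grows with the number of completed tasks):

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class MBeanLeakCheck {

    // Count MBeans whose ObjectName matches the given pattern, e.g.
    // "kafka.consumer:*" -- the domain KafkaMbean metrics are registered under.
    static int countMBeans(String pattern) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        Set<ObjectName> names = server.queryNames(new ObjectName(pattern), null);
        return names.size();
    }

    public static void main(String[] args) throws Exception {
        // On a healthy executor this should stay roughly constant between
        // batches; on a leaking one it grows monotonically.
        System.out.println("kafka.consumer MBeans: " + countMBeans("kafka.consumer:*"));
    }
}
```

Polling this count periodically from inside the executor JVM (or over a remote JMX connection) makes the linear growth visible without VisualVM.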
I would expect the KafkaMbeans to be cleaned up once the task completes.
However, they do not appear to be cleaned up when Spark produces the following message:
According to the KafkaSourceRDD code, consumer.release() is not called in this case, so the KafkaMetrics for the corresponding task/consumer id are retained in the JMX MBean server.
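The cleanup one would expect is for the release call to run on every task-completion path, not just the normal one. A minimal, hypothetical sketch of that pattern (the `TaskMetric` MBean and `demo.consumer` domain are invented stand-ins for Spark's cached consumer and its KafkaMbean metrics, not Spark's actual code):

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class ReleasePattern {

    // Standard-MBean stand-in for a per-consumer KafkaMbean metric.
    public interface TaskMetricMBean { long getTaskId(); }
    public static class TaskMetric implements TaskMetricMBean {
        private final long taskId;
        public TaskMetric(long taskId) { this.taskId = taskId; }
        public long getTaskId() { return taskId; }
    }

    static final MBeanServer SERVER = ManagementFactory.getPlatformMBeanServer();

    // Register a per-task metric MBean, analogous to what the Kafka
    // consumer's JmxReporter does when a consumer is created.
    static ObjectName register(long taskId) throws Exception {
        ObjectName name = new ObjectName("demo.consumer:type=metrics,task=" + taskId);
        SERVER.registerMBean(new TaskMetric(taskId), name);
        return name;
    }

    // The release must run on EVERY exit path, hence try/finally; skipping
    // it on any path leaves the MBean registered forever.
    static void runTask(long taskId) throws Exception {
        ObjectName name = register(taskId);
        try {
            // ... poll Kafka, process records, possibly throw ...
        } finally {
            SERVER.unregisterMBean(name); // without this, MBeans accumulate
        }
    }
}
```

The leak described above looks like exactly this pattern being violated: one completion path skips the equivalent of the `finally` block.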
Here is how I initialise structured streaming:
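(The original snippet is missing from the post. For context, a generic structured-streaming initialization against Kafka looks like the sketch below; the broker address, topic name, and checkpoint path are all hypothetical placeholders, not the reporter's actual configuration.)

```java
// NOT the original code -- a generic shape only, with hypothetical values.
SparkSession spark = SparkSession.builder()
        .appName("kafka-stream")            // hypothetical app name
        .getOrCreate();

Dataset<Row> stream = spark.readStream()
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092") // hypothetical
        .option("subscribe", "events")                    // hypothetical topic
        .load();

StreamingQuery query = stream.writeStream()
        .format("console")
        .option("checkpointLocation", "/tmp/checkpoint")  // hypothetical path
        .start();
```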
The Kafka partitions in question often have no data due to the sporadic nature of the producer.