Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Invalid
- Affects Version: 2.4.0
- Fix Version: None
- Component: None
- Environment: Spark 2.4 standalone client mode
Description
We are experiencing an issue where the Kafka consumer cache constantly overflows as soon as the application starts. The issue appeared after upgrading to Spark 2.4.
We get a steady stream of warnings like this:
18/12/18 07:03:29 WARN KafkaDataConsumer: KafkaConsumer cache hitting max capacity of 180, removing consumer for CacheKey(spark-kafka-source-6f66e0d2-beaf-4ff2-ade8-8996611de6ae--1081651087-executor,kafka-topic-76)
18/12/18 07:03:32 WARN KafkaDataConsumer: KafkaConsumer cache hitting max capacity of 180, removing consumer for CacheKey(spark-kafka-source-6f66e0d2-beaf-4ff2-ade8-8996611de6ae--1081651087-executor,kafka-topic-30)
18/12/18 07:03:32 WARN KafkaDataConsumer: KafkaConsumer cache hitting max capacity of 180, removing consumer for CacheKey(spark-kafka-source-f41d1f9e-1700-4994-9d26-2b9c0ee57881--215746753-executor,kafka-topic-57)
18/12/18 07:03:32 WARN KafkaDataConsumer: KafkaConsumer cache hitting max capacity of 180, removing consumer for CacheKey(spark-kafka-source-f41d1f9e-1700-4994-9d26-2b9c0ee57881--215746753-executor,kafka-topic-43)
This application runs 4 different Spark Structured Streaming queries against the same Kafka topic, which has 90 partitions. On Spark 2.3 we ran with the default settings, so the cache size defaulted to 64; after upgrading we tried raising it to 180 and then 360. With 360 there is far less noise about the overflow, but resource usage increases substantially.
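The sizing above can be sketched with back-of-the-envelope arithmetic. The assumption (mine, not stated in the report) is that each streaming query ends up caching one consumer per topic partition on the executors, so the cache needs roughly queries × partitions slots before eviction churn stops:

```python
# Rough consumer-cache sizing for the setup described above.
# Assumption: one cached KafkaConsumer per (query, topic-partition) pair.

def required_cache_capacity(num_queries: int, num_partitions: int) -> int:
    """Minimum cache capacity that holds every consumer simultaneously."""
    return num_queries * num_partitions

capacity = required_cache_capacity(num_queries=4, num_partitions=90)
print(capacity)  # 360 -- matches the setting that quieted the warnings
```

If this model holds, the default of 64 (and even 180) is simply too small for 4 queries on a 90-partition topic, and the warnings are expected cache churn rather than a leak. In Spark 2.4 the cache size is controlled by `spark.sql.kafkaConsumerCache.capacity`; whether raising it is the intended fix here is a question for the Spark developers.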