Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Not A Problem
Affects Version/s: 2.0.0, 2.0.1
None
None
Description
Both spark-streaming and spark-streaming-kafka-0-10 are at version 2.0.1. The behavior is the same with 2.0.0.
spark.streaming.kafka.consumer.poll.ms is set to 30000
spark.streaming.kafka.maxRatePerPartition is set to 100000
spark.streaming.backpressure.enabled is set to true
`batchDuration` of the streaming context is set to 1 second.
I consume a Kafka topic using KafkaUtils.createDirectStream().
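For readers who want to reproduce the setup, the configuration above could be wired together roughly as follows. This is only a sketch: the broker address, topic name, group id, application name, and String deserializers are placeholders that do not come from this report.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object BackpressureRepro {
  def main(args: Array[String]): Unit = {
    // Settings exactly as listed in the description.
    val conf = new SparkConf()
      .setAppName("backpressure-repro") // placeholder name
      .set("spark.streaming.kafka.consumer.poll.ms", "30000")
      .set("spark.streaming.kafka.maxRatePerPartition", "100000")
      .set("spark.streaming.backpressure.enabled", "true")

    // batchDuration of 1 second.
    val ssc = new StreamingContext(conf, Seconds(1))

    // Assumed consumer settings; adjust to the actual cluster.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "backpressure-repro",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams) // "events" is a placeholder topic
    )

    // Stand-in for the real processing: print the size of each batch.
    stream.foreachRDD(rdd => println(s"records in batch: ${rdd.count()}"))

    ssc.start()
    ssc.awaitTermination()
  }
}
```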
My system can handle batches of 100k records, but it takes more than 1 second to process them all. I would therefore expect the backpressure to reduce the number of records fetched in the next batch so that the processing delay stays under 1 second.
However, this does not happen: the rate chosen by the backpressure stays stuck at `100.0`, no matter how the other variables (processing time, error, etc.) change.
Here's a log showing how all these variables change but the chosen rate stays the same: https://gist.github.com/Dinduks/d9fa67fc8a036d3cad8e859c508acdba (I would have attached a file but I don't see how).
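For context, here is a simplified sketch of a PID-style rate update of the kind the backpressure mechanism is documented to use. It is an illustration under assumptions, not Spark's actual code: `nextRate` and its parameter names are hypothetical, the constants are the documented defaults of the `spark.streaming.backpressure.pid.*` settings (proportional 1.0, integral 0.2, derived 0.0, minRate 100), and the numbers in `main` are made up rather than taken from the linked log. Under these assumptions, a large scheduling delay drives the computed rate below the floor, and the result is clamped to the minimum rate of 100.

```scala
object BackpressureSketch {
  // Documented defaults of the spark.streaming.backpressure.pid.* settings.
  val Proportional = 1.0
  val Integral = 0.2
  val Derivative = 0.0
  val MinRate = 100.0

  /** Hypothetical helper: one step of a PID-style rate update, in records per second. */
  def nextRate(
      latestRate: Double,
      latestError: Double,
      numElements: Long,
      processingDelayMs: Long,
      schedulingDelayMs: Long,
      batchIntervalMs: Long,
      secondsSinceUpdate: Double): Double = {
    // How fast the last batch was actually processed.
    val processingRate = numElements.toDouble / processingDelayMs * 1000
    // Difference between the rate that was set and the rate that was achieved.
    val error = latestRate - processingRate
    // Backlog term: records that piled up while batches waited to be scheduled.
    val historicalError = schedulingDelayMs.toDouble * processingRate / batchIntervalMs
    val dError = (error - latestError) / secondsSinceUpdate
    val unclamped = latestRate -
      Proportional * error -
      Integral * historicalError -
      Derivative * dError
    // The rate never drops below the configured minimum.
    math.max(unclamped, MinRate)
  }

  def main(args: Array[String]): Unit = {
    // Illustrative numbers only: 100k records processed in 2 s, 60 s of scheduling delay.
    val withBacklog = nextRate(100.0, 0.0, 100000L, 2000L, 60000L, 1000L, 1.0)
    // Same batch, but with no scheduling delay.
    val withoutBacklog = nextRate(100.0, 0.0, 100000L, 2000L, 0L, 1000L, 1.0)
    println(s"with backlog: $withBacklog, without backlog: $withoutBacklog")
  }
}
```

If this reading applies, a rate that stays at `100.0` while the other variables change would be the floor value rather than a frozen estimate. The floor corresponds to the `spark.streaming.backpressure.pid.minRate` setting.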
Is this the expected behavior and I am missing something, or is this a bug?
I'll gladly help by providing more information or writing code if necessary.
Thank you.