Details
Bug
Status: Resolved
Major
Resolution: Incomplete
2.3.0
None
Description
We have a Spark Streaming application that reads from Kinesis and writes to Redshift.
Configuration:
Number of receivers = 5
Batch interval = 10 mins
spark.streaming.receiver.maxRate = 2000 (records per second)
According to this config, the max number of records that can be read in a single batch can be calculated using the formula below:
Max records per batch = batch_interval * 60 (convert mins to seconds) * 5 (number of receivers) * 2000 (max records per second per receiver)
= 10 * 60 * 5 * 2000 = 6,000,000
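For reference, the same calculation as a small Python sketch. The function and parameter names are made up for illustration and are not part of any Spark API; the sketch only assumes the rate cap applies per receiver, per second, as described above.

```python
def max_records_per_batch(batch_interval_min, num_receivers, max_rate_per_sec):
    """Upper bound on records per batch implied by a per-receiver rate cap.

    Hypothetical helper for illustration only; not a Spark function.
    """
    batch_interval_sec = batch_interval_min * 60  # convert mins to seconds
    return batch_interval_sec * num_receivers * max_rate_per_sec


# Values taken from the configuration listed above.
limit = max_records_per_batch(
    batch_interval_min=10,
    num_receivers=5,
    max_rate_per_sec=2000,  # spark.streaming.receiver.maxRate
)
print(f"{limit:,}")  # 6,000,000
```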
But the actual number of records is more than this max.
Batch I - 6,005,886 records
Batch II - 6,001,623 records
Batch III - 6,010,148 records
Please note that the receivers are not even reading at the max rate; the records read per receiver are near 1900 per second.
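To put the overshoot in numbers, here is a rough Python sketch that compares the three batches reported above against the expected cap. It assumes records are spread evenly across the 5 receivers over the full 10-minute batch, which may not hold in practice; the helper names are illustrative only.

```python
BATCH_SECONDS = 10 * 60   # batch interval in seconds
NUM_RECEIVERS = 5
MAX_RATE = 2000           # spark.streaming.receiver.maxRate
EXPECTED_MAX = BATCH_SECONDS * NUM_RECEIVERS * MAX_RATE  # 6,000,000

# Record counts as reported in this issue.
observed = {
    "Batch I": 6_005_886,
    "Batch II": 6_001_623,
    "Batch III": 6_010_148,
}


def overshoot(records, limit=EXPECTED_MAX):
    """Number of records above the expected cap."""
    return records - limit


def implied_rate(records):
    """Average records/sec per receiver, assuming an even spread (an assumption)."""
    return records / (BATCH_SECONDS * NUM_RECEIVERS)


for name, records in observed.items():
    print(f"{name}: +{overshoot(records):,} records, "
          f"{implied_rate(records):.2f} records/sec per receiver")
```

Under that even-spread assumption, each batch works out to an average slightly above 2000 records per second per receiver.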