Spark / SPARK-26086

Spark streaming max records per batch interval


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.3.0
    • Fix Version/s: None
    • Component/s: DStreams

    Description

      We have a Spark Streaming application that reads from Kinesis and writes to Redshift.

      Configuration:

      Number of receivers = 5

      Batch interval = 10 mins

      spark.streaming.receiver.maxRate = 2000 (records per second)
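      For context, a minimal PySpark sketch of how the configuration above might be set. Only the maxRate value and the 10-minute batch interval come from this report; the app name is hypothetical.

      ```python
      from pyspark import SparkConf, SparkContext
      from pyspark.streaming import StreamingContext

      conf = (SparkConf()
              .setAppName("kinesis-to-redshift")  # hypothetical app name
              # cap each receiver at 2000 records per second
              .set("spark.streaming.receiver.maxRate", "2000"))

      sc = SparkContext(conf=conf)
      # 10-minute batch interval, expressed in seconds
      ssc = StreamingContext(sc, 600)
      ```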

      According to this configuration, the maximum number of records that can be read in a single batch is given by the formula below:

      Max records per batch = batch_interval (10 mins * 60 = 600 seconds) * 5 (number of receivers) * 2000 (max records per second per receiver)
      10 * 60 * 5 * 2000 = 6,000,000

       

      But the actual number of records per batch exceeds this maximum:

      Batch I - 6,005,886 records

      Batch II - 6,001,623 records

      Batch III - 6,010,148 records

      Note that the receivers are not even reading at the maximum rate; each receiver reads close to 1,900 records per second.
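      The bound and the overshoot can be checked with simple arithmetic, using only the numbers from the description above:

      ```python
      # Theoretical upper bound on records per batch, from the reported settings.
      batch_interval_s = 10 * 60   # 10-minute batch interval, in seconds
      num_receivers = 5
      max_rate = 2000              # spark.streaming.receiver.maxRate, per receiver

      max_records_per_batch = batch_interval_s * num_receivers * max_rate
      print(max_records_per_batch)  # 6000000

      # Every observed batch size from the report exceeds the bound.
      observed = [6_005_886, 6_001_623, 6_010_148]
      print(all(n > max_records_per_batch for n in observed))  # True
      ```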


          People

            Assignee: Unassigned
            Reporter: vijayant soni
