  1. Spark
  2. SPARK-18620

Spark Streaming + Kinesis : Receiver MaxRate is violated

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.0.2
    • Fix Version/s: 2.2.0
    • Component/s: DStreams

    Description

      I am calling spark-submit with maxRate set. I have a single Kinesis receiver and a batch interval of 1s:

      spark-submit --conf spark.streaming.receiver.maxRate=10 ....

      However, a single batch can greatly exceed the established maxRate; for example, I'm getting 300 records.

      It looks like Kinesis is completely ignoring the spark.streaming.receiver.maxRate configuration.
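
      To make the size of the violation concrete, here is a minimal sketch of the per-batch ceiling the setting implies, assuming the limit applies per receiver and is expressed in records per second. `MaxRateBound` and `maxRecordsPerBatch` are hypothetical names used only for illustration; they are not part of Spark.

```scala
// Hypothetical helper, not part of Spark: computes the per-batch ceiling
// implied by spark.streaming.receiver.maxRate for a given batch interval.
object MaxRateBound {
  def maxRecordsPerBatch(maxRatePerSec: Long, batchSeconds: Long, receivers: Int): Long =
    maxRatePerSec * batchSeconds * receivers

  def main(args: Array[String]): Unit = {
    // Values taken from the report: maxRate=10, 1s batches, one receiver.
    val bound = maxRecordsPerBatch(maxRatePerSec = 10L, batchSeconds = 1L, receivers = 1)
    val observed = 300L
    println(s"expected at most $bound records per batch, observed $observed (${observed / bound}x)")
  }
}
```

      With the values from this report the ceiling is 10 records per batch, so a batch of 300 records is 30 times over the limit.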

      If you look inside KinesisReceiver.onStart, you see:

      val kinesisClientLibConfiguration =
        new KinesisClientLibConfiguration(checkpointAppName, streamName, awsCredProvider, workerId)
          .withKinesisEndpoint(endpointUrl)
          .withInitialPositionInStream(initialPositionInStream)
          .withTaskBackoffTimeMillis(500)
          .withRegionName(regionName)

      This constructor ends up calling another constructor that supplies default values for most of the configuration. One of those values is DEFAULT_MAX_RECORDS, a constant set to 10,000 records.
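
      The mismatch can be illustrated without Spark or the Kinesis Client Library: if one fetch may return up to 10,000 records and they are all forwarded at once, a limit of 10 has no effect. Below is a minimal sketch of one possible remedy, assuming the receiver is able to split a fetched batch into chunks no larger than the current limit before handing them on. `FetchLimiter`, `splitByLimit`, and `DefaultMaxRecords` are hypothetical names for this sketch, not Spark or KCL APIs, and this is not necessarily how the issue was resolved.

```scala
// Hypothetical sketch, independent of Spark and the KCL.
object FetchLimiter {
  // Mirrors the default described in the report; illustrative constant only.
  val DefaultMaxRecords: Int = 10000

  // Split one fetched batch into chunks of at most `limit` records,
  // preserving the original record order.
  def splitByLimit[A](records: Seq[A], limit: Int): List[Seq[A]] = {
    require(limit > 0, "limit must be positive")
    records.grouped(limit).toList
  }

  def main(args: Array[String]): Unit = {
    // One fetch of 300 records, well under DefaultMaxRecords.
    val fetched = (1 to 300).toVector
    val chunks = splitByLimit(fetched, limit = 10)
    println(s"${fetched.size} records -> ${chunks.size} chunks, largest = ${chunks.map(_.size).max}")
  }
}
```

      Chunking alone only bounds the size of each hand-off; something downstream still has to pace the chunks to the configured rate. Lowering the fetch size on the client configuration itself may be another option, depending on what the KCL version in use exposes.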

      Attachments

        1. Apply_limit in_spark_with_my_patch.png
          102 kB
          Takeshi Yamamuro
        2. Apply_limit in_vanilla_spark.png
          107 kB
          Takeshi Yamamuro
        3. Apply_no_limit.png
          117 kB
          Takeshi Yamamuro

            People

              Assignee: Takeshi Yamamuro (maropu)
              Reporter: david przybill (dav009)
              Votes: 0
              Watchers: 4
