Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-18620

Spark Streaming + Kinesis : Receiver MaxRate is violated

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.0.2
    • Fix Version/s: 2.2.0
    • Component/s: DStreams
    • Labels:

      Description

      I am calling spark-submit passing maxRate, I have a single kinesis receiver, and batches of 1s

      spark-submit --conf spark.streaming.receiver.maxRate=10 ....

      however a single batch can greatly exceed the stablished maxRate. i.e: Im getting 300 records.

      it looks like Kinesis is completely ignoring the spark.streaming.receiver.maxRate configuration.

      If you look inside KinesisReceiver.onStart, you see:

      val kinesisClientLibConfiguration =
      new KinesisClientLibConfiguration(checkpointAppName, streamName, awsCredProvider, workerId)
      .withKinesisEndpoint(endpointUrl)
      .withInitialPositionInStream(initialPositionInStream)
      .withTaskBackoffTimeMillis(500)
      .withRegionName(regionName)

      This constructor ends up calling another constructor which has a lot of default values for the configuration. One of those values is DEFAULT_MAX_RECORDS which is constantly set to 10,000 records.

        Attachments

        1. Apply_no_limit.png
          117 kB
          Takeshi Yamamuro
        2. Apply_limit in_vanilla_spark.png
          107 kB
          Takeshi Yamamuro
        3. Apply_limit in_spark_with_my_patch.png
          102 kB
          Takeshi Yamamuro

          Issue Links

            Activity

              People

              • Assignee:
                maropu Takeshi Yamamuro
                Reporter:
                dav009 david przybill
              • Votes:
                0 Vote for this issue
                Watchers:
                4 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: