Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-7508

switch FlinkKinesisProducer to use KPL's ThreadingMode to ThreadedPool mode rather than Per_Request mode

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Closed
    • Critical
    • Resolution: Fixed
    • 1.3.0
    • 1.4.0
    • Connectors / Kinesis
    • None
    • Important

    Description

      KinesisProducerLibrary (KPL) 0.10.x had been using a One-New-Thread-Per-Request model for all requests sent to AWS Kinesis, which is very expensive.

      0.12.4 introduced a new ThreadingMode - Pooled, which will use a thread pool. This hugely improves KPL's performance and reduces consumed resources. By default, KPL still uses per-request mode. We should explicitly switch FlinkKinesisProducer's KPL threading mode to 'Pooled'.

      This work depends on FLINK-7366 and FLINK-7508

      Benchmarking I did:

      • Environment: Running a Flink hourly-sliding windowing job on 18-node EMR cluster with R4.2xlarge instances. Each hourly-sliding window in the Flink job generates about 21million UserRecords, which means that we generated a test load of 21million UserRecords at the first minute of each hour.
      • Criteria: Test KPL throughput per minute. Since the default RecordTTL for KPL is 30 sec, we can be sure that either all UserRecords are sent by KPL within a minute, or we will see UserRecord expiration errors.
      • One-New-Thread-Per-Request model: max throughput is about 2million UserRecords per min; it doesn't go beyond that because CPU utilization goes to 100%, everything stopped working and that Flink job crashed.
      • Thread-Pool model with pool size of 10: it sends out 21million UserRecords within 30 sec without any UserRecord expiration errors. The average peak CPU utilization is about 20% - 30%. So 21million UserRecords/min is not the max throughput of thread-pool model. We didn't go any further because 1) this throughput is already a couple times more than what we really need, and 2) we don't have a quick way of increasing the test load

      Thus, I propose switching FlinkKinesisProducer to Thread-Pool mode. tzulitai What do you think

      Attachments

        Issue Links

          Activity

            People

              phoenixjiangnan Bowen Li
              phoenixjiangnan Bowen Li
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: