  1. Flink
  2. FLINK-16495

Improve default flush strategy for Elasticsearch sink to make it work out-of-box

Details

    Description

      Currently, the Elasticsearch sink provides three flush options:

      'connector.bulk-flush.max-actions' = '42'
      'connector.bulk-flush.max-size' = '42 mb'
      'connector.bulk-flush.interval' = '60000'
      

      All of them are optional and have no default value on the Flink side [1]. However, the Elasticsearch client defaults the flush actions to 1000 and the flush size to 5mb [2]. This leads to the surprising behavior that no results are output by default; see the user report [3]. The sink waits for 1000 records before flushing, but a test run usually does not produce that many records.

      This is also a potential "problem" in production: in a low-throughput job, some data may take a very long time to become visible in Elasticsearch.
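      To put a rough number on that delay, the sketch below estimates how long the first buffered record waits when only the record-count threshold applies. The function name and the example rates are illustrative assumptions, and the size threshold is ignored on the assumption that records are small.

```python
def worst_case_wait_seconds(max_actions: int, records_per_second: float) -> float:
    """Seconds the first buffered record waits before a count-only flush fires."""
    if records_per_second <= 0:
        # No incoming records: the buffer never fills, so the flush never fires.
        return float("inf")
    # The first record arrives at t = 0; the flush fires when the
    # max_actions-th record arrives, i.e. after (max_actions - 1) more records.
    return (max_actions - 1) / records_per_second


# Hypothetical low-throughput job: one record per minute, client default of 1000 actions.
wait = worst_case_wait_seconds(1000, 1 / 60)
print(f"{wait / 3600:.1f} hours")  # more than 16 hours before the first record is visible
```

      With a time-based flush in place, the same record would wait at most one interval, regardless of throughput.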

      So we propose default flush settings of a '1s' interval, '1000' rows, and a '2mb' size for the ES sink. This only applies to the new ES sink options:

      'sink.bulk-flush.max-actions' = '1000'
      'sink.bulk-flush.max-size' = '2mb'
      'sink.bulk-flush.interval' = '1s'
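
      The effect of adding an interval can be illustrated with a small simulation. This is a simplified model written for this issue, not the Elasticsearch client's actual implementation: `FlushConfig` and `simulate` are hypothetical names, and the fixed-rate interval tick is an assumption about how the time-based flush behaves.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FlushConfig:
    max_actions: Optional[int]     # flush once this many records are buffered
    max_size_bytes: Optional[int]  # flush once the buffer reaches this size
    interval_ms: Optional[int]     # None means no time-based flush


def simulate(arrivals: List[Tuple[int, int]], config: FlushConfig,
             end_ms: int) -> List[Tuple[int, int]]:
    """Replay (time_ms, size_bytes) arrivals, sorted by time and all <= end_ms.

    Returns a list of (flush_time_ms, flushed_record_count).
    """
    flushes: List[Tuple[int, int]] = []
    count, size = 0, 0
    next_tick = config.interval_ms  # assumed fixed-rate timer starting at t = 0

    def fire_ticks(until_ms: int) -> None:
        nonlocal count, size, next_tick
        while next_tick is not None and next_tick <= until_ms:
            if count > 0:  # an interval tick only flushes a non-empty buffer
                flushes.append((next_tick, count))
                count, size = 0, 0
            next_tick += config.interval_ms

    for time_ms, size_bytes in arrivals:
        fire_ticks(time_ms)
        count += 1
        size += size_bytes
        hit_actions = config.max_actions is not None and count >= config.max_actions
        hit_size = config.max_size_bytes is not None and size >= config.max_size_bytes
        if hit_actions or hit_size:
            flushes.append((time_ms, count))
            count, size = 0, 0

    fire_ticks(end_ms)  # ticks that occur after the last arrival
    return flushes


# A small test job: 10 records of 100 bytes, one every 100 ms.
arrivals = [(i * 100, 100) for i in range(10)]

client_defaults = FlushConfig(1000, 5 * 1024 * 1024, None)
proposed_defaults = FlushConfig(1000, 2 * 1024 * 1024, 1000)

print(simulate(arrivals, client_defaults, end_ms=5000))    # [] : nothing is ever flushed
print(simulate(arrivals, proposed_defaults, end_ms=5000))  # [(1000, 10)] : flushed after 1s
```

      Under these assumptions, the count and size thresholds alone never fire for a small test input, while the '1s' interval makes all ten records visible after one second.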
      

      [1]: https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-elasticsearch-base/src/main/java/org/apache/flink/streaming/connectors/elasticsearch/ElasticsearchSinkBase.java#L357-L356
      [2]: https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/java-docs-bulk-processor.html
      [3]: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Should-I-use-a-Sink-or-Connector-Or-Both-td33352.html

            People

              jark Jark Wu