Nutch / NUTCH-2287

Indexer-elastic plugin should use Elasticsearch BulkProcessor and BackoffPolicy

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.12
    • Fix Version/s: 1.13
    • Component/s: indexer, plugin
    • Labels: None

    Description

      Elasticsearch's Java API (since at least v2.0) includes the BulkProcessor, which automatically flushes bulk requests once a configured maximum document count and/or maximum bulk size is reached. Since 2.2.0 (I believe), it also offers a BackoffPolicy option that lets the BulkProcessor/Client retry bulk requests when the Elasticsearch cluster is saturated. Using the BulkProcessor was originally suggested here.

      Refactoring the indexer-elastic plugin to use the BulkProcessor will greatly simplify the existing plugin at the cost of slightly less debug logging. Additionally, it will allow the plugin to handle cluster saturation gracefully (rather than raising a RuntimeException and killing the reduce task), by using a configurable "exponential back-off policy".

      https://www.elastic.co/guide/en/elasticsearch/client/java-api/2.3/java-docs-bulk-processor.html
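
      To illustrate the behaviour described above, here is a minimal, library-free sketch of the two mechanisms: flushing a buffer once a maximum document count is reached, and retrying a rejected batch with exponentially growing delays instead of failing immediately. It is not the Elasticsearch API. The class BufferedBulkSender, its fields, and the Predicate-based sender are hypothetical names invented for this example, and plain doubling of the delay is a simplifying assumption rather than the exact schedule the Elasticsearch BackoffPolicy uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for illustration only; this is NOT Elasticsearch's BulkProcessor.
class BufferedBulkSender {
    private final int maxDocs;                    // flush once this many docs are buffered
    private final long initialDelayMs;            // delay before the first retry
    private final int maxRetries;                 // retries allowed after the first attempt
    private final Predicate<List<String>> sender; // returns false when the batch is rejected
    private final List<String> buffer = new ArrayList<>();
    final List<Long> delaysUsed = new ArrayList<>(); // back-off delays actually applied

    BufferedBulkSender(int maxDocs, long initialDelayMs, int maxRetries,
                       Predicate<List<String>> sender) {
        this.maxDocs = maxDocs;
        this.initialDelayMs = initialDelayMs;
        this.maxRetries = maxRetries;
        this.sender = sender;
    }

    // Buffer a document and flush automatically when the count limit is reached.
    void add(String doc) {
        buffer.add(doc);
        if (buffer.size() >= maxDocs) {
            flush();
        }
    }

    // Send the buffered batch, retrying with a doubling delay while it is rejected.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        List<String> batch = new ArrayList<>(buffer);
        buffer.clear();
        long delay = initialDelayMs;
        for (int attempt = 0; ; attempt++) {
            if (sender.test(batch)) {
                return; // batch accepted
            }
            if (attempt == maxRetries) {
                // Only give up once the retry budget is exhausted.
                throw new IllegalStateException(
                        "bulk request rejected after " + maxRetries + " retries");
            }
            delaysUsed.add(delay);
            sleep(delay);
            delay *= 2; // simplified exponential back-off (assumption)
        }
    }

    private static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during back-off", e);
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated cluster that rejects the first two attempts, then accepts.
        BufferedBulkSender s =
                new BufferedBulkSender(2, 10, 3, batch -> ++calls[0] > 2);
        s.add("doc-1");
        s.add("doc-2"); // reaches maxDocs, triggers flush with retries
        System.out.println("attempts=" + calls[0] + " delays=" + s.delaysUsed);
    }
}
```

      In the plugin itself, the equivalent behaviour would come from configuring the BulkProcessor described in the linked documentation rather than from hand-written retry code like this.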

            People

              Assignee: lewismc Lewis John McGibbney
              Reporter: naegelejd Joseph Naegele
