Nutch / NUTCH-2287

Indexer-elastic plugin should use Elasticsearch BulkProcessor and BackoffPolicy


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.12
    • Fix Version/s: 1.13
    • Component/s: indexer, plugin
    • Labels: None

      Description

      Elasticsearch's Java API (since at least v2.0) includes the BulkProcessor, which automatically flushes bulk requests once a maximum document count and/or a maximum bulk size is reached. It also now offers (I believe since 2.2.0) a BackoffPolicy option, which lets the BulkProcessor/Client retry bulk requests that are rejected when the Elasticsearch cluster is saturated. Using the BulkProcessor was originally suggested here.
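      To illustrate the flushing rule, here is a minimal, library-free Java sketch of a buffer that hands its contents off as one bulk once either a document-count limit or a byte-size limit is reached. BulkBuffer, maxActions and maxBytes are hypothetical names chosen for illustration; this is not the Elasticsearch class, whose 2.x builder exposes the corresponding limits as setBulkActions and setBulkSize.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical, library-free model of size/count-based bulk flushing.
 * NOT the Elasticsearch BulkProcessor; it only mimics the flushing rule.
 */
class BulkBuffer {
    private final int maxActions;               // max docs per bulk; -1 disables this limit
    private final long maxBytes;                // max UTF-8 bytes per bulk; -1 disables this limit
    private final Consumer<List<String>> sink;  // receives each flushed batch
    private final List<String> pending = new ArrayList<>();
    private long pendingBytes = 0;

    BulkBuffer(int maxActions, long maxBytes, Consumer<List<String>> sink) {
        this.maxActions = maxActions;
        this.maxBytes = maxBytes;
        this.sink = sink;
    }

    /** Buffers one JSON document, then flushes if either limit has been reached. */
    void add(String jsonDoc) {
        pending.add(jsonDoc);
        pendingBytes += jsonDoc.getBytes(StandardCharsets.UTF_8).length;
        if (isOverTheLimit()) {
            flush();
        }
    }

    private boolean isOverTheLimit() {
        boolean tooMany = maxActions != -1 && pending.size() >= maxActions;
        boolean tooBig = maxBytes != -1 && pendingBytes >= maxBytes;
        return tooMany || tooBig;
    }

    /** Sends whatever is buffered (no-op if empty); call once more at the end of the task. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(pending)); // hand off a copy as one "bulk request"
        pending.clear();
        pendingBytes = 0;
    }
}
```

With maxActions = 2 and the byte limit disabled (-1), adding five documents and then calling flush() yields batches of 2, 2 and 1 documents; the final flush() plays roughly the role of closing the processor when the reduce task finishes.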

      Refactoring the indexer-elastic plugin to use the BulkProcessor will greatly simplify the plugin's code, at the cost of slightly less debug logging. It will also let the plugin handle cluster saturation gracefully by using a configurable "exponential back-off policy", rather than raising a RuntimeException and killing the reduce task.
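      The back-off behaviour described above can be sketched in plain Java: instead of throwing and killing the reduce task, a rejected bulk request is retried after exponentially growing delays and only fails once the retry budget is used up. RejectedBulkException, ExponentialBackoffRetry and the doubling delay schedule are assumptions for illustration; the real Elasticsearch BackoffPolicy computes its own delays and, per the linked 2.3 docs, is set on the BulkProcessor builder via BackoffPolicy.exponentialBackoff(TimeValue, int).

```java
import java.util.function.LongConsumer;
import java.util.function.Supplier;

/** Hypothetical stand-in for a bulk request rejected because the cluster is saturated. */
class RejectedBulkException extends RuntimeException {
    RejectedBulkException(String message) {
        super(message);
    }
}

/**
 * Minimal retry-with-exponential-back-off sketch (hypothetical, not the ES API).
 * Delays double on each retry: initial, 2*initial, 4*initial, ...
 */
class ExponentialBackoffRetry {
    private final long initialDelayMillis;
    private final int maxRetries;
    private final LongConsumer sleeper; // injected so the logic can be tested without waiting

    ExponentialBackoffRetry(long initialDelayMillis, int maxRetries, LongConsumer sleeper) {
        this.initialDelayMillis = initialDelayMillis;
        this.maxRetries = maxRetries;
        this.sleeper = sleeper;
    }

    /** Sleeper that really blocks the calling thread; pass ExponentialBackoffRetry::realSleep. */
    static void realSleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        }
    }

    /** Runs bulkCall, retrying only on RejectedBulkException; other exceptions propagate at once. */
    <T> T execute(Supplier<T> bulkCall) {
        long delay = initialDelayMillis;
        for (int attempt = 0; ; attempt++) {
            try {
                return bulkCall.get();
            } catch (RejectedBulkException e) {
                if (attempt >= maxRetries) {
                    throw e; // retries exhausted: surface the rejection
                }
                sleeper.accept(delay);
                delay *= 2; // exponential growth of the wait between attempts
            }
        }
    }
}
```

With an initial delay of 100 ms and 3 retries, a call that is rejected twice and then succeeds waits 100 ms and then 200 ms before returning; a call that is always rejected waits 100, 200 and 400 ms and then rethrows. Injecting the sleeper keeps the retry logic separate from the actual waiting.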

      https://www.elastic.co/guide/en/elasticsearch/client/java-api/2.3/java-docs-bulk-processor.html

              People

              • Assignee: Lewis John McGibbney (lewismc)
              • Reporter: Joseph Naegele (naegelejd)
              • Votes: 0
              • Watchers: 4