[NUTCH-2287] Indexer-elastic plugin should use Elasticsearch BulkProcessor and BackoffPolicy - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 1.12
Fix Version/s: 1.13
Component/s: indexer, plugin
Labels: None

Description

Elasticsearch's API (since at least v2.0) includes the BulkProcessor, which automatically flushes bulk requests once a maximum document count and/or maximum bulk size is reached. It also now (I believe since 2.2.0) offers a BackoffPolicy option, which lets the BulkProcessor/Client retry bulk requests when the Elasticsearch cluster is saturated. Using the BulkProcessor was originally suggested here.
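To make the flushing behaviour concrete, here is a minimal sketch in plain Java of a buffer that flushes when either a document count or a byte size threshold is hit. It is not Elasticsearch's BulkProcessor and uses none of its classes; `SimpleBulkBuffer`, `maxActions`, `maxBytes` and `flushHandler` are hypothetical names chosen for illustration only.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration only: this is NOT Elasticsearch's BulkProcessor.
// It only mimics the "flush on max doc count and/or max bulk size" idea.
final class SimpleBulkBuffer {
    private final int maxActions;   // flush once this many documents are buffered
    private final long maxBytes;    // flush once the buffered payload reaches this size
    private final Consumer<List<String>> flushHandler; // receives each completed batch

    private final List<String> pending = new ArrayList<>();
    private long pendingBytes = 0;

    SimpleBulkBuffer(int maxActions, long maxBytes, Consumer<List<String>> flushHandler) {
        this.maxActions = maxActions;
        this.maxBytes = maxBytes;
        this.flushHandler = flushHandler;
    }

    // Buffer one document and flush automatically if either threshold is reached.
    void add(String jsonDoc) {
        pending.add(jsonDoc);
        pendingBytes += jsonDoc.getBytes(StandardCharsets.UTF_8).length;
        if (pending.size() >= maxActions || pendingBytes >= maxBytes) {
            flush();
        }
    }

    // Hand the current batch to the handler; callers invoke this once at the end
    // so that a final partial batch is not lost.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        flushHandler.accept(new ArrayList<>(pending));
        pending.clear();
        pendingBytes = 0;
    }
}
```

With this shape the caller only adds documents and calls `flush()` once on close, which is the simplification the real BulkProcessor is expected to bring to the plugin.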

Refactoring the indexer-elastic plugin to use the BulkProcessor will greatly simplify the existing plugin at the cost of slightly less debug logging. Additionally, it will allow the plugin to handle cluster saturation gracefully (rather than raising a RuntimeException and killing the reduce task), by using a configurable "exponential back-off policy".
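As a rough sketch of what retrying with exponential back-off means, the plain-Java example below retries a failed attempt after delays that double each time, and reports failure instead of throwing when retries run out. It does not use Elasticsearch's BackoffPolicy; the class and parameter names (`ExponentialBackoffSketch`, `initialDelayMs`, `maxRetries`) are hypothetical, and the simple doubling schedule is an assumption that may not match the delays Elasticsearch actually computes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical illustration only: NOT Elasticsearch's BackoffPolicy.
final class ExponentialBackoffSketch {

    // Delays to wait before each retry: initialDelayMs, then 2x, 4x, ...
    // (simple doubling is an assumption made for this sketch).
    static List<Long> delays(long initialDelayMs, int maxRetries) {
        List<Long> result = new ArrayList<>();
        long delay = initialDelayMs;
        for (int i = 0; i < maxRetries; i++) {
            result.add(delay);
            delay *= 2;
        }
        return result;
    }

    // Runs the attempt once, then retries up to maxRetries times with growing
    // pauses. Returns true on the first success, false if every attempt fails
    // or the thread is interrupted while waiting.
    static boolean sendWithRetry(BooleanSupplier attempt, long initialDelayMs, int maxRetries) {
        if (attempt.getAsBoolean()) {
            return true;
        }
        for (long delay : delays(initialDelayMs, maxRetries)) {
            try {
                Thread.sleep(delay);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
            if (attempt.getAsBoolean()) {
                return true;
            }
        }
        return false;
    }
}
```

Making the initial delay and the retry count configurable lets an operator trade indexing latency against tolerance for a saturated cluster.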

https://www.elastic.co/guide/en/elasticsearch/client/java-api/2.3/java-docs-bulk-processor.html

Issue Links

links to

GitHub Pull Request #131

People

Assignee: Lewis John McGibbney

Reporter: Joseph Naegele

Votes: 0

Watchers: 4

Dates

Created: 24/Jun/16 21:15

Updated: 13/Mar/24 14:51

Resolved: 16/Jul/16 21:40