Flume / FLUME-2222

Duplicate entries in Elasticsearch when using Flume elasticsearch-sink

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: v1.4.0
    • Fix Version/s: None
    • Component/s: Sinks+Sources
    • Environment:

      CentOS 6

      Description

      Hello,

      I'm using the Flume elasticsearch-sink to transfer logs from EC2 instances to Elasticsearch, and I get duplicate entries for numerous documents.

      I noticed this issue when I sent a specific number of log lines to Elasticsearch using Flume and then counted them in Kibana to verify that all of them had arrived. Most of the time, especially when multiple Flume instances were used, I got duplicate entries, e.g. instead of receiving 10000 documents from an instance, I received 10060.

      The level of duplication seems to be proportional to the number of instances sending log data simultaneously, e.g. with 3 Flume instances I get 10060, and with 50 Flume instances I get 10300.

      Is duplication something I should expect when using the Flume elasticsearch-sink?
      There is a doRollback() method that is called on transaction failure, but I think it only updates the local Flume channel, not Elasticsearch.

      Any info/suggestions would be appreciated.

      Regards,
      Nick
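
      For reference, a minimal elasticsearch-sink configuration of the kind described above might look like the sketch below; the agent, channel, host, and index names are illustrative placeholders rather than the reporter's actual settings.

      agent.channels = mem-channel
      agent.sinks = es-sink
      agent.sinks.es-sink.type = org.apache.flume.sink.elasticsearch.ElasticSearchSink
      agent.sinks.es-sink.channel = mem-channel
      # Comma-separated hostname:port pairs for the Elasticsearch transport client (default port 9300)
      agent.sinks.es-sink.hostNames = es-node-1:9300,es-node-2:9300
      agent.sinks.es-sink.indexName = logs
      agent.sinks.es-sink.indexType = log
      agent.sinks.es-sink.clusterName = elasticsearch
      agent.sinks.es-sink.batchSize = 100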

        Activity

        Nikolaos Tsipas added a comment -

        I did some more investigation on this issue, and it looks like the duplicated documents are produced when Flume has to roll back a transaction for some reason.
        Below you will find an actual example of document duplication.

        [screenshot: duplicate message in Kibana]

        flume.log on the instance from which the above log line came:

        29 Oct 2013 12:05:40,112 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.elasticsearch.ElasticSearchSink.process:217)  - Failed to commit transaction. Transaction rolled back.
        org.elasticsearch.client.transport.NoNodeAvailableException: No node available
                at org.elasticsearch.client.transport.TransportClientNodesService$RetryListener.onFailure(TransportClientNodesService.java:249)
                at org.elasticsearch.action.TransportActionNodeProxy$1.handleException(TransportActionNodeProxy.java:84)
                at org.elasticsearch.transport.TransportService$Adapter$2$1.run(TransportService.java:311)
                at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
                at java.lang.Thread.run(Thread.java:744)
        29 Oct 2013 12:05:40,113 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.SinkRunner$PollingRunner.run:160)  - Unable to deliver event. Exception follows.
        org.elasticsearch.client.transport.NoNodeAvailableException: No node available
                at org.elasticsearch.client.transport.TransportClientNodesService$RetryListener.onFailure(TransportClientNodesService.java:249)
                at org.elasticsearch.action.TransportActionNodeProxy$1.handleException(TransportActionNodeProxy.java:84)
                at org.elasticsearch.transport.TransportService$Adapter$2$1.run(TransportService.java:311)
                
        

        It looks like the log message was indexed by Elasticsearch, but Flume wasn't aware of this because of a connection error. So it rolled back the transaction, and the same log line was sent twice.
        Does this make sense? I think it does, but I'd like to read your thoughts on this issue.

        Regards,
        Nick

        Ashish Paliwal added a comment -

        This is the expected behavior. In case of failure, the transaction is rolled back and the batch is sent again, which means some entries might already have been indexed by the time the failure happened. So your analysis is correct.
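
        For context, a simplified sketch of the generic Flume sink transaction pattern (not the actual ElasticSearchSink code; the class and helper names here are hypothetical): events are removed from the channel only when the transaction commits, so a rollback after a partial write leaves them queued and the whole batch is re-delivered.

        // Sketch of a generic Flume sink process() loop. Illustrates why a rollback
        // after a partial write to Elasticsearch leads to duplicate documents.
        import org.apache.flume.Channel;
        import org.apache.flume.Event;
        import org.apache.flume.EventDeliveryException;
        import org.apache.flume.Transaction;
        import org.apache.flume.sink.AbstractSink;

        public class ExampleEsSink extends AbstractSink {
            private static final int BATCH_SIZE = 100;

            @Override
            public Status process() throws EventDeliveryException {
                Channel channel = getChannel();
                Transaction txn = channel.getTransaction();
                txn.begin();
                try {
                    for (int i = 0; i < BATCH_SIZE; i++) {
                        Event event = channel.take();
                        if (event == null) {
                            break;
                        }
                        // May succeed for some events before a later failure
                        // (e.g. NoNodeAvailableException) aborts the batch.
                        indexIntoElasticsearch(event);
                    }
                    // Only a successful commit removes the events from the channel.
                    txn.commit();
                    return Status.READY;
                } catch (Throwable t) {
                    // Rollback affects only the Flume channel: the events stay queued
                    // and are delivered again, duplicating anything Elasticsearch
                    // already indexed before the failure.
                    txn.rollback();
                    return Status.BACKOFF;
                } finally {
                    txn.close();
                }
            }

            private void indexIntoElasticsearch(Event event) {
                // Placeholder for the bulk-index call to Elasticsearch.
            }
        }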

        Nikolaos Tsipas added a comment -

        Thanks for your message. I guess you can resolve this ticket.

        Ashish Paliwal added a comment -

        Not a problem, it's expected behaviour.

        Edward Sargisson added a comment -

        Nikolaos Tsipas BTW, the way we deal with that behaviour is to set an ID as early in the pipeline as we can manage and use that ID when writing to ES. ES will overwrite records with the same ID.
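
        A minimal sketch of that approach, using the older Elasticsearch Java TransportClient API (the 0.90/1.x era that the Flume 1.4 sink targets); the host, index, type, and ID derivation are illustrative assumptions, not part of this issue. Indexing the same _id twice overwrites the document instead of creating a duplicate.

        // Sketch only: shows that Elasticsearch overwrites a document when the same
        // _id is indexed twice, which makes re-sent Flume batches effectively idempotent.
        import org.elasticsearch.client.Client;
        import org.elasticsearch.client.transport.TransportClient;
        import org.elasticsearch.common.transport.InetSocketTransportAddress;

        public class SameIdOverwriteExample {
            public static void main(String[] args) {
                Client client = new TransportClient()
                        .addTransportAddress(new InetSocketTransportAddress("es-node-1", 9300));

                // Deterministic ID assigned as early in the pipeline as possible,
                // e.g. derived from host + file + offset of the log line (assumed scheme).
                String docId = "web-01:/var/log/app.log:12345";

                // First delivery of the event.
                client.prepareIndex("logs", "log", docId)
                        .setSource("{\"message\":\"original log line\"}")
                        .execute().actionGet();

                // Re-delivery after a rolled-back Flume transaction: same _id, so the
                // document is overwritten (version bumped) rather than duplicated.
                client.prepareIndex("logs", "log", docId)
                        .setSource("{\"message\":\"original log line\"}")
                        .execute().actionGet();

                client.close();
            }
        }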


          People

          • Assignee:
            Ashish Paliwal
          • Reporter:
            Nikolaos Tsipas
          • Votes:
            0
          • Watchers:
            4
