Flume / FLUME-2222

Duplicate entries in Elasticsearch when using Flume elasticsearch-sink

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: v1.4.0
    • Fix Version/s: None
    • Component/s: Sinks+Sources
    • Environment: CentOS 6

    Description

      Hello,

      I'm using the Flume elasticsearch-sink to transfer logs from EC2 instances to Elasticsearch, and I get duplicate entries for numerous documents.

      I noticed this issue while sending a specific number of log lines to Elasticsearch through Flume and then counting them in Kibana to verify that all of them had arrived. Most of the time, especially when multiple Flume instances were used, I got duplicate entries, e.g. instead of receiving 10000 documents from an instance, I received 10060.

      The duplication level seems to be proportional to the number of instances sending log data simultaneously, e.g. with 3 Flume instances I get 10060, and with 50 Flume instances I get 10300.

      Is duplication something I should expect when using the Flume elasticsearch-sink?
      There is a doRollback() method that is called on transaction failure, but I think it only rolls back the local Flume channel and does not undo documents already written to Elasticsearch.
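
      For illustration, here is a minimal sketch of the standard Flume sink transaction pattern (not the actual ElasticSearchSink source; the batch size and the indexIntoElasticsearch helper are hypothetical placeholders), showing why a rollback returns events to the channel but cannot undo documents that were already indexed, so the retried batch produces duplicates:

          import org.apache.flume.Channel;
          import org.apache.flume.Event;
          import org.apache.flume.EventDeliveryException;
          import org.apache.flume.Transaction;
          import org.apache.flume.sink.AbstractSink;

          public class SketchElasticSearchSink extends AbstractSink {
            private static final int BATCH_SIZE = 100; // hypothetical batch size

            @Override
            public Status process() throws EventDeliveryException {
              Channel channel = getChannel();
              Transaction txn = channel.getTransaction();
              txn.begin();
              try {
                for (int i = 0; i < BATCH_SIZE; i++) {
                  Event event = channel.take();
                  if (event == null) {
                    break; // channel drained
                  }
                  // Hypothetical helper: earlier events in the batch may
                  // already be indexed when a later one fails.
                  indexIntoElasticsearch(event);
                }
                txn.commit(); // whole batch indexed: events leave the channel
                return Status.READY;
              } catch (Throwable t) {
                // doRollback()-style handling: events are put back on the
                // Flume channel for redelivery, but documents already indexed
                // in Elasticsearch are NOT removed, so the retry re-sends them.
                txn.rollback();
                throw new EventDeliveryException("Failed to index batch", t);
              } finally {
                txn.close();
              }
            }

            private void indexIntoElasticsearch(Event event) {
              // placeholder for the actual bulk-index call
            }
          }

      This is Flume's at-least-once delivery contract: a failure anywhere in the batch forces the whole batch to be retried, and the sink has no way to know which documents the earlier attempt already delivered.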

      Any info/suggestions would be appreciated.

      Regards,
      Nick


          People

          • Assignee: Ashish Paliwal
          • Reporter: Nikolaos Tsipas
          • Votes: 0
          • Watchers: 4
