Apache Storm / STORM-2296

Kafka spout - no duplicates on topic leader changes

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.2
    • Fix Version/s: 2.0.0, 1.1.0, 1.0.4
    • Component/s: storm-kafka
    • Labels: None

    Description

      Currently the Kafka spout emits duplicate tuples whenever a Kafka topic's leader changes.
      When a leader change causes an exception, the PartitionManagers are simply recreated, losing the state about which tuples were already emitted, so the new PartitionManager emits them again.

      This is acceptable because the at-least-once guarantee still holds, but it would be better not to emit duplicate data where possible.
      Moreover, this can easily be avoided by moving the state related to emitted tuples from the old PartitionManager to the new one.
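
      To illustrate the idea of handing state over, here is a minimal sketch. It is not the code from the pull requests: the Manager class, its fields, and the constructor taking a previous manager are hypothetical stand-ins for PartitionManager, and the offsets are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

class StateHandoffSketch {

    /** Hypothetical stand-in for a per-partition manager; not Storm's actual PartitionManager. */
    static class Manager {
        final int partition;
        // Offsets that were emitted but not yet acked.
        final SortedSet<Long> pending = new TreeSet<>();
        // Next offset to emit.
        long nextOffset;

        /**
         * @param committedOffset last offset known to be committed (assumed to come from external storage)
         * @param previous        manager being replaced for the same partition, or null if there is none
         */
        Manager(int partition, long committedOffset, Manager previous) {
            this.partition = partition;
            if (previous != null && previous.partition == partition) {
                // Carry the emitted-tuple state over instead of starting from the commit point.
                this.pending.addAll(previous.pending);
                this.nextOffset = previous.nextOffset;
            } else {
                // No previous state: restart from the committed offset, which may re-emit tuples.
                this.nextOffset = committedOffset;
            }
        }

        /** Emits the next count offsets and tracks them as pending. */
        List<Long> emit(int count) {
            List<Long> emitted = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                pending.add(nextOffset);
                emitted.add(nextOffset);
                nextOffset++;
            }
            return emitted;
        }

        /** Marks an emitted offset as fully processed. */
        void ack(long offset) {
            pending.remove(offset);
        }
    }

    public static void main(String[] args) {
        Manager old = new Manager(0, 100L, null);
        System.out.println(old.emit(3));       // [100, 101, 102]

        // Leader change handled by plain recreation: the same offsets are emitted again.
        Manager recreated = new Manager(0, 100L, null);
        System.out.println(recreated.emit(3)); // [100, 101, 102]

        // Leader change with state handed over: emission continues where the old manager stopped.
        Manager handedOver = new Manager(0, 100L, old);
        System.out.println(handedOver.emit(3)); // [103, 104, 105]
    }
}
```

      In this sketch the replacement manager also inherits the pending set, so acks or failures for tuples emitted before the leader change can still be matched against known offsets.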

      Pull requests implementing this:
      1.0.x-branch - https://github.com/apache/storm/pull/1873
      1.x-branch - https://github.com/apache/storm/pull/1888

      Pull request for related bugfix: https://github.com/apache/storm/pull/1940

            People

              Assignee: Ernestas Vaiciukevičius (ernisv)
              Reporter: Ernestas Vaiciukevičius (ernisv)
              Votes: 0
              Watchers: 2

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 3h