Apache Storm / STORM-2296

Kafka spout - no duplicates on topic leader changes


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.2
    • Fix Version/s: 2.0.0, 1.1.0, 1.0.4
    • Component/s: storm-kafka
    • Labels:
      None

      Description

      The current Kafka spout emits duplicate tuples whenever the leader of a Kafka topic partition changes.
      When a leader change causes an exception, the spout simply recreates its PartitionManagers. This loses the state recording which tuples were already emitted, so the new PartitionManager emits them again.

      This does not violate the at-least-once guarantee, but it would still be better to avoid emitting duplicate data where possible.
      Moreover, the duplicates could easily be avoided by moving the state about emitted tuples from the old PartitionManager to the new one.
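The state-transfer idea above can be illustrated with a small Java model. This is a minimal sketch under stated assumptions: `PartitionManagerSketch`, `emitUpTo`, `ack` and `committedOffset` are hypothetical names, offsets stand in for tuples, and the committed offset a freshly created manager restarts from is assumed to lag behind what was already emitted. It is not the storm-kafka `PartitionManager` API or the code from the linked pull requests.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical, simplified per-partition manager (NOT the real storm-kafka class).
class PartitionManagerSketch {
    private final int partition;
    // Next offset to emit.
    private long emittedToOffset;
    // Offsets that were emitted but not yet acked.
    private final SortedSet<Long> pending = new TreeSet<>();

    // Fresh manager: starts emitting from the last committed offset.
    PartitionManagerSketch(int partition, long committedOffset) {
        this.partition = partition;
        this.emittedToOffset = committedOffset;
    }

    // Hypothetical "recreate with state transfer": copy the emit position and
    // pending offsets from the manager being replaced after a leader change.
    PartitionManagerSketch(PartitionManagerSketch previous) {
        this.partition = previous.partition;
        this.emittedToOffset = previous.emittedToOffset;
        this.pending.addAll(previous.pending);
    }

    // Emits every offset from the current position up to (excluding) latestOffset.
    List<Long> emitUpTo(long latestOffset) {
        List<Long> batch = new ArrayList<>();
        while (emittedToOffset < latestOffset) {
            pending.add(emittedToOffset);
            batch.add(emittedToOffset);
            emittedToOffset++;
        }
        return batch;
    }

    void ack(long offset) {
        pending.remove(offset);
    }

    // Offset that could safely be committed: the oldest unacked offset, if any.
    long committedOffset() {
        return pending.isEmpty() ? emittedToOffset : pending.first();
    }
}

public class LeaderChangeDemo {
    public static void main(String[] args) {
        long lastCommitted = 0; // assumption: nothing has been committed yet
        PartitionManagerSketch old = new PartitionManagerSketch(0, lastCommitted);
        System.out.println("first emit: " + old.emitUpTo(5)); // [0, 1, 2, 3, 4]
        old.ack(0);
        old.ack(1);

        // Leader change, naive recovery: a fresh manager re-emits from the committed offset.
        PartitionManagerSketch fresh = new PartitionManagerSketch(0, lastCommitted);
        System.out.println("fresh re-emits: " + fresh.emitUpTo(5)); // [0, 1, 2, 3, 4]

        // Leader change with state transfer: continue where the old manager stopped.
        PartitionManagerSketch moved = new PartitionManagerSketch(old);
        System.out.println("moved emits: " + moved.emitUpTo(7)); // [5, 6]
        System.out.println("oldest pending: " + moved.committedOffset()); // 2
    }
}
```

In this model, carrying over only the emit position and the set of pending offsets is enough to stop re-emission; the retry state for failed tuples, which a real spout also tracks, is left out for brevity.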

      Pull requests implementing this:
      1.0.x-branch - https://github.com/apache/storm/pull/1873
      1.x-branch - https://github.com/apache/storm/pull/1888

      Pull request for a related bug fix: https://github.com/apache/storm/pull/1940

              People

              • Assignee:
                ernisv Ernestas Vaiciukevičius
                Reporter:
                ernisv Ernestas Vaiciukevičius
              • Votes:
                0
                Watchers:
                2

                  Time Tracking

                  Estimated:
                  Not Specified
                  Remaining:
                  0h
                  Logged:
                  3h