Apache Storm / STORM-2296

Kafka spout - no duplicates on topic leader changes


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.2
    • Fix Version/s: 2.0.0, 1.1.0, 1.0.4
    • Component/s: storm-kafka
    • Labels: None

    Description

      Currently the Kafka spout emits duplicate tuples whenever the leader of a Kafka topic partition changes.
      When a leader change causes an exception, the PartitionManagers are simply recreated. The state recording which tuples were already emitted is lost, so the new PartitionManager emits them again.

      This is acceptable because at-least-once delivery is still fulfilled, but it would be better not to emit duplicate data where possible.
      Moreover, the duplicates could easily be avoided by moving the state related to emitted tuples from the old PartitionManager to the new one.
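
The idea above can be illustrated with a minimal, self-contained sketch. This is not the code from the pull requests and does not use the storm-kafka API: the class `SketchPartitionManager`, its fields, and the methods `emitUpTo` and `ack` are hypothetical names introduced only for illustration, and offsets are modelled as plain longs. It assumes the emitted-tuple state consists of a "next offset to fetch" and a set of emitted but not yet acked offsets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical stand-in for a per-partition manager; NOT the storm-kafka class. */
class SketchPartitionManager {
    private final int partition;
    private long nextOffset;                                  // next offset to fetch and emit
    private final SortedSet<Long> pending = new TreeSet<>();  // emitted but not yet acked

    /** Fresh manager: all it knows is the last committed offset. */
    SketchPartitionManager(int partition, long committedOffset) {
        this.partition = partition;
        this.nextOffset = committedOffset;
    }

    /** Recreated manager that takes over the emitted-tuple state of the old one. */
    SketchPartitionManager(int partition, SketchPartitionManager previous) {
        this.partition = partition;
        this.nextOffset = previous.nextOffset;
        this.pending.addAll(previous.pending);
    }

    /** Emits every offset in [nextOffset, logEndOffset) and returns what was emitted. */
    List<Long> emitUpTo(long logEndOffset) {
        List<Long> emitted = new ArrayList<>();
        while (nextOffset < logEndOffset) {
            pending.add(nextOffset);
            emitted.add(nextOffset);
            nextOffset++;
        }
        return emitted;
    }

    /** Marks an emitted offset as fully processed. */
    void ack(long offset) {
        pending.remove(offset);
    }

    int partition() {
        return partition;
    }

    int pendingCount() {
        return pending.size();
    }
}

class LeaderChangeSketch {
    public static void main(String[] args) {
        // The old manager emits offsets 0..4; nothing has been committed yet.
        SketchPartitionManager old = new SketchPartitionManager(0, 0L);
        old.emitUpTo(5L);

        // Leader change, behavior described in this issue: state is lost,
        // so the recreated manager starts again from the committed offset.
        SketchPartitionManager recreated = new SketchPartitionManager(0, 0L);
        System.out.println("re-emitted without state transfer: " + recreated.emitUpTo(5L));

        // Leader change, proposed behavior: state moves to the new manager,
        // so only offsets that were never emitted are emitted.
        SketchPartitionManager transferred = new SketchPartitionManager(0, old);
        System.out.println("emitted with state transfer: " + transferred.emitUpTo(7L));
    }
}
```

In this sketch the at-least-once guarantee is unaffected, because offsets that were emitted but not acked stay in the pending set of the new manager instead of being forgotten.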

      Pull requests implementing this:
      1.0.x-branch - https://github.com/apache/storm/pull/1873
      1.x-branch - https://github.com/apache/storm/pull/1888

      Pull request for related bugfix: https://github.com/apache/storm/pull/1940


          People

            Assignee: Ernestas Vaiciukevičius (ernisv)
            Reporter: Ernestas Vaiciukevičius (ernisv)
            Votes: 0
            Watchers: 2

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 3h
