Apache Storm / STORM-2896

Support automatic migration of offsets from storm-kafka to storm-kafka-client KafkaSpout

    Details

      Description

      I think we can ease migration for people moving from storm-kafka to storm-kafka-client. We should be able to migrate offsets from the old spout by setting some extra configuration in KafkaSpoutConfig and adding a new FirstPollOffsetStrategy (e.g. FirstPollOffsetStrategy.UNCOMMITTED_MIGRATE_FROM_STORM_KAFKA).

      The old spout stores offsets in Storm's ZooKeeper at one of two paths. The storm-kafka SpoutConfig has two parameters we'll also need, namely zkRoot and id. In addition, we need to know whether the storm-kafka subscription was a wildcard subscription.
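      A minimal sketch of the extra configuration, assuming it lives in a small holder next to KafkaSpoutConfig. The class name StormKafkaMigrationConfig and its accessors are hypothetical, and the nested enum is a stand-in copy of FirstPollOffsetStrategy with only the proposed constant added; it is not the library's actual type.

```java
import java.util.Objects;

/**
 * Hypothetical holder for the settings needed to locate offsets written by the
 * old storm-kafka spout. Names are illustrative, not part of storm-kafka-client.
 */
final class StormKafkaMigrationConfig {

    /** Stand-in for FirstPollOffsetStrategy; only the last constant is the proposed addition. */
    enum FirstPollOffsetStrategy {
        EARLIEST,
        LATEST,
        UNCOMMITTED_EARLIEST,
        UNCOMMITTED_LATEST,
        UNCOMMITTED_MIGRATE_FROM_STORM_KAFKA
    }

    private final String zkRoot;                 // SpoutConfig.zkRoot used by the old topology
    private final String id;                     // SpoutConfig.id used by the old topology
    private final boolean wildcardSubscription;  // true if the old spout subscribed with a wildcard

    StormKafkaMigrationConfig(String zkRoot, String id, boolean wildcardSubscription) {
        this.zkRoot = Objects.requireNonNull(zkRoot, "zkRoot");
        this.id = Objects.requireNonNull(id, "id");
        this.wildcardSubscription = wildcardSubscription;
    }

    String zkRoot() { return zkRoot; }

    String id() { return id; }

    boolean isWildcardSubscription() { return wildcardSubscription; }
}
```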

      The ZooKeeper path for the commit info is

      zkRoot + "/" + id + "/" + topicName + "/partition_" + partition

      if the subscription was a wildcard. Otherwise it is

      zkRoot + "/" + id + "/" + "partition_" + partition
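      The two layouts can be captured in one helper. This is a sketch with hypothetical names (StormKafkaZkPaths.committedPath); it assumes a "/" separator between the topic name and "partition_" in the wildcard case, matching storm-kafka's Partition.getId().

```java
/** Hypothetical helper that rebuilds the ZooKeeper node path used by storm-kafka for commits. */
final class StormKafkaZkPaths {

    private StormKafkaZkPaths() {}

    static String committedPath(String zkRoot, String id, String topic, int partition,
                                boolean wildcardSubscription) {
        // With a wildcard subscription several topics share one spout id, so the topic
        // name is part of the path to keep same-numbered partitions apart.
        String partitionId = wildcardSubscription
                ? topic + "/partition_" + partition
                : "partition_" + partition;
        return zkRoot + "/" + id + "/" + partitionId;
    }
}
```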

      We can get the topic names and partition numbers from KafkaConsumer.assignment(). When the spout runs initialize with the UNCOMMITTED_MIGRATE_FROM_STORM_KAFKA strategy, it should read the old offsets from ZooKeeper and seek the consumer to them. If the offsets are not present, the spout can simply fail fast.
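      A sketch of the lookup step, under stated assumptions: TopicPartition is a local stand-in for Kafka's class, zkRead is a hypothetical hook that returns a ZooKeeper node's data (a real implementation might use the ZooKeeper client the spout already has), and the stored data is assumed to be JSON with a numeric "offset" field. The method returns the offsets to seek to and fails fast if any assigned partition has none.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StormKafkaOffsetMigration {

    /** Stand-in for org.apache.kafka.common.TopicPartition so the sketch is self-contained. */
    record TopicPartition(String topic, int partition) {}

    // Assumption: storm-kafka stored JSON containing a numeric "offset" field.
    private static final Pattern OFFSET = Pattern.compile("\"offset\"\\s*:\\s*(\\d+)");

    private StormKafkaOffsetMigration() {}

    /**
     * Reads the storm-kafka commit node for every assigned partition and returns the
     * offset to seek to. zkRead (hypothetical) returns the node data, or empty if absent.
     * Throws IllegalStateException if any partition has no usable offset.
     */
    static Map<TopicPartition, Long> migratedOffsets(List<TopicPartition> assignment,
                                                     String zkRoot, String id, boolean wildcard,
                                                     Function<String, Optional<String>> zkRead) {
        Map<TopicPartition, Long> result = new LinkedHashMap<>();
        for (TopicPartition tp : assignment) {
            String partitionId = wildcard
                    ? tp.topic() + "/partition_" + tp.partition()
                    : "partition_" + tp.partition();
            String path = zkRoot + "/" + id + "/" + partitionId;
            String json = zkRead.apply(path).orElseThrow(() ->
                    new IllegalStateException("No storm-kafka offset found at " + path));
            Matcher m = OFFSET.matcher(json);
            if (!m.find()) {
                throw new IllegalStateException("No \"offset\" field in data at " + path);
            }
            result.put(tp, Long.parseLong(m.group(1)));
        }
        return result;
    }
}
```

      In the spout, each returned entry would then be passed to KafkaConsumer.seek(TopicPartition, long). Collecting the offsets first, instead of seeking inside the loop, means no partition is moved unless every assigned partition's offset was found.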

      I'd be okay with this feature not being permanent, but I think this feature would make it a lot easier for people to move off the old spout.

            People

            • Assignee: Stig Rohde Døssing (Srdo)
            • Reporter: Stig Rohde Døssing (Srdo)
            • Votes: 0
            • Watchers: 1

            Time Tracking

            • Estimated: Not Specified
            • Remaining: 0h
            • Logged: 2h 10m