Apache Storm / STORM-2896

Support automatic migration of offsets from storm-kafka to storm-kafka-client KafkaSpout

Details

    Description

      I think we can ease migration for people looking to move from storm-kafka to storm-kafka-client. We should be able to support migrating offsets from the old spout by setting some extra configuration in KafkaSpoutConfig, and by adding a new FirstPollOffsetStrategy (e.g. something like FirstPollOffsetStrategy.UNCOMMITTED_MIGRATE_FROM_STORM_KAFKA).

      The old spout stores offsets in Storm's Zookeeper at one of two paths. The storm-kafka SpoutConfig has two parameters we'll also need, namely zkRoot and id. In addition we need to know if the storm-kafka subscription was a wildcard subscription or not.

      The zookeeper path for commit info is

      zkRoot + "/" + id + "/" + topicName + "/partition_" + partition
      

      if the subscription was a wildcard. Otherwise it is

      zkRoot + "/" + id + "/" + "partition_" + partition
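
      The two path shapes can be sketched as a small helper. This is only an illustration: the class and method names are hypothetical and exist in neither storm-kafka nor storm-kafka-client, and the sketch assumes a "/" separator between the topic name and the partition_ segment in the wildcard case.

```java
// Hypothetical helper for illustration; not part of storm-kafka or storm-kafka-client.
final class LegacyOffsetPaths {
    private LegacyOffsetPaths() {}

    /**
     * Builds the Zookeeper path at which the old spout is described as storing commit info.
     * Assumption: wildcard subscriptions place the topic name in its own path segment.
     */
    static String commitPath(String zkRoot, String id, String topicName,
                             int partition, boolean wildcard) {
        String prefix = zkRoot + "/" + id + "/";
        return wildcard
                ? prefix + topicName + "/partition_" + partition
                : prefix + "partition_" + partition;
    }
}
```

      For example, commitPath("/kafka-offsets", "my-spout", "events", 3, false) yields /kafka-offsets/my-spout/partition_3, where the zkRoot and id values are made up.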
      

      We can get topicName and partition numbers from the KafkaConsumer.assignment. When we run initialize, we should be able to read the old offset structure from Zookeeper when the strategy is UNCOMMITTED_MIGRATE_FROM_STORM_KAFKA, and seek the consumer to those offsets. We can just crash if the offsets are not present.
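
      A rough sketch of that initialize step, under stated assumptions: the Zookeeper read and the consumer seek are passed in as plain functions so the example has no Kafka or Curator dependency, TopicPartition here is a local stand-in for Kafka's class, and every name is hypothetical. It resolves all offsets before seeking anything, so a missing offset fails the whole migration rather than leaving the consumer half-migrated.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Hypothetical sketch; none of these names exist in storm-kafka-client.
final class LegacyOffsetMigration {
    private LegacyOffsetMigration() {}

    /** Dependency-free stand-in for Kafka's TopicPartition. */
    static final class TopicPartition {
        final String topic;
        final int partition;

        TopicPartition(String topic, int partition) {
            this.topic = topic;
            this.partition = partition;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof TopicPartition)) return false;
            TopicPartition other = (TopicPartition) o;
            return partition == other.partition && topic.equals(other.topic);
        }

        @Override
        public int hashCode() {
            return Objects.hash(topic, partition);
        }
    }

    /**
     * Looks up the old spout's offset for every assigned partition, then seeks to it.
     * readOffset stands in for "read and parse the Zookeeper node at this path";
     * seek stands in for a call such as KafkaConsumer.seek.
     */
    static Map<TopicPartition, Long> migrate(
            Collection<TopicPartition> assignment, String zkRoot, String id, boolean wildcard,
            Function<String, Optional<Long>> readOffset,
            BiConsumer<TopicPartition, Long> seek) {
        Map<TopicPartition, Long> resolved = new LinkedHashMap<>();
        for (TopicPartition tp : assignment) {
            String prefix = zkRoot + "/" + id + "/";
            // Assumption: wildcard subscriptions put the topic name in its own segment.
            String path = wildcard
                    ? prefix + tp.topic + "/partition_" + tp.partition
                    : prefix + "partition_" + tp.partition;
            // Crash if the old spout left no offset for this partition.
            long offset = readOffset.apply(path).orElseThrow(
                    () -> new IllegalStateException("No storm-kafka offset at " + path));
            resolved.put(tp, offset);
        }
        // Seek only after every partition resolved successfully.
        resolved.forEach(seek);
        return resolved;
    }
}
```

      Whether the value stored by the old spout is the next offset to read or the last one processed is not covered here and would need to be checked before seeking.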

      I'd be okay with this feature not being permanent, but I think this feature would make it a lot easier for people to move off the old spout.

          People

            srdo Stig Rohde Døssing

              Time Tracking

                Original Estimate: Not Specified
                Remaining Estimate: 0h
                Time Spent: 2h 10m