The mechanism added in https://issues.apache.org/jira/browse/STORM-2844 to allow us to check whether a committed offset was committed by the currently running topology requires that we commit some metadata along with the offset.
We are using this metadata for two things: Only applying the FirstPollOffsetStrategy when the topology is deployed, rather than when the worker is restarted, and an (IMO fairly unimportant) runtime check that the spout offset tracking is not in a bad state.
Autocommit spouts don't include this metadata, and we also don't include it when committing offsets in at-most-once mode. We can fix at-most-once by switching to committing a custom OffsetAndMetadata, rather than using the no-arg commitSync variant.
I'm not sure what we should do to fix the autocommit case. There doesn't seem to be a way to include metadata in autocommits, so I don't think we can support this mechanism for autocommits.
If we can't fix the autocommit case, I see two options for fixing this:
- Make doSeek have the old behavior for autocommits only (i.e. apply the FirstPollOffsetStrategy on every worker restart), and keep the new behavior for at-least-once/at-most-once. I think this behavior could be a little confusing.
- Revert doSeek to the old behavior in all cases, and throw out the runtime check that uses the metadata. This also isn't a great option, because the new seek behavior is more useful than restarting on every worker reboot.
What do you think Hugo Da Cruz Louro? I'm leaning toward the first option.