Uploaded image for project: 'Apache Storm'
  1. Apache Storm
  2. STORM-2913

STORM-2844 made autocommit and at-most-once storm-kafka-client spouts log warnings on every emit, because those modes don't commit the right metadata to Kafka

    XMLWordPrintableJSON

Details

    Description

      The mechanism added in https://issues.apache.org/jira/browse/STORM-2844 to allow us to check whether a committed offset was committed by the currently running topology requires that we commit some metadata along with the offset.

      We are using this metadata for two things: Only applying the FirstPollOffsetStrategy when the topology is deployed, rather than when the worker is restarted, and an (IMO fairly unimportant) runtime check that the spout offset tracking is not in a bad state.

      Autocommit spouts don't include this metadata, and we also don't include it when committing offsets in at-most-once mode. We can fix at-most-once by switching to committing a custom OffsetAndMetadata, rather than using the no-arg commitSync variant.

      I'm not sure what we should do to fix the autocommit case. There doesn't seem to be a way to include metadata in autocommits, so I don't think we can support this mechanism for autocommits.

      If we can't fix the autocommit case, I see two options for fixing this:

      • Make doSeek have the old behavior for autocommits only (i.e. apply the FirstPollOffsetStrategy on every worker restart), and keep the new behavior for at-least-once/at-most-once. I think this behavior could be a little confusing.
      • Revert doSeek to the old behavior in all cases, and throw out the runtime check that uses the metadata. This also isn't a great option, because the new seek behavior is more useful than restarting on every worker reboot.

      What do you think hmclouro? I'm leaning toward the first option.

      Attachments

        Issue Links

          Activity

            People

              srdo Stig Rohde Døssing
              srdo Stig Rohde Døssing
              Votes:
              1 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 5.5h
                  5.5h