  1. Kafka
  2. KAFKA-6634

Delay initiating the txn on producers until initializeTopology with EOS turned on

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.11.0.3, 1.0.2, 1.1.0
    • Component/s: streams
    • Labels:
      None

      Description

      In the Streams EOS (exactly-once semantics) implementation, the producer created for each task begins a txn immediately upon creation, in the constructor of `StreamTask`. However, the task may not process any data, and hence the producer may not send any records for that started txn, for a long time because of the restoration process. With the default transaction timeout (producer config `transaction.timeout.ms`) of 60 seconds, this means that if restoration takes longer than that, then as soon as the task starts processing, the producer will immediately get an error that its producer epoch is already old.

      To fix this, we should consider initiating the txn only after the restoration phase is done. One caveat is that if the producer has already been fenced, it will not be notified until then, in `initializeTopology`. But I think this should not be a correctness issue, since during the restoration process we do not make any changes to the processing state.
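
The timing problem and the proposed fix can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions: `FakeTxnProducer`, `FakeTask`, and `firstSendSucceeds` are hypothetical names invented for illustration, the clock is simulated with plain `long` timestamps, and none of this is the actual Kafka Streams or Kafka client code. The only thing it models is where the txn is begun relative to restoration.

```java
public class DeferredTxnSketch {

    /** Hypothetical stand-in for a transactional producer; NOT the Kafka client API. */
    static final class FakeTxnProducer {
        private final long timeoutMs;
        private long txnStartMs = -1; // -1 means no open transaction

        FakeTxnProducer(long timeoutMs) {
            this.timeoutMs = timeoutMs;
        }

        void beginTransaction(long nowMs) {
            txnStartMs = nowMs;
        }

        /** Fails if the open txn has outlived the timeout (modelled here as a stale epoch). */
        void send(long nowMs) {
            if (txnStartMs < 0) {
                throw new IllegalStateException("no open transaction");
            }
            if (nowMs - txnStartMs > timeoutMs) {
                throw new IllegalStateException("producer epoch is stale");
            }
        }
    }

    /** Hypothetical task; only the placement of beginTransaction matters here. */
    static final class FakeTask {
        private final FakeTxnProducer producer;
        private final boolean eager;

        FakeTask(FakeTxnProducer producer, boolean eager, long nowMs) {
            this.producer = producer;
            this.eager = eager;
            if (eager) {
                producer.beginTransaction(nowMs); // eager: begin in the constructor
            }
        }

        /** Called once restoration has finished. */
        void initializeTopology(long nowMs) {
            if (!eager) {
                producer.beginTransaction(nowMs); // deferred: begin only after restoration
            }
        }

        void process(long nowMs) {
            producer.send(nowMs);
        }
    }

    /** Simulates: create task at t=0, restore for restoreMs, then send the first record. */
    static boolean firstSendSucceeds(boolean eager, long restoreMs, long timeoutMs) {
        FakeTask task = new FakeTask(new FakeTxnProducer(timeoutMs), eager, 0L);
        task.initializeTopology(restoreMs);
        try {
            task.process(restoreMs + 1);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        long timeoutMs = 60_000L; // 60-second transaction timeout
        long restoreMs = 90_000L; // restoration that outlasts the timeout
        System.out.println("eager: " + firstSendSucceeds(true, restoreMs, timeoutMs));
        System.out.println("deferred: " + firstSendSucceeds(false, restoreMs, timeoutMs));
    }
}
```

In this model, a 90-second restoration makes the first send fail when the txn is begun in the constructor, and succeed when it is begun in `initializeTopology`. The sketch does not model the caveat above: a producer that was fenced for other reasons would still only find out once it begins its txn.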

              People

              • Assignee: Guozhang Wang (guozhang)
              • Reporter: Guozhang Wang (guozhang)
