  1. Kafka
  2. KAFKA-10357

Handle accidental deletion of repartition-topics as exceptional failure

Details

    Description

      Repartition topics are written by Streams' producer and read by Streams' consumer, so when they are accidentally deleted both clients may be notified. In practice, however, the consumer reacts much more quickly than the producer, because the producer only gives up after its delivery timeout expires (see https://issues.apache.org/jira/browse/KAFKA-10356). When the consumer reacts, it rejoins the group because the metadata changed, and during the triggered rebalance the topic is silently re-created and processing continues, causing silent data loss.

      One idea is to create all repartition topics only once, in the first rebalance, and not auto-create them in any later rebalance. A missing repartition topic would instead be treated similarly to the INCOMPLETE_SOURCE_TOPIC_METADATA error code (https://issues.apache.org/jira/browse/KAFKA-10355).
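
      A minimal sketch of that policy, under stated assumptions: `RepartitionTopicGuard`, `MissingRepartitionTopicException`, and the in-memory `existingTopics` set are hypothetical stand-ins for illustration and are not Kafka Streams classes. The sketch also sidesteps the hard part, since its `initialCreationDone` flag only lives in one instance's memory.

      ```java
      import java.util.Set;

      // Hypothetical exception; the issue only says the case should be handled
      // similarly to INCOMPLETE_SOURCE_TOPIC_METADATA.
      final class MissingRepartitionTopicException extends RuntimeException {
          MissingRepartitionTopicException(String topic) {
              super("Repartition topic is missing after initial creation: " + topic);
          }
      }

      // Hypothetical guard: creates topics on the first rebalance only.
      final class RepartitionTopicGuard {
          private final Set<String> existingTopics; // stand-in for broker metadata
          private boolean initialCreationDone = false;

          RepartitionTopicGuard(Set<String> existingTopics) {
              this.existingTopics = existingTopics;
          }

          void onRebalance(Set<String> requiredTopics) {
              if (!initialCreationDone) {
                  // First rebalance: creating the topics is expected.
                  existingTopics.addAll(requiredTopics);
                  initialCreationDone = true;
                  return;
              }
              // Later rebalances: a missing topic is an error, never re-created.
              for (String topic : requiredTopics) {
                  if (!existingTopics.contains(topic)) {
                      throw new MissingRepartitionTopicException(topic);
                  }
              }
          }
      }
      ```

      With this shape, deleting a topic between two rebalances surfaces as an exception on the second one instead of a silent re-creation.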

      The challenging part is how to determine whether a rebalance is the first-ever one. There are a few wild ideas I'd like to throw out here:

      1) Change the thread state transition diagram so that the STARTING state would not transition to PARTITION_REVOKED but only to PARTITION_ASSIGNED; then in the assign function we can check whether the state is still CREATED and not RUNNING.

      2) Augment the subscriptionInfo to encode whether or not this is the first-ever rebalance.
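
      Both ideas could be sketched roughly as follows. Every type here is hypothetical: `SketchThreadState` borrows the state names used in this description but is not the real thread state enum, and the byte layout in `SketchSubscriptionInfo` is invented for illustration. The sketch also assumes the thread has already moved from CREATED to STARTING by the time the assignment callback fires, so it checks for STARTING.

      ```java
      import java.nio.ByteBuffer;
      import java.util.EnumSet;
      import java.util.Set;

      // Idea 1: STARTING may only move to PARTITION_ASSIGNED (hypothetical enum).
      enum SketchThreadState {
          CREATED, STARTING, PARTITION_REVOKED, PARTITION_ASSIGNED, RUNNING;

          Set<SketchThreadState> validNextStates() {
              switch (this) {
                  case CREATED:            return EnumSet.of(STARTING);
                  case STARTING:           return EnumSet.of(PARTITION_ASSIGNED);
                  case PARTITION_REVOKED:  return EnumSet.of(PARTITION_ASSIGNED);
                  case PARTITION_ASSIGNED: return EnumSet.of(RUNNING, PARTITION_REVOKED);
                  default:                 return EnumSet.of(PARTITION_REVOKED); // RUNNING
              }
          }
      }

      final class SketchStreamThread {
          private SketchThreadState state = SketchThreadState.CREATED;

          void transitionTo(SketchThreadState next) {
              if (!state.validNextStates().contains(next)) {
                  throw new IllegalStateException(state + " -> " + next + " is not allowed");
              }
              state = next;
          }

          // Returns true only for the first assignment, i.e. when the thread
          // has never left STARTING before this callback.
          boolean onPartitionsAssigned() {
              boolean first = state == SketchThreadState.STARTING;
              transitionTo(SketchThreadState.PARTITION_ASSIGNED);
              return first;
          }
      }

      // Idea 2: carry a first-rebalance flag in the subscription payload.
      // Assumed layout: 4-byte version followed by a 1-byte flag.
      final class SketchSubscriptionInfo {
          static ByteBuffer encode(int version, boolean firstRebalance) {
              ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + 1);
              buf.putInt(version);
              buf.put((byte) (firstRebalance ? 1 : 0));
              buf.flip();
              return buf;
          }

          static boolean decodeFirstRebalance(ByteBuffer buf) {
              ByteBuffer copy = buf.duplicate(); // leave the caller's position untouched
              copy.getInt();                     // skip the version
              return copy.get() == 1;
          }
      }
      ```

      The first variant keeps the decision local to each thread, while the second lets the member that computes the assignment see the flag; which one fits better is exactly the open question here.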

          People

            cadonna Bruno Cadonna
            guozhang Guozhang Wang
