  1. Kafka
  2. KAFKA-9285

Implement failed message topic to account for processing lag during failure



    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: consumer


      Under Kafka's current failure semantics, when a consumer crashes, the user is typically responsible for both detecting the failure and restarting the consumer. While the consumer is down, none of its records are consumed, so lag builds up. There have been previous attempts to resolve this problem: when the broker detects a failure, a substitute consumer is started (the so-called [Rebalance Consumer|https://cwiki.apache.org/confluence/display/KAFKA/KIP-333%3A+Add+faster+mode+of+rebalancing]) which continues processing records in the failed consumer's stead.

      However, this approach has complications: the records are stored only locally, so if the substitute consumer fails as well, that data is lost. Instead, we need a way to keep processing these records while also persisting them effectively. To that end, I propose the concept of a failed message topic. At a high level, it works like this: when a consumer is found to have failed, messages that were originally meant for that consumer are redirected to the failed message topic. The user can choose to assign consumers to this topic, and they will consume the messages that would otherwise have been processed by the failed consumer.

      Naturally, records from different topics cannot go into the same failed message topic, since we would then be unable to tell which records belong to which consumer.
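The redirect described above can be sketched as a toy in-memory model. This is only an illustration of the idea, not Kafka's API and not the implementation behind this ticket: the class `FailedMessageRouter`, its methods, and the `.failed` suffix for naming one failed message topic per source topic are all hypothetical names chosen for the example.

```python
from collections import defaultdict, deque

# Hypothetical naming convention: one failed message topic per source topic.
FAILED_SUFFIX = ".failed"


class FailedMessageRouter:
    """Toy in-memory model of the proposal; no real Kafka client is involved."""

    def __init__(self):
        self.topics = defaultdict(deque)  # topic name -> queued records
        self.assignment = {}              # (topic, partition) -> consumer id
        self.failed = set()               # consumer ids detected as failed

    def assign(self, topic, partition, consumer_id):
        """Record which consumer owns a given topic partition."""
        self.assignment[(topic, partition)] = consumer_id

    def mark_failed(self, consumer_id):
        """Flag a consumer as failed (failure detection itself is out of scope)."""
        self.failed.add(consumer_id)

    def mark_recovered(self, consumer_id):
        """Clear the failed flag once the consumer is back."""
        self.failed.discard(consumer_id)

    def route(self, topic, partition, record):
        """Store the record and return the name of the topic it went to.

        Records owned by a failed consumer go to that topic's own failed
        message topic, so records from different topics are never mixed.
        """
        owner = self.assignment.get((topic, partition))
        target = topic + FAILED_SUFFIX if owner in self.failed else topic
        self.topics[target].append(record)
        return target

    def drain_failed(self, topic):
        """Return and remove the records a standby consumer would pick up."""
        queue = self.topics[topic + FAILED_SUFFIX]
        drained = list(queue)
        queue.clear()
        return drained
```

With `consumer-a` assigned to `("orders", 0)`, calling `mark_failed("consumer-a")` and then `route("orders", 0, "o2")` returns `"orders.failed"`, while records for other topics keep going to their original topic. Keeping a separate failed message topic per source topic is what lets a standby consumer know which records it is picking up.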





            Assignee: Richard Yu (Yohan123)
            Reporter: Richard Yu (Yohan123)