Samza / SAMZA-440

UnknownTopicOrPartitionCode results in infinite loop in BrokerProxy

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.0
    • Fix Version/s: 0.8.0
    • Component/s: kafka
    • Labels: None

    Description

      We have seen several occasions where shifting partitions in a Kafka cluster results in some Samza containers getting stuck with:

      2014-10-22 15:10:48 BrokerProxy [INFO] Creating new SimpleConsumer for host eat1-app582.corp:10251 for system kafka
      2014-10-22 15:10:48 BrokerProxy [WARN] Got non-recoverable error codes during multifetch. Throwing an exception to trigger reconnect. Errors: Error([all-service-call-events,10],3,kafka.common.UnknownTopicOrPartitionException)
      2014-10-22 15:10:48 BrokerProxy [WARN] Restarting consumer due to kafka.common.UnknownTopicOrPartitionException. Turn on debugging to get a full stack trace.
      2014-10-22 15:10:58 BrokerProxy [INFO] Creating new SimpleConsumer for host eat1-app582.corp:10251 for system kafka
      2014-10-22 15:10:58 BrokerProxy [WARN] Got non-recoverable error codes during multifetch. Throwing an exception to trigger reconnect. Errors: Error([all-service-call-events,10],3,kafka.common.UnknownTopicOrPartitionException)
      2014-10-22 15:10:58 BrokerProxy [WARN] Restarting consumer due to kafka.common.UnknownTopicOrPartitionException. Turn on debugging to get a full stack trace.
      2014-10-22 15:11:08 BrokerProxy [INFO] Creating new SimpleConsumer for host eat1-app582.corp:10251 for system kafka
      

      The problem appears to stem from a misunderstanding of how Kafka works. If a partition is moved to another broker and the BrokerProxy continues fetching from the old broker, it will throw an UnknownTopicOrPartitionException and try to reconnect to the same broker. It will do this indefinitely (every 10 seconds in the log above). Instead, the BrokerProxy should abdicate the TopicAndPartition and allow the new broker to pick it up.
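
      The proposed behavior can be illustrated with a minimal sketch. This is not the attached patch and not Samza's actual BrokerProxy code: the class AbdicatingFetcherSketch, its methods, and the "topic-partition" string keys are all hypothetical. The only detail taken from the report is error code 3 (UnknownTopicOrPartition), which appears in the log lines above. The idea is that code 3 causes the fetcher to give up the partition rather than reconnect to the same broker.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of a per-broker fetcher.
// It is NOT Samza's BrokerProxy; names and structure are assumptions.
final class AbdicatingFetcherSketch {
    static final short NO_ERROR = 0;
    // Error code 3, as reported in the log lines of this issue.
    static final short UNKNOWN_TOPIC_OR_PARTITION = 3;

    // Partitions this fetcher currently reads from this broker.
    private final Set<String> owned = new HashSet<>();
    // Partitions handed back so another component can find their new broker.
    private final List<String> abdicated = new ArrayList<>();

    void add(String topicAndPartition) {
        owned.add(topicAndPartition);
    }

    /**
     * Inspects the per-partition error codes of one fetch response.
     * Returns true if the caller should reconnect to the same broker.
     */
    boolean handleErrors(Map<String, Short> errorCodes) {
        boolean reconnect = false;
        for (Map.Entry<String, Short> entry : errorCodes.entrySet()) {
            short code = entry.getValue();
            if (code == NO_ERROR) {
                continue;
            }
            if (code == UNKNOWN_TOPIC_OR_PARTITION) {
                // The partition no longer lives on this broker, so
                // reconnecting here cannot help: abdicate it instead.
                if (owned.remove(entry.getKey())) {
                    abdicated.add(entry.getKey());
                }
            } else {
                // Any other error keeps the reconnect behavior.
                reconnect = true;
            }
        }
        return reconnect;
    }

    Set<String> owned() {
        return owned;
    }

    List<String> abdicated() {
        return abdicated;
    }
}
```

      In this sketch, a response carrying code 3 for all-service-call-events partition 10 moves that partition to the abdicated list and does not request a reconnect, which breaks the loop shown in the log. How the abdicated partition is reassigned to its new broker is left out.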

      Attachments

        1. SAMZA-440-0.patch
          2 kB
          Chris Riccomini

          People

            Assignee: Chris Riccomini (criccomini)
            Reporter: Chris Riccomini (criccomini)
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved: