Samza / SAMZA-607

BrokerProxy gets stuck on down brokers

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.8.0
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

    Description

      I took a broker offline for a few hours today and found that a Samza job was stuck trying to read from it while it was down, instead of switching to another broker in the ISR. This was a replicated topic; some partitions were under-replicated, but all partitions were available. During this time the BrokerProxy thread was in a retry loop, logging many ClosedChannelExceptions.

      The broker had done a clean shutdown, but I think what happened is that the BrokerProxy made no calls between the time that broker stopped being leader for its partitions and the time it went offline. So it never got a NotLeaderForPartitionException and never abdicated its topic-partitions.

      Would it make sense for the BrokerProxy to abdicate all of its topic-partitions after getting too many network errors, and possibly shut itself down if it becomes empty? I think it'd be good to support brokers going offline temporarily or even permanently.
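
      The proposal above could be sketched roughly as follows. This is only an illustration of the idea, not Samza's actual BrokerProxy: the class name `AbdicatingProxy`, the `maxConsecutiveErrors` threshold, and the `abdicate` callback are all hypothetical, and topic-partitions are represented as plain strings for brevity.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

/** Hypothetical sketch of the proposed behavior; not Samza's real BrokerProxy. */
class AbdicatingProxy {
    private final int maxConsecutiveErrors;          // assumed threshold, not a real Samza setting
    private final Consumer<List<String>> abdicate;   // assumed callback that re-resolves leaders
    private final Set<String> owned = new HashSet<>();
    private int consecutiveErrors = 0;

    AbdicatingProxy(int maxConsecutiveErrors, Consumer<List<String>> abdicate) {
        this.maxConsecutiveErrors = maxConsecutiveErrors;
        this.abdicate = abdicate;
    }

    void addTopicPartition(String topicPartition) {
        owned.add(topicPartition);
    }

    /** A successful fetch means the broker is reachable, so the error streak resets. */
    void onFetchSuccess() {
        consecutiveErrors = 0;
    }

    /**
     * Records a network error (for example a ClosedChannelException).
     * Returns true once the threshold is reached and every owned
     * topic-partition has been handed back, i.e. the proxy is now empty
     * and the caller may shut it down.
     */
    boolean onNetworkError() {
        consecutiveErrors++;
        if (consecutiveErrors < maxConsecutiveErrors) {
            return false;
        }
        List<String> released = new ArrayList<>(owned);
        owned.clear();
        consecutiveErrors = 0;
        abdicate.accept(released);
        return true;
    }

    boolean isEmpty() {
        return owned.isEmpty();
    }
}
```

      Counting consecutive errors rather than total errors is a design choice in this sketch: a single successful fetch resets the count, so a briefly flaky connection does not trigger abdication, while a broker that stays down does.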

            People

              Assignee: Unassigned
              Reporter: Gian Merlino (gian)
              Votes: 0
              Watchers: 3