Kafka / KAFKA-6846

Controller can spend a long time shutting down RequestSendThread when processing a BrokerChange event


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: controller
    • Labels: None

    Description

      The controller can spend a long time (more than 60s) processing a BrokerChange event when there are dead brokers. For example, we saw entries like these in the controller log:

       

      2018/04/28 18:13:50.021 [KafkaController] [Controller 7586]: Newly added brokers: , deleted brokers: 5222, bounced Brokers: , all live brokers: 3238,3322,5134,5177,5213,5214,5217,5218,5219,5220,5221,5319,5652,5949,7569,7574,7577,7581,7586,7589,7594,7595,7601,7609,14838,14840,14848,14855,14882,14886,14889,14901,16033
      2018/04/28 18:13:50.021 [RequestSendThread] [Controller-7586-to-broker-5222-send-thread]: Shutting down
      .
      .
      .
      2018/04/28 18:14:49.196 [RequestSendThread] [Controller-7586-to-broker-5222-send-thread]: Shutdown completed
      2018/04/28 18:14:49.196 [RequestSendThread] [Controller-7586-to-broker-5222-send-thread]: Stopped
      2018/04/28 18:14:49.200 [KafkaController] [Controller 7586]: Broker failure callback for 5222

       

      This shows that about 59s elapsed between the moment the RequestSendThread shutdown was initiated (18:13:50) and the moment it completed (18:14:49).
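
The gap can be checked directly from the two log timestamps. The following is a small sketch only: the class name LogGap and the method elapsedMillis are made up for illustration, and the timestamp pattern is inferred from the log lines quoted above.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper (not part of Kafka): measures the time between two
// controller log timestamps in the format seen in the excerpt above.
final class LogGap {
    // Pattern inferred from entries such as "2018/04/28 18:13:50.021".
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy/MM/dd HH:mm:ss.SSS");

    static long elapsedMillis(String start, String end) {
        LocalDateTime from = LocalDateTime.parse(start, FORMAT);
        LocalDateTime to = LocalDateTime.parse(end, FORMAT);
        return Duration.between(from, to).toMillis();
    }

    public static void main(String[] args) {
        // "Shutting down" vs. "Shutdown completed" for the broker-5222 send thread.
        long gap = elapsedMillis("2018/04/28 18:13:50.021", "2018/04/28 18:14:49.196");
        System.out.println(gap + " ms"); // prints "59175 ms"
    }
}
```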

      The root cause is that RequestSendThread calls NetworkClient.poll() in a while loop inside NetworkClientUtils.awaitReady() and NetworkClientUtils.sendAndReceive() without checking the interrupt flag. As a result, the interrupt triggered by the controller thread breaks out of poll() only once. The RequestSendThread then blocks in the next poll() until it receives a disconnect or times out, and only after that can it finish shutting down. During this period the controller event thread is blocked waiting on the shutdownComplete latch, which is bad because there is only a single controller event thread.

      This issue can be resolved by making the thread throw InterruptedException right after each poll() call in awaitReady() and sendAndReceive() if the interrupt flag has been set. I will create a PR for that.
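
To make the idea concrete, here is a minimal, self-contained sketch of a poll loop that checks the interrupt flag after every poll. It is not the Kafka code and not the eventual patch: the Poller interface, the class InterruptAwarePollLoop, and the helper measureShutdownMs are hypothetical stand-ins, and the simulated poll() simply sleeps and restores the interrupt flag when woken, which is an assumption about how a poll that does not propagate interrupts might behave.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Illustrative sketch only; all names here are hypothetical, not Kafka APIs.
final class InterruptAwarePollLoop {

    /** Stand-in for a network client whose poll() blocks for up to timeoutMs. */
    interface Poller {
        void poll(long timeoutMs);
        boolean isReady();
    }

    /** Polls until ready or timed out, throwing as soon as an interrupt is seen. */
    static boolean awaitReady(Poller poller, long timeoutMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!poller.isReady()) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return false;
            }
            poller.poll(remaining);
            // The proposed check: without it, the loop would poll again and
            // keep going until the deadline even though shutdown was requested.
            if (Thread.interrupted()) { // reads and clears the interrupt flag
                throw new InterruptedException("interrupted while polling");
            }
        }
        return true;
    }

    /** Interrupts a worker stuck in awaitReady and returns how long it took to stop. */
    static long measureShutdownMs(long pollTimeoutMs) throws InterruptedException {
        Poller neverReady = new Poller() {
            @Override
            public void poll(long timeoutMs) {
                try {
                    Thread.sleep(timeoutMs);
                } catch (InterruptedException e) {
                    // Simulated poll does not propagate the interrupt; it only
                    // restores the flag and returns to the caller.
                    Thread.currentThread().interrupt();
                }
            }

            @Override
            public boolean isReady() {
                return false; // models a dead broker that never becomes ready
            }
        };

        CountDownLatch shutdownComplete = new CountDownLatch(1);
        Thread sender = new Thread(() -> {
            try {
                awaitReady(neverReady, pollTimeoutMs);
            } catch (InterruptedException e) {
                // Shutdown requested: fall through and signal completion.
            } finally {
                shutdownComplete.countDown();
            }
        }, "sketch-send-thread");
        sender.start();

        Thread.sleep(100); // give the worker time to enter poll()
        long start = System.nanoTime();
        sender.interrupt();        // analogous to initiating shutdown
        shutdownComplete.await();  // analogous to waiting on the latch
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("shutdown took ~" + measureShutdownMs(30_000) + " ms");
    }
}
```

In this sketch the waiting thread is released shortly after the interrupt instead of after the full 30s poll timeout. Using Thread.interrupted() clears the flag before throwing, which matches the usual convention for methods that throw InterruptedException. Whether the real fix should clear or preserve the flag is a design choice left to the actual patch.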

       

            People

              Assignee: Unassigned
              Reporter: Zhanxiang (Patrick) Huang (hzxa21)