  1. Kafka
  2. KAFKA-14890

Kafka initiates shutdown due to connectivity problem with Zookeeper and FatalExitError from ChangeNotificationProcessorThread


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 3.3.2
    • Fix Version/s: None
    • Component/s: core
    • Labels: None

    Description

      Hello,

      We have hit a deadlock in Kafka several times. A similar issue is reported at https://issues.apache.org/jira/browse/KAFKA-13544

      The question is whether it is expected behavior for Kafka to shut down because of connectivity problems with ZooKeeper. The shutdown seems to be related to the inability to read data from the /feature ZK node and to the ZooKeeperClientExpiredException thrown from the ZooKeeperClient class. This exception is caught only in the catch block of the doWork() method in ChangeNotificationProcessorThread, which leads to a FatalExitError.
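
      To make the escalation path concrete, here is a minimal Java sketch of the pattern as I understand it. It is not Kafka's code: EscalationSketch, SessionExpiredException and FatalExit are hypothetical stand-ins for the worker thread, ZooKeeperClientExpiredException and FatalExitError. It only shows how a catch-all in a worker's doWork() turns a transient session expiry into a fatal error.

```java
// Hypothetical stand-ins for illustration only; these are NOT Kafka's classes.
final class EscalationSketch {

    /** Stand-in for a session-expiry exception raised by a ZooKeeper client wrapper. */
    static final class SessionExpiredException extends RuntimeException {
        SessionExpiredException(String message) { super(message); }
    }

    /** Stand-in for an error that tells the process to exit. */
    static final class FatalExit extends Error {
        FatalExit(String message, Throwable cause) { super(message, cause); }
    }

    /** One iteration of the worker loop: any exception from the task is escalated. */
    static void doWork(Runnable task) {
        try {
            task.run();
        } catch (Exception e) {
            // The catch-all does not distinguish a transient session expiry
            // from a genuine processing failure, so both become fatal.
            throw new FatalExit("Failed to process change event", e);
        }
    }

    public static void main(String[] args) {
        try {
            doWork(() -> { throw new SessionExpiredException("session expired while reading node"); });
        } catch (FatalExit e) {
            System.out.println("escalated: " + e.getCause().getClass().getSimpleName());
        }
    }
}
```

      In this sketch the only way to avoid the fatal path would be to handle the expiry before it reaches the catch-all. Whether that is appropriate for the real thread is exactly the question raised here.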

      This shutdown problem also reproduces in newer versions of Kafka (which already include the deadlock fix from KAFKA-13544).
      It is hard to write a synthetic test that reproduces the problem, but it can be reproduced locally in debug mode with the following steps:
      1) Start ZooKeeper and start Kafka in debug mode.
      2) Emulate a connectivity problem between Kafka and ZooKeeper; for example, the connection can be closed via the Netcrusher library.
      3) Put a breakpoint in the updateLatestOrThrow() method of the FeatureCacheUpdater class, before the line zkClient.getDataAndVersion(featureZkNodePath) is executed.
      4) Restore the connection between Kafka and ZooKeeper after the session has expired. Kafka execution should stop at the breakpoint.
      5) Resume execution until Kafka starts to execute the line zooKeeperClient.handleRequests(remainingRequests) in the retryRequestsUntilConnected method of the KafkaZkClient class.
      6) Emulate the connectivity problem between Kafka and ZooKeeper again and wait until the session expires.
      7) Restore the connection between Kafka and ZooKeeper.
      8) Kafka begins the shutdown process, due to:
      ERROR [feature-zk-node-event-process-thread]: Failed to process feature ZK node change event. The broker will eventually exit. (kafka.server.FinalizedFeatureChangeListener$ChangeNotificationProcessorThread)
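
      One plausible reading of steps 5 to 8 is that a retry loop tolerates connection loss but lets a session expiry that happens mid-retry propagate to the caller. The Java simulation below illustrates that timing window under this assumption. FakeClient, Outcome and retryUntilConnected are hypothetical names with scripted behavior, not Kafka's API, and the real mechanism may differ.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical simulation; names and behavior are assumptions, not Kafka's code.
final class RetryWindowSketch {

    enum Outcome { OK, CONNECTION_LOSS }

    static final class SessionExpiredException extends RuntimeException {
        SessionExpiredException(String message) { super(message); }
    }

    /** Fake client with scripted request outcomes and scripted reconnect results. */
    static final class FakeClient {
        private final Deque<Outcome> outcomes = new ArrayDeque<>();
        private final Deque<Boolean> sessionAliveOnWait = new ArrayDeque<>();

        FakeClient request(Outcome outcome) { outcomes.add(outcome); return this; }
        FakeClient waitResult(boolean sessionAlive) { sessionAliveOnWait.add(sessionAlive); return this; }

        Outcome handleRequest() { return outcomes.remove(); }

        void waitUntilConnected() {
            // A scripted "false" models the session expiring during the wait.
            if (!sessionAliveOnWait.remove()) {
                throw new SessionExpiredException("session expired while waiting for connection");
            }
        }
    }

    /** Retries on connection loss, but does not catch a session expiry. */
    static int retryUntilConnected(FakeClient client) {
        int attempts = 0;
        while (true) {
            attempts++;
            if (client.handleRequest() == Outcome.OK) {
                return attempts;
            }
            client.waitUntilConnected();
        }
    }

    public static void main(String[] args) {
        // One disconnect followed by a successful reconnect: the retry succeeds.
        FakeClient recovers = new FakeClient()
            .request(Outcome.CONNECTION_LOSS).waitResult(true).request(Outcome.OK);
        System.out.println("attempts: " + retryUntilConnected(recovers));

        // The session expires while the request is being retried: the expiry escapes the loop.
        FakeClient expires = new FakeClient()
            .request(Outcome.CONNECTION_LOSS).waitResult(false);
        try {
            retryUntilConnected(expires);
        } catch (SessionExpiredException e) {
            System.out.println("propagated: " + e.getMessage());
        }
    }
}
```

      If the real code behaves like the second case, the expiry reaches the caller even though the connection is restored shortly afterwards, which would match the log line above.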

      In a real environment, this problem can be caused by network issues that make the broker disconnect from and reconnect to ZooKeeper repeatedly within a short period.

      I started a mail thread at https://lists.apache.org/thread/gbk4scwd8g7mg2tfsokzj5tjgrjrb9dw about this problem, but have received no answers.

      To me this looks like a defect that should be fixed, because Kafka initiates shutdown after the connection between Kafka and ZooKeeper has already been restored.

      Thank you.


            People

              Assignee: Unassigned
              Reporter: Denis Razuvaev (denis_razuvaev)
              Votes: 0
              Watchers: 3
