Kafka / KAFKA-13388

Kafka Producer nodes stuck in CHECKING_API_VERSIONS

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.1, 3.2.0, 3.1.1
    • Component/s: core
    • Labels: None

    Description

      I have been seeing expired batch errors in my app.

      org.apache.kafka.common.errors.TimeoutException: Expiring 51 record(s) for xxx-17:120002 ms has passed since batch creation
      

      I would have expected a request timeout or connection timeout to be logged as well, but I could not find any other associated errors.

      I added some instrumentation to my app and traced this down to broker connections hanging in the CHECKING_API_VERSIONS state. It appears there is no effective timeout for Kafka Producer broker connections in the CHECKING_API_VERSIONS state.

      In the code, after the NetworkClient connects to a broker node, it sends a request to check API versions; when it receives the response, it marks the node as ready. I am seeing that sometimes no reply is received for the check API versions request, and the connection just hangs in the CHECKING_API_VERSIONS state until it is disposed, I assume after the idle connection timeout.
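For orientation, the timeouts mentioned in this report correspond to the producer settings sketched below. This is only a sketch: the keys are the standard producer configuration names, and the values are what I understand to be the client defaults, so they should be checked against the client version in use. The class and method names are made up for illustration.

```java
import java.util.Properties;

// Hypothetical helper, not part of Kafka: collects the timeout-related
// producer settings that are relevant to this issue.
final class ProducerTimeoutSketch {

    static Properties timeoutProps() {
        Properties props = new Properties();
        // Upper bound on reporting success or failure after send(); the
        // "120002 ms has passed since batch creation" error is consistent with it.
        props.setProperty("delivery.timeout.ms", "120000");
        // Maximum wait for the response to an in-flight request.
        props.setProperty("request.timeout.ms", "30000");
        // Initial and maximum wait for a socket connection to be established.
        props.setProperty("socket.connection.setup.timeout.ms", "10000");
        props.setProperty("socket.connection.setup.timeout.max.ms", "30000");
        // Idle connections are closed after this long.
        props.setProperty("connections.max.idle.ms", "540000");
        return props;
    }

    public static void main(String[] args) {
        // Print the settings so they can be compared with the running config.
        timeoutProps().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

If these values are in effect, a connection stuck in CHECKING_API_VERSIONS would outlive the delivery timeout (2 minutes) long before the idle timeout (9 minutes) disposes of it, which would match the expired batches with no other logged error.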

      Update: I am not actually sure what causes the connection to get stuck in CHECKING_API_VERSIONS.

      I would have guessed that the connection setup timeout should still be in play here, but it is not.
      There is a connectingNodes set that is consulted when checking timeouts, and the node is removed from that set
      when ClusterConnectionStates.checkingApiVersions(String id) is called to transition the node into CHECKING_API_VERSIONS.
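The gap described above can be illustrated with a small, simplified model. This is not the actual ClusterConnectionStates code: only connectingNodes and checkingApiVersions(String id) come from the report, while the class name, the other methods, and the timeout handling are assumptions made for the sketch.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Simplified, hypothetical model of the connection state tracking.
final class ConnectionStateSketch {
    enum State { CONNECTING, CHECKING_API_VERSIONS, READY }

    private final Map<String, State> states = new HashMap<>();
    private final Map<String, Long> connectStartMs = new HashMap<>();
    // Only nodes in this set are considered by the setup-timeout check.
    private final Set<String> connectingNodes = new HashSet<>();
    private final long setupTimeoutMs;

    ConnectionStateSketch(long setupTimeoutMs) {
        this.setupTimeoutMs = setupTimeoutMs;
    }

    void connecting(String id, long nowMs) {
        states.put(id, State.CONNECTING);
        connectStartMs.put(id, nowMs);
        connectingNodes.add(id);
    }

    // The transition removes the node from connectingNodes, so the
    // setup-timeout check below no longer sees it.
    void checkingApiVersions(String id) {
        states.put(id, State.CHECKING_API_VERSIONS);
        connectingNodes.remove(id);
    }

    void ready(String id) {
        states.put(id, State.READY);
    }

    State state(String id) {
        return states.get(id);
    }

    // Returns the nodes whose connection setup has exceeded the timeout.
    Set<String> nodesPastSetupTimeout(long nowMs) {
        Set<String> timedOut = new HashSet<>();
        for (String id : connectingNodes) {
            if (nowMs - connectStartMs.get(id) > setupTimeoutMs) {
                timedOut.add(id);
            }
        }
        return timedOut;
    }

    public static void main(String[] args) {
        ConnectionStateSketch s = new ConnectionStateSketch(10_000L);
        s.connecting("broker-1", 0L);
        s.checkingApiVersions("broker-1");
        // The ApiVersions response never arrives; a minute later the node
        // is still not reported as timed out.
        System.out.println(s.state("broker-1") + " timedOut=" + s.nodesPastSetupTimeout(60_000L));
    }
}
```

In this model, a node that never receives its API versions response stays in CHECKING_API_VERSIONS indefinitely, because the only timeout check iterates over connectingNodes and the node is no longer in it.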

          People

            David Jacot (dajac)
            David Hoffman (dhofftgt)
            Rajini Sivaram