  Kafka / KAFKA-12665

One of the brokers, which is also the controller, has too many connections in CLOSE_WAIT


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.11.0.1
    • Fix Version/s: None
    • Component/s: clients, consumer, controller, core
    • Labels:
      None

      Description

      1. Environment

      Apache Kafka 0.11.0.1

      5 nodes

      Replication factor: 3

      Average throughput: about 4k messages per second

      Monitoring: Prometheus, JMX, and Grafana

      Consumers: Spring Boot applications and Doris Routine Load

      Producers: Spring Boot applications and Log

       

      2. What we encountered

      Broker 4, which was also the controller (epoch 90), suddenly had a large number of connections in CLOSE_WAIT.

      controller.log:

       

      Controller 4 epoch 90 fails to send request (type: UpdateMetadataRequest ...
      java.io.IOException: Connection to 4 was disconnected before the response was read
      

       

      The request is retried many, many times, but the same warning keeps appearing.

       

      At the same time, broker 6, which fetches messages from broker 4, hit the same problem:

      [2021-04-13 16:35:06,942] WARN [ReplicaFetcherThread-0-4]: Error in fetch to broker 4, request (type=FetchRequest, replicaId=6, maxWait=500, minBytes=1, maxBytes=10485760,
      java.io.IOException: Connection to 4 was disconnected before the response was read
      

       

      Doris Routine Load (consuming from Kafka) also timed out:

       

      2021-04-13 16:35:11,397 WARN (Routine load scheduler|42) [KafkaUtil.getAllKafkaPartitions():91] failed to get partitions. org.apache.doris.common.UserException: errCode = 2, detailMessage = failed to get kafka partition info: [failed to get partition meta: Local: Timed out]
      

       

       

      Broker 4 (controller, epoch 90), fs.file:

      Most of the CLOSE_WAIT connections come from the consumer application.

      At 16:49 the broker was restarted, after which it returned to normal.
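One way to check which clients hold the CLOSE_WAIT sockets on a broker host is to read the kernel's socket table directly. The sketch below is a minimal, Linux-only example under stated assumptions: the broker listens on port 9092 (the report does not give the port), the host is little-endian (e.g. x86-64), and only IPv4 sockets are examined (IPv6 sockets live in `/proc/net/tcp6`). It keeps rows whose state column is `08` (CLOSE_WAIT in Linux's TCP state numbering) and counts them per remote IP. The helper names `decode_ipv4` and `close_wait_by_peer` are hypothetical.

```python
import os
import socket
from collections import Counter

# Linux TCP state numbering: TCP_CLOSE_WAIT = 8, shown in hex as "08".
TCP_CLOSE_WAIT = "08"


def decode_ipv4(hex_addr):
    """Decode an 'IIIIIIII:PPPP' field from /proc/net/tcp into (ip, port)."""
    ip_hex, port_hex = hex_addr.split(":")
    # The address is printed in host byte order; reversing the bytes
    # assumes a little-endian host such as x86-64.
    ip = socket.inet_ntoa(bytes.fromhex(ip_hex)[::-1])
    # The port is already printed in normal (host) order as hex.
    return ip, int(port_hex, 16)


def close_wait_by_peer(proc_net_tcp_text, local_port):
    """Count CLOSE_WAIT sockets on local_port, grouped by remote IP."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4 or fields[3] != TCP_CLOSE_WAIT:
            continue
        _, lport = decode_ipv4(fields[1])
        if lport != local_port:
            continue
        remote_ip, _ = decode_ipv4(fields[2])
        counts[remote_ip] += 1
    return counts


if __name__ == "__main__" and os.path.exists("/proc/net/tcp"):
    with open("/proc/net/tcp") as f:
        # 9092 is an assumed broker listener port; adjust as needed.
        for ip, n in close_wait_by_peer(f.read(), 9092).most_common(10):
            print(ip, n)
```

Grouping by remote IP is what would confirm whether most CLOSE_WAIT sockets belong to the consumer hosts; `ss -tan state close-wait` gives a similar view from the shell.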

       

       

        3. Speculation

      The TCP connections were closed passively (the remote side closed first), but the process on the controller broker did not respond.

      Is there a known bug in this version that could cause this?
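The speculation above matches how CLOSE_WAIT arises in TCP generally: when the remote peer closes first, the local kernel acknowledges its FIN, and the local socket then stays in CLOSE_WAIT until the local application itself calls `close()`. The minimal sketch below uses only Python's standard `socket` module on localhost to reproduce the passive-close side of that sequence. It is a generic illustration, not a reproduction of the Kafka broker's behavior, and `demonstrate_passive_close` is a hypothetical name.

```python
import socket


def demonstrate_passive_close():
    """Return what the accepting side reads after the client closes first."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()

    client.close()  # active close: the client sends FIN
    data = server_side.recv(1024)  # b"" means the peer has closed its side

    # Between the recv() above and the close() below, the kernel keeps
    # server_side in CLOSE_WAIT. An application that never reaches this
    # close() leaks one CLOSE_WAIT socket per such connection.
    server_side.close()
    listener.close()
    return data


if __name__ == "__main__":
    print(demonstrate_passive_close())
```

A growing CLOSE_WAIT count therefore points at the side that is holding the sockets (here, broker 4) failing to close them, rather than at the clients that disconnected.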

       

       

       

        Attachments

        1. image-2021-04-14-10-32-54-140.png
          435 kB
          GeoffreyStark
        2. image-2021-04-14-10-39-02-996.png
          330 kB
          GeoffreyStark
        3. image-2021-04-14-11-26-03-346.png
          229 kB
          GeoffreyStark


            People

            • Assignee: Unassigned
            • Reporter: gaofeng6 GeoffreyStark
            • Votes: 0
            • Watchers: 1

              Dates

              • Created:
                Updated: