Kafka / KAFKA-9017

Timeouts in Kafka in production cluster


    Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 2.1.0
    • Fix Version/s: None
    • Component/s: KafkaConnect
    • Labels:
      None
    • Environment:
      Production

      Description

      We are seeing timeouts in Kafka in our production cluster. Kafka is running on DC/OS (Mesos).

      The errors are shown below.

      Exception 1: from the application logs

      2019-10-07 10:01:59 Error: java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for ie-lrx-audit-evt-3: 30030 ms has passed since batch creation plus linger time

      Exception 2: from the application logs

      {"eventTime":"2019-10-07 08:20:43.265", "logType":"ERROR", "stackMessage" : "java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for ie-lrx-audit-evt-3: 30028 ms has passed since batch creation plus linger time", "stackTrace" :

      Exception 3: from the broker logs

      [2019-10-10 06:32:10,844] INFO [ReplicaFetcher replicaId=4, leaderId=2, fetcherId=0] Error sending fetch request (sessionId=919177392, epoch=INITIAL) to node 2: java.io.IOException: Connection to 2 was disconnected before the response was read. (org.apache.kafka.clients.FetchSessionHandler)
      [2019-10-10 06:32:10,844] INFO [ReplicaFetcher replicaId=4, leaderId=2, fetcherId=0] Error sending fetch request (sessionId=919177392, epoch=INITIAL) to node 2: java.io.IOException: Connection to 2 was disconnected before the response was read. (org.apache.kafka.clients.FetchSessionHandler)
      [2019-10-10 06:32:10,849] WARN [ReplicaFetcher replicaId=4, leaderId=2, fetcherId=0] Error in response for fetch request (type=FetchRequest, replicaId=4, maxWait=500, minBytes=1, maxBytes=10485760, fetchData={ie-lrx-rxer-audit-evt-0=(offset=0, logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[104]), mft-hdfs-landing-evt-1=(offset=0, logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[108]), dca-audit-evt-2=(offset=0, logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[108]), it-sou-audit-evt-7=(offset=94819, logStartOffset=94819, maxBytes=1048576, currentLeaderEpoch=Optional[100]), intg-ie-lrx-rxer-audit-evt-2=(offset=0, logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[78]), prod-pipelines-errors-evt-0=(offset=0, logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[117]), __consumer_offsets-36=(offset=3, logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[104]), panel-data-change-evt-4=(offset=0, logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[108]), gdcp-notification-evt-2=(offset=0, logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[104]), data-transfer-change-evt-0=(offset=0, logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[108]), __consumer_offsets-11=(offset=15, logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[108]), dca-heartbeat-evt-2=(offset=0, logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[105]), ukwhs-error-topic-1=(offset=8, logStartOffset=8, maxBytes=1048576, currentLeaderEpoch=Optional[105]), intg-ie-lrx-audit-evt-4=(offset=21, logStartOffset=21, maxBytes=1048576, currentLeaderEpoch=Optional[74]), __consumer_offsets-16=(offset=11329814, logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[104]), __consumer_offsets-31=(offset=3472033, logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[107]), ukpai-hdfs-evt-1=(offset=0, logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[107]), mft-pflow-evt-1=(offset=0, logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[108]), ukwhs-hdfs-landing-evt-01-2=(offset=0, logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[105]), it-sou-audit-evt-2=(offset=490084, logStartOffset=490084, maxBytes=1048576, currentLeaderEpoch=Optional[105]), ie-lrx-pat-audit-evt-4=(offset=0, logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[104])}, isolationLevel=READ_UNCOMMITTED, toForget=, metadata=(sessionId=919177392, epoch=INITIAL)) (kafka.server.ReplicaFetcherThread)
      java.io.IOException: Connection to 2 was disconnected before the response was read
              at org.apache.kafka.clients.NetworkClientUtils.sendAndReceive(NetworkClientUtils.java:97)
              at kafka.server.ReplicaFetcherBlockingSend.sendRequest(ReplicaFetcherBlockingSend.scala:97)
              at kafka.server.ReplicaFetcherThread.fetchFromLeader(ReplicaFetcherThread.scala:190)
              at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:241)
              at kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3(AbstractFetcherThread.scala:130)
              at kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3$adapted(AbstractFetcherThread.scala:129)
              at scala.Option.foreach(Option.scala:257)
              at kafka.server.AbstractFetcherThread.maybeFetch(AbstractFetcherThread.scala:129)
              at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:111)
              at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
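
      To see which partitions are affected and how long each batch waited, the expiry messages from the application logs (Exceptions 1 and 2) can be tallied. The following is a minimal sketch, assuming the messages keep the shape shown in those two exceptions. The function name `summarize_expiries` and the `sample` list are illustrative and not part of Kafka or of the attached logs.

```python
import re
from collections import defaultdict

# Matches the expiry message from the application logs, e.g.
# "Expiring 1 record(s) for ie-lrx-audit-evt-3: 30030 ms has passed since batch creation"
EXPIRY_RE = re.compile(
    r"Expiring (?P<count>\d+) record\(s\) for (?P<topic>\S+)-(?P<partition>\d+): ?"
    r"(?P<elapsed_ms>\d+) ms has passed since batch creation"
)


def summarize_expiries(lines):
    """Return {(topic, partition): [elapsed_ms, ...]} for every expiry message found."""
    summary = defaultdict(list)
    for line in lines:
        for match in EXPIRY_RE.finditer(line):
            key = (match.group("topic"), int(match.group("partition")))
            summary[key].append(int(match.group("elapsed_ms")))
    return dict(summary)


# Message text copied from Exceptions 1 and 2 above (illustrative input only).
sample = [
    "2019-10-07 10:01:59 Error: java.lang.RuntimeException: "
    "java.util.concurrent.ExecutionException: "
    "org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for "
    "ie-lrx-audit-evt-3: 30030 ms has passed since batch creation plus linger time",
    "java.util.concurrent.ExecutionException: "
    "org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for "
    "ie-lrx-audit-evt-3: 30028 ms has passed since batch creation plus linger time",
]

for (topic, partition), elapsed in summarize_expiries(sample).items():
    print(f"{topic}-{partition}: {len(elapsed)} expiries, max {max(elapsed)} ms")
```

      Running this over the full application log would show whether the expiries are limited to partition 3 of ie-lrx-audit-evt or spread across other topics and partitions.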

        Attachments

        1. stdout (12)
          236 kB
          Suhas
        2. stderr (7)
          47 kB
          Suhas

            People

            • Assignee:
              Unassigned
              Reporter:
              suhas_dcp Suhas
