  1. Kafka
  2. KAFKA-9017

We see timeouts in Kafka in the production cluster

Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 2.1.0
    • Fix Version/s: None
    • Component/s: core
    • Labels: None
    • Environment: Production

    Description

      We see timeouts in Kafka in our production cluster. Kafka is running on DC/OS (Mesos).

      The errors are listed below.

      Exception 1 (from the application logs):

      2019-10-07 10:01:59 Error: java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for ie-lrx-audit-evt-3: 30030 ms has passed since batch creation plus linger time
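
      When many such lines need triage, it can help to pull the topic, partition, and elapsed time out of each expiry message. The following is a minimal sketch using only the Java standard library; the class and method names (ExpiredBatchParser, parse) are hypothetical, and the pattern is assumed from the message format quoted above rather than taken from the Kafka source.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Kafka: extracts fields from a producer
// "Expiring N record(s)" message as it appears in the application logs.
final class ExpiredBatchParser {

    /** Fields recovered from one expiry message. */
    static final class ExpiredBatch {
        final int records;
        final String topic;
        final int partition;
        final long elapsedMs;

        ExpiredBatch(int records, String topic, int partition, long elapsedMs) {
            this.records = records;
            this.topic = topic;
            this.partition = partition;
            this.elapsedMs = elapsedMs;
        }
    }

    // Assumed format: "Expiring 1 record(s) for <topic>-<partition>: <ms> ms has passed".
    // The greedy \S+ makes the split happen at the last hyphen, because topic
    // names may themselves contain hyphens.
    private static final Pattern EXPIRY = Pattern.compile(
            "Expiring (\\d+) record\\(s\\) for (\\S+)-(\\d+): ?(\\d+) ms has passed");

    /** Returns the parsed fields, or empty if the line holds no expiry message. */
    static Optional<ExpiredBatch> parse(String logLine) {
        Matcher m = EXPIRY.matcher(logLine);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new ExpiredBatch(
                Integer.parseInt(m.group(1)),
                m.group(2),
                Integer.parseInt(m.group(3)),
                Long.parseLong(m.group(4))));
    }
}
```

      Applied to the Exception 1 line, parse would yield topic ie-lrx-audit-evt, partition 3, and an elapsed time of 30030 ms. Grouping the results by partition may show whether the expirations concentrate on partitions led by a single broker.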

      Exception 2 (from the application logs):

      {"eventTime":"2019-10-07 08:20:43.265", "logType":"ERROR", "stackMessage" : "java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for ie-lrx-audit-evt-3: 30028 ms has passed since batch creation plus linger time", "stackTrace" :

      Exception (from the broker logs):

      [2019-10-10 06:32:10,844] INFO [ReplicaFetcher replicaId=4, leaderId=2, fetcherId=0] Error sending fetch request (sessionId=919177392, epoch=INITIAL) to node 2: java.io.IOException: Connection to 2 was disconnected before the response was read. (org.apache.kafka.clients.FetchSessionHandler)
      [2019-10-10 06:32:10,844] INFO [ReplicaFetcher replicaId=4, leaderId=2, fetcherId=0] Error sending fetch request (sessionId=919177392, epoch=INITIAL) to node 2: java.io.IOException: Connection to 2 was disconnected before the response was read. (org.apache.kafka.clients.FetchSessionHandler)
      [2019-10-10 06:32:10,849] WARN [ReplicaFetcher replicaId=4, leaderId=2, fetcherId=0] Error in response for fetch request (type=FetchRequest, replicaId=4, maxWait=500, minBytes=1, maxBytes=10485760, fetchData={ie-lrx-rxer-audit-evt-0=(offset=0, logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[104]), mft-hdfs-landing-evt-1=(offset=0, logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[108]), dca-audit-evt-2=(offset=0, logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[108]), it-sou-audit-evt-7=(offset=94819, logStartOffset=94819, maxBytes=1048576, currentLeaderEpoch=Optional[100]), intg-ie-lrx-rxer-audit-evt-2=(offset=0, logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[78]), prod-pipelines-errors-evt-0=(offset=0, logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[117]), __consumer_offsets-36=(offset=3, logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[104]), panel-data-change-evt-4=(offset=0, logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[108]), gdcp-notification-evt-2=(offset=0, logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[104]), data-transfer-change-evt-0=(offset=0, logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[108]), __consumer_offsets-11=(offset=15, logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[108]), dca-heartbeat-evt-2=(offset=0, logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[105]), ukwhs-error-topic-1=(offset=8, logStartOffset=8, maxBytes=1048576, currentLeaderEpoch=Optional[105]), intg-ie-lrx-audit-evt-4=(offset=21, logStartOffset=21, maxBytes=1048576, currentLeaderEpoch=Optional[74]), __consumer_offsets-16=(offset=11329814, logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[104]), __consumer_offsets-31=(offset=3472033, logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[107]), ukpai-hdfs-evt-1=(offset=0, logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[107]), mft-pflow-evt-1=(offset=0, logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[108]), ukwhs-hdfs-landing-evt-01-2=(offset=0, logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[105]), it-sou-audit-evt-2=(offset=490084, logStartOffset=490084, maxBytes=1048576, currentLeaderEpoch=Optional[105]), ie-lrx-pat-audit-evt-4=(offset=0, logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[104])}, isolationLevel=READ_UNCOMMITTED, toForget=, metadata=(sessionId=919177392, epoch=INITIAL)) (kafka.server.ReplicaFetcherThread)
      java.io.IOException: Connection to 2 was disconnected before the response was read
              at org.apache.kafka.clients.NetworkClientUtils.sendAndReceive(NetworkClientUtils.java:97)
              at kafka.server.ReplicaFetcherBlockingSend.sendRequest(ReplicaFetcherBlockingSend.scala:97)
              at kafka.server.ReplicaFetcherThread.fetchFromLeader(ReplicaFetcherThread.scala:190)
              at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:241)
              at kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3(AbstractFetcherThread.scala:130)
              at kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3$adapted(AbstractFetcherThread.scala:129)
              at scala.Option.foreach(Option.scala:257)
              at kafka.server.AbstractFetcherThread.maybeFetch(AbstractFetcherThread.scala:129)
              at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:111)
              at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
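
      The producer expirations in Exceptions 1 and 2 occur at just over 30 s (30030 ms and 30028 ms), which coincides with the default request.timeout.ms of 30000 ms. In producer clients before 2.1.0 that setting also bounded how long a batch could wait before being sent; from 2.1.0 on, delivery.timeout.ms (default 120000 ms) bounds the total time instead. This points to, but does not establish, batches waiting on an unavailable partition leader. The sketch below lists the producer settings involved; the class and method names are hypothetical, and the values are the documented defaults or illustrative assumptions, not tuning advice for this cluster.

```java
import java.util.Properties;

// Hypothetical sketch: producer timeout settings expressed as plain
// properties, so it runs without the kafka-clients library.
final class ProducerTimeoutSketch {

    /** Timeout-related producer settings (config keys are real, values illustrative). */
    static Properties timeoutSettings() {
        Properties props = new Properties();
        // Per-request wait for a broker response; default is 30000 ms.
        props.setProperty("request.timeout.ms", "30000");
        // How long the producer waits to fill a batch; 5 is an assumed value.
        props.setProperty("linger.ms", "5");
        // Upper bound on total time to deliver a record (clients 2.1.0+); default 120000 ms.
        props.setProperty("delivery.timeout.ms", "120000");
        // Retries after transient send failures; 3 is an assumed value.
        props.setProperty("retries", "3");
        return props;
    }

    /** The producer requires delivery.timeout.ms >= linger.ms + request.timeout.ms. */
    static boolean isConsistent(Properties p) {
        long linger = Long.parseLong(p.getProperty("linger.ms"));
        long request = Long.parseLong(p.getProperty("request.timeout.ms"));
        long delivery = Long.parseLong(p.getProperty("delivery.timeout.ms"));
        return delivery >= linger + request;
    }
}
```

      Raising these timeouts would only mask the symptom if the replica fetcher disconnects in the broker log reflect a broker or network problem, so the broker side likely needs to be checked first.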

      Attachments

        1. stderr (7)
          47 kB
          Suhas
        2. stdout (12)
          236 kB
          Suhas

          People

            Assignee: Unassigned
            Reporter: Suhas (suhas_dcp)
            Votes: 0
            Watchers: 1
