  1. Kafka
  2. KAFKA-9270

KafkaStream crash on offset commit failure

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.0.1
    • Fix Version/s: None
    • Component/s: streams
    • Labels: None

    Description

      On our production server we intermittently observe Kafka Streams crashing with a TimeoutException while committing offsets. The only workaround seems to be restarting the application, which is not a desirable solution for a production environment.

       

      We have already implemented a ProductionExceptionHandler, but it does not seem to address this.

       

      Please provide a fix for this or a viable workaround.
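As one possible interim workaround, the restart could be done inside the process instead of restarting the whole application. The sketch below is plain Java with no Kafka dependency. `RestartSupervisor`, `Worker`, and `runWithRestarts` are hypothetical names introduced here for illustration. The assumption is that each `Worker` would build, start, and wait on a fresh `KafkaStreams` instance, since an instance cannot be started again once it has shut down.

```java
import java.util.function.Supplier;

/** Hypothetical helper (not part of Kafka): reruns a failed worker with a fixed backoff. */
final class RestartSupervisor {

    /** One attempt, e.g. "build, start, and block on one streams instance". */
    @FunctionalInterface
    interface Worker {
        void run() throws Exception;
    }

    /**
     * Runs workers from the factory until one returns normally or the restart
     * budget is exhausted. Returns the number of restarts that were needed.
     */
    static int runWithRestarts(Supplier<Worker> factory, int maxRestarts, long backoffMs) {
        int restarts = 0;
        while (true) {
            try {
                factory.get().run();   // a fresh worker for every attempt
                return restarts;
            } catch (Exception e) {
                if (e instanceof InterruptedException) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while running worker", e);
                }
                if (restarts >= maxRestarts) {
                    // Budget used up: surface the last failure to the caller.
                    throw new IllegalStateException("Giving up after " + restarts + " restarts", e);
                }
                restarts++;
                try {
                    Thread.sleep(backoffMs);   // wait before the next attempt
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted during backoff", ie);
                }
            }
        }
    }
}
```

Bounding the number of restarts keeps a persistent broker-side problem from turning into an endless restart loop. Once the budget is used up, the last failure is rethrown so that external monitoring can still see it.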

       

      Application side logs:

      2019-11-17 08:28:48.055 +0000 [AggregateJob-614fe688-c9a4-4dad-a881-71488030918b-StreamThread-1] [ERROR] - org.apache.kafka.streams.processor.internals.AssignedStreamsTasks [org.apache.kafka.streams.processor.internals.AssignedTasks:applyToRunningTasks:373] - stream-thread [AggregateJob-614fe688-c9a4-4dad-a881-71488030918b-StreamThread-1] Failed to commit stream task 0_1 due to the following error:
      org.apache.kafka.common.errors.TimeoutException: Timeout of 60000ms expired before successfully committing offsets {AggregateJob-1=OffsetAndMetadata{offset=176729402, metadata=''}}

       

      2019-11-17 08:29:00.891 +0000 [AggregateJob-614fe688-c9a4-4dad-a881-71488030918b-StreamThread-1] [ERROR] -    [:lambda$init$2:130] - Stream crashed!!! StreamsThread threadId: AggregateJob-614fe688-c9a4-4dad-a881-71488030918b-StreamThread-1
      TaskManager
        MetadataState:
          GlobalMetadata: []
          GlobalStores: []
          My HostInfo: HostInfo{host='unknown', port=-1}
          Cluster(id = null, nodes = [], partitions = [], controller = null)
        Active tasks:
          Running:
          Suspended:
          Restoring:
          New:
        Standby tasks:
          Running:
          Suspended:
          Restoring:
          New:
      org.apache.kafka.common.errors.TimeoutException: Timeout of 60000ms expired before successfully committing offsets {AggregateJob-0=OffsetAndMetadata{offset=189808059, metadata=''}}
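The 60000 ms in these messages matches the default of the consumer setting `default.api.timeout.ms`, which bounds blocking consumer calls such as a synchronous offset commit. Assuming that is where the limit comes from, one mitigation to try is raising it for the consumers embedded in the Streams application, which accepts consumer settings under the `consumer.` prefix. The sketch below only builds the `Properties`. `CommitTimeoutConfig` and `withCommitTimeout` are hypothetical names, and the value 120000 is an arbitrary example rather than a recommendation.

```java
import java.util.Properties;

/** Hypothetical helper: adds a longer consumer API timeout to a Streams configuration. */
final class CommitTimeoutConfig {

    // Kafka Streams passes keys with this prefix through to its embedded consumers.
    static final String CONSUMER_PREFIX = "consumer.";

    // Consumer setting that bounds blocking calls such as a synchronous commit.
    static final String DEFAULT_API_TIMEOUT_MS = "default.api.timeout.ms";

    /** Returns a copy of {@code base} with the consumer API timeout set; {@code base} is not modified. */
    static Properties withCommitTimeout(Properties base, int timeoutMs) {
        if (timeoutMs <= 0) {
            throw new IllegalArgumentException("timeoutMs must be positive: " + timeoutMs);
        }
        Properties props = new Properties();
        props.putAll(base);
        props.setProperty(CONSUMER_PREFIX + DEFAULT_API_TIMEOUT_MS, Integer.toString(timeoutMs));
        return props;
    }
}
```

The returned `Properties` would then be passed to the Streams application in place of the original configuration, for example `CommitTimeoutConfig.withCommitTimeout(streamsProps, 120000)`. A longer timeout only helps if the brokers recover within that window. It does not address whatever made the commit slow in the first place.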

       

      Kafka broker logs:

      [2019-11-17 13:53:22,774] WARN Client session timed out, have not heard from server in 6669ms for sessionid 0x10068e4a2944c2f (org.apache.zookeeper.ClientCnxn)
      [2019-11-17 13:53:22,809] INFO Client session timed out, have not heard from server in 6669ms for sessionid 0x10068e4a2944c2f, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)

       

      Regards,

      Rohan

            People

              Assignee: Unassigned
              Reporter: Rohan Kulkarni (rohan26may)
              Votes: 0
              Watchers: 5
