Kafka / KAFKA-5156 Options for handling exceptions in streams / KAFKA-5397

streams are not recovering from LockException during rebalancing


    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.10.2.1, 0.11.0.0
    • Fix Version/s: 1.0.0
    • Component/s: streams
    • Labels:
      None
    • Environment:
      one node setup, confluent kafka broker v3.2.0, kafka-clients 0.11.0.0-SNAPSHOT, 5 threads for kafka-streams

      Description

      Probably a continuation of KAFKA-5167. Portions of the log:

      2017-06-07 01:17:52,435 WARN  [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-5] StreamTask                 - task [2_0] Failed offset commits {browser-aggregation-KSTREAM-MAP-0000000039-repartition-0=OffsetAndMetadata{offset=4725597, metadata=''}, browser-aggregation-KSTREAM-MAP-0000000052-repartition-0=OffsetAndMetadata{offset=4968164, metadata=''}, browser-aggregation-KSTREAM-MAP-0000000026-repartition-0=OffsetAndMetadata{offset=2490506, metadata=''}, browser-aggregation-KSTREAM-MAP-0000000065-repartition-0=OffsetAndMetadata{offset=7457795, metadata=''}, browser-aggregation-KSTREAM-MAP-0000000013-repartition-0=OffsetAndMetadata{offset=530888, metadata=''}} due to Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
      2017-06-07 01:17:52,436 WARN  [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamTask                 - task [7_0] Failed offset commits {browser-aggregation-Aggregate-Counts-repartition-0=OffsetAndMetadata{offset=13275085, metadata=''}} due to Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
      2017-06-07 01:17:52,488 WARN  [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread               - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] Failed to commit StreamTask 7_0 state: 
      org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
              at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:792)
              at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:738)
              at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:798)
              at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:778)
              at org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:204)
              at org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:167)
              at org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:127)
              at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:488)
              at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:348)
              at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:262)
              at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:208)
              at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:184)
              at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:605)
              at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1146)
              at org.apache.kafka.streams.processor.internals.StreamTask.commitOffsets(StreamTask.java:307)
              at org.apache.kafka.streams.processor.internals.StreamTask.access$000(StreamTask.java:49)
              at org.apache.kafka.streams.processor.internals.StreamTask$1.run(StreamTask.java:268)
              at org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:187)
              at org.apache.kafka.streams.processor.internals.StreamTask.commitImpl(StreamTask.java:259)
              at org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:253)
              at org.apache.kafka.streams.processor.internals.StreamThread.commitOne(StreamThread.java:813)
              at org.apache.kafka.streams.processor.internals.StreamThread.access$2800(StreamThread.java:73)
              at org.apache.kafka.streams.processor.internals.StreamThread$2.apply(StreamThread.java:795)
              at org.apache.kafka.streams.processor.internals.StreamThread.performOnStreamTasks(StreamThread.java:1442)
              at org.apache.kafka.streams.processor.internals.StreamThread.commitAll(StreamThread.java:787)
              at org.apache.kafka.streams.processor.internals.StreamThread.maybeCommit(StreamThread.java:776)
              at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:565)
              at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:525)
      
      2017-06-07 01:17:52,747 WARN  [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamTask                 - task [7_0] Failed offset commits {browser-aggregation-Aggregate-Counts-repartition-0=OffsetAndMetadata{offset=13275085, metadata=''}} due to Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
      2017-06-07 01:17:52,776 ERROR [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread               - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] Failed to suspend stream task 7_0 due to: org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
      2017-06-07 01:17:52,781 WARN  [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamTask                 - task [6_3] Failed offset commits {browser-aggregation-Aggregate-Texts-repartition-3=OffsetAndMetadata{offset=13489738, metadata=''}} due to Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
      2017-06-07 01:17:52,781 ERROR [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread               - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] Failed to suspend stream task 6_3 due to: org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
      2017-06-07 01:17:52,782 ERROR [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] ConsumerCoordinator        - User provided listener org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener for group browser-aggregation failed on partition revocation
      org.apache.kafka.streams.errors.StreamsException: stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] failed to suspend stream tasks
              at org.apache.kafka.streams.processor.internals.StreamThread.suspendTasksAndState(StreamThread.java:1134)
              at org.apache.kafka.streams.processor.internals.StreamThread.access$1800(StreamThread.java:73)
              at org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener.onPartitionsRevoked(StreamThread.java:218)
              at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinPrepare(ConsumerCoordinator.java:422)
              at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:353)
              at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:310)
              at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:297)
              at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1051)
              at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1016)
              at org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:580)
              at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:551)
              at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:525)
      
      //
      
      2017-06-07 01:18:15,739 WARN  [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread               - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] Could not create task 6_2. Will retry. org.apache.kafka.streams.errors.LockException: task [6_2] Failed to lock the state directory for task 6_2
      2017-06-07 01:18:16,741 WARN  [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread               - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] Could not create task 7_2. Will retry. org.apache.kafka.streams.errors.LockException: task [7_2] Failed to lock the state directory for task 7_2
      2017-06-07 01:18:17,745 WARN  [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread               - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] Could not create task 7_3. Will retry. org.apache.kafka.streams.errors.LockException: task [7_3] Failed to lock the state directory for task 7_3
      2017-06-07 01:18:17,795 WARN  [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread               - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] Still retrying to create tasks: [0_0, 1_0, 2_0, 0_2, 3_0, 0_3, 4_0, 3_1, 2_2, 5_0, 4_1, 3_2, 5_1, 6_0, 3_3, 5_2, 4_3, 6_1, 7_1, 5_3, 6_2, 7_2, 7_3]
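
The CommitFailedException text in the log above names two consumer settings, max.poll.interval.ms and max.poll.records. The following is a minimal sketch of how those two overrides might be assembled for a Streams application, assuming the "consumer." prefix that Kafka Streams uses to pass settings to its embedded consumer. The class name, method name, and the values shown are hypothetical and are not taken from this report. Tuning them only addresses the commit warnings and is not presented here as the fix for the LockException retries.

```java
import java.util.Properties;

public class PollTuningSketch {
    // Assumption: "consumer." is the prefix Kafka Streams uses to route a
    // setting to its embedded consumer.
    static final String CONSUMER_PREFIX = "consumer.";

    /** Builds only the two consumer overrides named in the warning message. */
    static Properties consumerOverrides(int maxPollIntervalMs, int maxPollRecords) {
        Properties props = new Properties();
        // Maximum allowed time between two poll() calls before the member is
        // considered failed and its partitions are reassigned.
        props.setProperty(CONSUMER_PREFIX + "max.poll.interval.ms",
                Integer.toString(maxPollIntervalMs));
        // Upper bound on the number of records returned by a single poll().
        props.setProperty(CONSUMER_PREFIX + "max.poll.records",
                Integer.toString(maxPollRecords));
        return props;
    }

    public static void main(String[] args) {
        // Illustrative values only; suitable numbers depend on the workload.
        Properties props = consumerOverrides(600_000, 100);
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

The resulting Properties would be merged into the application's existing Streams configuration before the application is started.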
      

              People

              • Assignee: Unassigned
              • Reporter: Jozef Koval (jozi-k)