[KAFKA-10295] ConnectDistributedTest.test_bounce should wait for graceful stop - ASF JIRA

Details

Type: Test
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 2.3.1, 2.5.0, 2.4.1, 2.6.0
Fix Version/s: 2.3.2, 2.6.0, 2.4.2, 2.5.1, 2.7.0
Component/s: connect
Labels: None

Description

ConnectDistributedTest.test_bounce fails intermittently, and the flaky failures appear to follow this pattern:

1. The test is parameterized for hard bounces with Incremental Cooperative Rebalancing enabled (the failure does not appear with protocol=eager).
2. A source task is running on a worker that is about to be hard-bounced.
3. The source task has written records that it has not yet committed in its source offsets.
4. The worker is hard-bounced, and the source task is lost.
5. Incremental Cooperative Rebalancing starts its scheduled.rebalance.max.delay.ms delay before recovering the task.
6. The test ends, and the connectors and Connect are stopped.
7. The test verifies that the sink connector has written only records that the source connector has committed.
8. This verification fails because the source offsets are stale: the topic contains uncommitted records, and the sink connector has written at least one of them.
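The failing check can be illustrated with a minimal sketch. The function name, the use of integer sequence numbers, and the sample values below are assumptions made for illustration; they are not taken from the actual test code.

```python
def uncommitted_sink_records(sink_seqnos, committed_seqno):
    """Return the sequence numbers written by the sink that lie beyond
    the last sequence number covered by committed source offsets."""
    return sorted(s for s in set(sink_seqnos) if s > committed_seqno)


# Hypothetical state at the end of a failing run: the source task produced
# records 0..7 before the hard bounce but only committed offsets up to 5.
committed_seqno = 5
sink_seqnos = [0, 1, 2, 3, 4, 5, 6, 7]

stale = uncommitted_sink_records(sink_seqnos, committed_seqno)
if stale:
    print("sink wrote records beyond committed source offsets:", stale)
```

In this sketch the verification would pass only if the returned list were empty, which requires the committed source offsets to catch up before the comparison is made.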

This can be addressed by ensuring that the test waits for the rebalance delay to expire, and for the lost task to recover and commit offsets past the progress it made before the bounce.
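A rough sketch of such a wait, assuming a polling approach, is shown below. The helper names, the `get_committed_seqno` callback, and the `grace_sec` allowance are hypothetical and do not describe the change made in the linked pull request.

```python
import time


def wait_until(condition, timeout_sec, backoff_sec=0.1, err_msg="condition not met"):
    """Poll `condition` until it returns True or `timeout_sec` elapses."""
    deadline = time.monotonic() + timeout_sec
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(err_msg)
        time.sleep(backoff_sec)


def wait_for_recovery(get_committed_seqno, progress_before_bounce,
                      rebalance_delay_sec, grace_sec=30, backoff_sec=0.1):
    """Block until committed source offsets cover the progress made before the bounce.

    `get_committed_seqno` is a hypothetical callback returning the latest
    committed sequence number. The timeout allows for the full rebalance
    delay (scheduled.rebalance.max.delay.ms, converted to seconds) plus an
    assumed grace period for the task to restart and commit.
    """
    wait_until(
        lambda: get_committed_seqno() >= progress_before_bounce,
        timeout_sec=rebalance_delay_sec + grace_sec,
        backoff_sec=backoff_sec,
        err_msg="source task did not commit past its pre-bounce progress",
    )
```

A test would call `wait_for_recovery` after the final bounce and before stopping the connectors, so that the final comparison runs against up-to-date source offsets.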

Issue Links

relates to: KAFKA-10296 Connector task reported RUNNING after hard bounce of worker (Open)

links to: GitHub Pull Request #9043

People

Assignee: Greg Harris

Reporter: Greg Harris

Reviewer: Randall Hauch

Votes: 0

Watchers: 1

Dates

Created: 20/Jul/20 05:53

Updated: 20/Jul/20 14:07

Resolved: 20/Jul/20 14:07