[SPARK-34187] Use available offset range obtained during polling when checking offset validation - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.4.7, 3.0.1, 3.1.0
Fix Version/s: 2.4.8, 3.0.2, 3.1.1
Component/s: Structured Streaming
Labels:
- correctness

Description

We have supported non-consecutive offsets for Kafka since 2.4.0. In `fetchRecord`, we validate an offset by checking whether it falls within the available offset range. Currently, however, we fetch the latest available offset range at the time of the check. This is incorrect: the available offset range can change during the batch, so the range used for the check may differ from the one in effect when we polled the records from Kafka.

An offset can be valid at polling time but fall outside the latest available offset range by the time we perform the above check. In that case we wrongly treat it as a data loss case and either fail the query or drop the record.
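
The race described above can be illustrated with a small simulation. This is a minimal sketch, not Spark's or Kafka's actual code: `OffsetRange`, `FakeBroker`, `is_valid_latest`, and `is_valid_at_poll` are hypothetical names introduced only for illustration, and log retention is assumed to be what moves the earliest available offset between polling and validation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OffsetRange:
    """Available offsets of one partition, as the half-open range [earliest, latest)."""
    earliest: int
    latest: int

    def contains(self, offset: int) -> bool:
        return self.earliest <= offset < self.latest


class FakeBroker:
    """Hypothetical stand-in for a Kafka partition whose old offsets can expire."""

    def __init__(self, earliest: int, latest: int) -> None:
        self.range = OffsetRange(earliest, latest)

    def available_range(self) -> OffsetRange:
        return self.range

    def poll(self, offset: int):
        # Return the fetched record offsets together with the range seen at poll time.
        seen = self.range
        records = list(range(max(offset, seen.earliest), seen.latest))
        return records, seen

    def expire_before(self, new_earliest: int) -> None:
        # Simulate retention removing every offset below new_earliest.
        self.range = OffsetRange(new_earliest, self.range.latest)


def is_valid_latest(broker: FakeBroker, offset: int) -> bool:
    # Problematic check: re-queries the range at validation time.
    return broker.available_range().contains(offset)


def is_valid_at_poll(range_at_poll: OffsetRange, offset: int) -> bool:
    # Check against the range that was captured when the records were polled.
    return range_at_poll.contains(offset)


broker = FakeBroker(earliest=0, latest=10)
records, range_at_poll = broker.poll(3)
broker.expire_before(5)  # offsets 0..4 expire after the poll but before validation

stale_check = is_valid_latest(broker, records[0])
fixed_check = is_valid_at_poll(range_at_poll, records[0])
print(stale_check, fixed_check)  # prints "False True"
```

In this sketch, offset 3 was fetched successfully, yet the check against the latest range reports it as missing, which is the false data loss case. Validating against the range captured during polling accepts it.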

Issue Links

links to

[Github] Pull Request #31275 (viirya)

[Github] Pull Request #31328 (viirya)

[Github] Pull Request #31330 (viirya)

People

Assignee: L. C. Hsieh

Reporter: L. C. Hsieh

Votes: 0

Watchers: 2

Dates

Created: 21/Jan/21 07:24

Updated: 08/Feb/21 13:54

Resolved: 24/Jan/21 19:52