[KAFKA-9368] Preserve stream-time across rebalances/restarts - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: streams
Labels: None

Description

Stream-time is used to decide whether an out-of-order record is still processed or is dropped as late (i.e., timestamp < stream-time - grace-period). It is currently tracked per processor: each processor node has its own local view of stream-time, based on the maximum timestamp it has processed so far.

During rebalances and restarts, stream-time is initialized as UNKNOWN (i.e., -1) for all processors in tasks that are newly created (or migrated). In effect, the current stream-time is forgotten in this case, which may lead to non-deterministic behavior: if processing stops right before a late record, that record would have been dropped had processing continued, but it is not dropped after the rebalance/restart. Consider an example with a tumbling window of 5ms, a grace period of 5ms, and the following records (timestamps in parentheses):

r1(0) r2(5) r3(11) r4(2)

In this example, stream-time advances as 0, 5, 11, 11, and thus record `r4` is dropped as late because 2 < 6 = 11 - 5. However, if we stop processing or rebalance after processing `r3` but before processing `r4`, stream-time is reinitialized as -1, and thus `r4` is processed on restart/after the rebalance. The problem is that, from a global point of view, stream-time then advances differently: 0, 5, 11, 2.

Of course, this is a corner case: if we stopped processing one record earlier (i.e., after processing `r2` but before processing `r3`), stream-time would still advance correctly from a global point of view.
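The scenario can be reproduced with a small standalone simulation. This is only a sketch of the drop rule as stated in this description (timestamp < stream-time - grace-period), not Kafka Streams code; the names `process`, `UNKNOWN`, and `GRACE_MS` are made up for illustration.

```python
UNKNOWN = -1   # sentinel for uninitialized stream-time, as in the description
GRACE_MS = 5   # grace period from the example


def process(records, grace_ms, stream_time=UNKNOWN):
    """Simulate one processor: return (accepted, dropped, final stream-time).

    `records` is a list of (name, timestamp) pairs. Stream-time is the
    maximum timestamp seen so far; a record is dropped as late if
    timestamp < stream-time - grace_ms.
    """
    accepted, dropped = [], []
    for name, ts in records:
        stream_time = max(stream_time, ts)
        if ts < stream_time - grace_ms:
            dropped.append(name)
        else:
            accepted.append(name)
    return accepted, dropped, stream_time


records = [("r1", 0), ("r2", 5), ("r3", 11), ("r4", 2)]

# Uninterrupted run: stream-time goes 0, 5, 11, 11 and r4 is dropped.
_, dropped_continuous, _ = process(records, GRACE_MS)

# Restart after r3: stream-time is forgotten, so r4 is accepted.
_, dropped_restart_after_r3, time_after_restart = process(records[3:], GRACE_MS)

# Restart after r2: r3 re-establishes stream-time before r4 arrives.
_, dropped_restart_after_r2, _ = process(records[2:], GRACE_MS)

# Restart after r3 with stream-time carried over: r4 is dropped again.
_, dropped_preserved, _ = process(records[3:], GRACE_MS, stream_time=11)

print(dropped_continuous, dropped_restart_after_r3,
      dropped_restart_after_r2, dropped_preserved)
```

Only the restart between `r3` and `r4` changes the outcome; passing the previous stream-time into the restarted run restores the behavior of the uninterrupted run.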

Note that in previous versions, the maximum partition-time was actually used as stream-time. This changed in 2.3 due to ~~KAFKA-7895~~/PR 6278, and could potentially change yet again in future versions (cf. KAFKA-8769). Partition-time itself is preserved as of 2.4 thanks to ~~KAFKA-7994~~.

Issue Links

is related to

KAFKA-7994 Improve Partition-Time for rebalances and restarts

Resolved

People

Assignee: Unassigned

Reporter: A. Sophie Blee-Goldman

Votes: 0

Watchers: 4

Dates

Created: 06/Jan/20 19:24

Updated: 15/May/20 05:21