Details
- Bug
- Status: Closed
- Critical
- Resolution: Fixed
- 1.13.1
Description
We should clarify the contract of SourceFunction#cancel():
- The source itself shouldn't interrupt the source thread.
- An interrupt shouldn't be expected in the clean cancellation case.
Interrupting the code on the clean shutdown path can cause failures during `stop-with-savepoint`. If the source thread is interrupted while it is backpressured, the network stack is left in an invalid state, which makes it impossible to send the EndOfPartitionEvent that completes the shutdown.
In a bit more detail: when the source thread is backpressured, the network stack might have already sent a partial record and could be waiting for a buffer to finish writing/serialising that record. If the network stack is interrupted while waiting for that buffer, it will never resume writing/serialising the remaining part of that record, while the downstream node will still be expecting those bytes. If we then attempt to emit anything else (such as EndOfPartitionEvent), this will most likely cause deserialisation errors on the downstream nodes.
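As an illustration of the contract described above, here is a minimal sketch of a source whose cancel() only flips a volatile flag and never interrupts the source thread. The `Source` and `Context` interfaces are hypothetical stand-ins reduced to the two relevant methods, not the actual Flink API, and `CountingSource` and `runAndCancel` are names invented for this example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class CancelContractSketch {

    /** Hypothetical stand-in for a source context; not the Flink interface. */
    interface Context<T> {
        void collect(T element);
    }

    /** Hypothetical stand-in for a source with run() and cancel(). */
    interface Source<T> {
        void run(Context<T> ctx) throws Exception;
        void cancel();
    }

    /** Emits increasing numbers until cancel() is called. */
    static class CountingSource implements Source<Long> {
        // volatile so the write in cancel() is visible to the source thread
        private volatile boolean running = true;

        @Override
        public void run(Context<Long> ctx) {
            long next = 0;
            while (running) {
                ctx.collect(next++);
            }
        }

        @Override
        public void cancel() {
            // Clean cancellation: only signal the loop to exit.
            // Deliberately no Thread.interrupt() on the source thread.
            running = false;
        }
    }

    /** Returns true if the source thread exited after cancel() without being interrupted. */
    static boolean runAndCancel() {
        CountingSource source = new CountingSource();
        AtomicLong emitted = new AtomicLong();
        CountDownLatch firstRecord = new CountDownLatch(1);
        boolean[] wasInterrupted = {false};

        Thread sourceThread = new Thread(() -> {
            source.run(element -> {
                emitted.incrementAndGet();
                firstRecord.countDown();
            });
            wasInterrupted[0] = Thread.currentThread().isInterrupted();
        });
        sourceThread.start();

        try {
            // Wait until the source has emitted at least one record, then cancel.
            if (!firstRecord.await(5, TimeUnit.SECONDS)) {
                return false;
            }
            source.cancel();
            sourceThread.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !sourceThread.isAlive() && !wasInterrupted[0] && emitted.get() > 0;
    }

    public static void main(String[] args) {
        System.out.println("clean exit without interrupt: " + runAndCancel());
    }
}
```

In this sketch the run loop notices the flag on its next iteration and returns on its own, so nothing on the shutdown path depends on an interrupt reaching a thread that might be blocked mid-record.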
Attachments
Issue Links
- is related to
  - FLINK-21028 Streaming application didn't stop properly (Closed)
  - FLINK-23528 stop-with-savepoint can fail with FlinkKinesisConsumer (Closed)
- relates to
  - FLINK-28758 FlinkKafkaConsumer fails to stop with savepoint (Closed)
  - FLINK-24182 Tasks canceler should not immediately interrupt (Closed)
- links to