Details
-
Bug
-
Status: Resolved
-
P3
-
Resolution: Won't Fix
-
2.16.0
-
None
Description
I am running beam(2.16) with flink (1.8.2), in my pipeline there is a sideinput which reads from a compact kafka topic from earliest, and the sideinput value is used for filtering. I keeps on getting the OOM: GC overhead limit exceeded. attached heap dump
After some more experience, I observed following:1. run pipeline without sideinput: no OOM issue
2. run pipeline with sideinput (kafka topic with 1 partition) with data available from this side input: no OOM issue
3. run pipeline with sideinput (kafka topic with 1 partition) without any data from the side input: OOM issue
**According to the email conversation with angoenka, beam buffers the message from main input if there is no data from side input.
From Flink side:
Flink's Broadcast stream does not block or collect events from the non-broadcasted side if the broadcast side doesn't serve events.
However, the user-implemented operators (Beam or your code in this case) often puts non-broadcasted events into state to wait for input from the other side.