Uploaded image for project: 'Beam'
  1. Beam
  2. BEAM-9914

Cache Main input while no data from side input could cause OOM


    • Bug
    • Status: Resolved
    • P3
    • Resolution: Won't Fix
    • 2.16.0
    • Missing
    • beam-model
    • None


      I am running beam(2.16)  with flink (1.8.2), in my pipeline there is a sideinput which reads from a compact kafka topic from earliest, and the sideinput value is used for filtering. I keeps on getting the OOM: GC overhead limit exceeded. attached heap dump
      After some more experience, I observed following:1. run pipeline without sideinput: no OOM issue
      2. run pipeline with sideinput (kafka topic with 1 partition) with data available from this side input: no OOM issue
      3. run pipeline with sideinput (kafka topic with 1 partition) without any data from the side input: OOM issue
      **According to the email conversation with angoenka, beam buffers the message from main input if there is no data from side input. 
      From Flink side:
      Flink's Broadcast stream does not block or collect events from the non-broadcasted side if the broadcast side doesn't serve events.
      However, the user-implemented operators (Beam or your code in this case) often puts non-broadcasted events into state to wait for input from the other side.


        1. image (10).png
          411 kB
          Eleanore Jin



            Unassigned Unassigned
            eleanore0102 Eleanore Jin
            0 Vote for this issue
            6 Start watching this issue