Uploaded image for project: 'Beam'
  1. Beam
  2. BEAM-9914

Cache Main input while no data from side input could cause OOM

Details

    • Bug
    • Status: Resolved
    • P3
    • Resolution: Won't Fix
    • 2.16.0
    • Missing
    • beam-model
    • None

    Description

      I am running beam(2.16)  with flink (1.8.2), in my pipeline there is a sideinput which reads from a compact kafka topic from earliest, and the sideinput value is used for filtering. I keeps on getting the OOM: GC overhead limit exceeded. attached heap dump
       
      After some more experience, I observed following:1. run pipeline without sideinput: no OOM issue
      2. run pipeline with sideinput (kafka topic with 1 partition) with data available from this side input: no OOM issue
      3. run pipeline with sideinput (kafka topic with 1 partition) without any data from the side input: OOM issue
       
      **According to the email conversation with angoenka, beam buffers the message from main input if there is no data from side input. 
       
      From Flink side:
      Flink's Broadcast stream does not block or collect events from the non-broadcasted side if the broadcast side doesn't serve events.
      However, the user-implemented operators (Beam or your code in this case) often puts non-broadcasted events into state to wait for input from the other side.

      Attachments

        1. image (10).png
          411 kB
          Eleanore Jin

        Activity

          People

            Unassigned Unassigned
            eleanore0102 Eleanore Jin
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: