Beam / BEAM-11906

Repeatedly(AfterCount(1)) trigger does not fire early for session windows on Dataflow

Details

    • Type: Bug
    • Status: Open
    • Priority: P1
    • Resolution: Unresolved
    • Affects Version/s: 2.23.0, 2.28.0
    • Fix Version/s: None
    • Component/s: runner-dataflow
    • Labels: None

    Description

      Originated from: https://stackoverflow.com/questions/66381608/apache-beam-does-not-trigger-early-repeatedly-for-session-windows-on-google-data

      The following pipeline fires an early pane after each element when run locally with the DirectRunner, but produces no early firings when run on Google Cloud Dataflow. On Dataflow, output appears only after the session window has closed.

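      import json, logging
      import apache_beam as beam
      from apache_beam.transforms import trigger, window
      from apache_beam.transforms.trigger import AccumulationMode

      # p is a beam.Pipeline created with streaming enabled (ReadFromPubSub requires it).
      ( p
              | 'read'   >> beam.io.ReadFromPubSub(subscription='projects/xxx/subscriptions/xxx-sub')
              | 'json'   >> beam.Map(lambda x: json.loads(x.decode('utf-8')))
              | 'kv'     >> beam.Map(lambda x: (x['id'], x['amount']))
              | 'window' >> beam.WindowInto(window.Sessions(15*60), trigger=trigger.Repeatedly(trigger.AfterCount(1)), accumulation_mode=AccumulationMode.ACCUMULATING)
              | 'group'  >> beam.GroupByKey()
              | 'log'    >> beam.Map(lambda x: logging.info(x))
      )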
      

      Apache Beam versions tried: 2.23 and 2.28.
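
To make the expected behaviour concrete, here is a minimal plain-Python sketch of what "fire after every element, accumulating" means for session windows. It is not Beam code and does not reflect how any runner implements triggers; the function name `simulate_early_firings` and the sample events are hypothetical, and the overlap rule is a simplifying assumption. Each arriving element is placed in a window of one gap length, merged with any overlapping session for the same key, and the full contents of the merged session are emitted immediately.

```python
from collections import defaultdict

GAP = 15 * 60  # session gap in seconds, mirroring window.Sessions(15*60)


def simulate_early_firings(events, gap=GAP):
    """Simulate one firing per element for session windows, accumulating mode.

    events: iterable of (key, timestamp_seconds, value) in arrival order.
    Returns a list of (key, (start, end), values), one pane per element.
    This is an illustration only, not Beam's implementation.
    """
    sessions = defaultdict(list)  # key -> list of [start, end, values]
    panes = []
    for key, ts, value in events:
        # Every element starts in its own window [ts, ts + gap).
        start, end, values = ts, ts + gap, [value]
        kept = []
        for s, e, vals in sessions[key]:
            if s < end and start < e:
                # Overlapping windows merge into a single session.
                start, end = min(start, s), max(end, e)
                values = vals + values
            else:
                kept.append([s, e, vals])
        kept.append([start, end, values])
        sessions[key] = kept
        # Fire right away; accumulating mode re-emits everything seen so far
        # in the (merged) session.
        panes.append((key, (start, end), list(values)))
    return panes


if __name__ == "__main__":
    sample = [("a", 0, 10), ("a", 60, 20), ("b", 100, 5), ("a", 2000, 7)]
    for pane in simulate_early_firings(sample):
        print(pane)
```

Under these assumptions, this per-element output is the pattern the DirectRunner run shows, whereas the Dataflow run reported here emits only once per session, after the window has closed.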

          People

            Assignee: Unassigned
            Reporter: Ning (ningk)
            Votes: 0
            Watchers: 2