Beam / BEAM-12816

Inconsistent trigger behaviour, and/or behaviour that requires clarification

Details

    • Type: Bug
    • Status: Resolved
    • Priority: P3
    • Resolution: Won't Fix
    • Affects Version/s: 2.31.0
    • Fix Version/s: Missing
    • Component/s: runner-py-direct
    • Labels: None
    • Environment: macOS 11.5.1, Python 3.7.10

    Description

      Hi,

      I've been using the Direct Runner in Python recently, as part of tests for a job intended to run on Dataflow. It has been hard to tell exactly what triggers do, but simple test cases seem to reveal some inconsistencies. I haven't checked whether they also appear when running on Dataflow.

      1. AfterAny(AfterProcessingTime(XX)) acts as Repeatedly(AfterProcessingTime(XX))

      I would have expected AfterAny(SomeTrigger) <=> SomeTrigger, i.e. AfterAny acting as the identity when given a single trigger. However, this is not the case for AfterProcessingTime: AfterAny does not forward its on_fire result, because the call to should_fire does not pass the right time domain. As a result, AfterAny's on_fire always returns False, so the composite never finishes and AfterAny acts as Repeatedly in this case.

      Was handling this the purpose of saving the time domain from the should_fire call? The saved value does not seem to be used.
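
      The mechanism described in point 1 can be illustrated with a small toy model. This is only a sketch under stated assumptions: the classes below (ToyAfterProcessingTime, ToyAfterAny) and the recheck_domain parameter are hypothetical stand-ins written for this illustration, not Beam's actual classes, and they only mimic the reported interaction between should_fire and on_fire.

```python
from enum import Enum


class TimeDomain(Enum):
    WATERMARK = "WATERMARK"
    REAL_TIME = "REAL_TIME"


class ToyAfterProcessingTime:
    """Hypothetical stand-in: only reports readiness in the REAL_TIME domain."""

    def should_fire(self, time_domain):
        return time_domain == TimeDomain.REAL_TIME

    def on_fire(self):
        # One-shot trigger: it is finished as soon as it has fired.
        return True


class ToyAfterAny:
    """Hypothetical composite that re-checks children with a fixed domain."""

    def __init__(self, *triggers, recheck_domain=TimeDomain.WATERMARK):
        self.triggers = triggers
        self.recheck_domain = recheck_domain

    def should_fire(self, time_domain):
        return any(t.should_fire(time_domain) for t in self.triggers)

    def on_fire(self):
        # Only children that still report should_fire are asked whether they
        # finished. any([]) is False, so a skipped child never finishes us.
        finished = [
            t.on_fire() for t in self.triggers
            if t.should_fire(self.recheck_domain)
        ]
        return any(finished)


def count_firings(trigger, timer_events=3):
    """Fire the trigger on successive processing-time timers until finished."""
    firings = 0
    for _ in range(timer_events):
        if trigger.should_fire(TimeDomain.REAL_TIME):
            firings += 1
            if trigger.on_fire():
                break
    return firings


bare = count_firings(ToyAfterProcessingTime())
wrapped = count_firings(ToyAfterAny(ToyAfterProcessingTime()))
forwarded = count_firings(
    ToyAfterAny(ToyAfterProcessingTime(), recheck_domain=TimeDomain.REAL_TIME))
print(bare, wrapped, forwarded)  # 1 3 1
```

      In this model the bare trigger fires once, the wrapped one fires on every timer (as Repeatedly would), and re-checking with the real-time domain restores the single firing.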

      2. AfterAny(AfterCount, AfterProcessingTime) triggers both children when AfterCount triggers first

      This is less of a problem, but when the two are combined in an AfterAny, AfterProcessingTime still fires after AfterCount has fired. If no element has been added to the window in the meantime, an empty pane is emitted (or one identical to the previous pane, depending on the accumulation mode). My guess is that this happens because AfterProcessingTime does not implement reset(), but I assume there is a reason for that?
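
      The guess in point 2 can be sketched with a toy replay loop. This models only the hypothesis stated above (a pending processing-time timer that is not cleared when the count condition fires), not verified Beam internals; the function run and its parameters reset_clears_timer, count_threshold and delay are hypothetical names chosen for the illustration, and panes are emitted in discarding mode.

```python
def run(events, reset_clears_timer, count_threshold=2, delay=10):
    """Replay (time, value) events; a value of None is a clock tick only.

    Returns the list of panes emitted, in discarding mode.
    """
    buffer, panes, timer = [], [], None
    for now, value in events:
        if value is not None:
            buffer.append(value)
            if timer is None:
                # The processing-time child arms its timer on the first element.
                timer = now + delay
            if len(buffer) >= count_threshold:
                # The count child fires and the pane is emitted.
                panes.append(buffer)
                buffer = []
                if reset_clears_timer:
                    timer = None
        if timer is not None and now >= timer:
            # The processing-time child fires, even if nothing is buffered.
            panes.append(buffer)
            buffer = []
            timer = None
    return panes


events = [(0, "a"), (1, "b"), (10, None)]
stale_timer = run(events, reset_clears_timer=False)
cleared_timer = run(events, reset_clears_timer=True)
print(stale_timer)    # [['a', 'b'], []]
print(cleared_timer)  # [['a', 'b']]
```

      With the timer left in place, the second pane is empty, which matches the behaviour reported above; clearing it on reset removes the extra pane in this model.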

      3. Unsafe trigger warning

      It is unclear to me how the DataLossReason values are combined. For example:

      Repeatedly(AfterAny(AfterCount, AfterProcessingTime))

      is detected as an unsafe trigger, but

      Repeatedly(AfterAny(AfterCount, Repeatedly(AfterProcessingTime)))

      isn't, although, if I'm not mistaken, the two should behave essentially the same.

      Happy to provide more details if needed, and sorry if the issue doesn't quite fit the template you're expecting.

      Thanks,

      Flo

          People

            Assignee: Unassigned
            Reporter: Flo Vouin (flovouin)
            Votes: 0
            Watchers: 3
