Samza / SAMZA-2801

Support excluding tasks from watermark computation when exceeding idle time

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      Currently, the Samza event-time watermark aggregation logic computes the watermark as the minimum of the watermarks from all upstream tasks. However, when an upstream task is lagging, it might not generate a watermark for a long period. In that case no new aggregated watermark is generated, and the downstream aggregation is stuck.

      To address this issue, we will implement a mechanism to exclude tasks that have been "idle" in generating watermarks for a configured time, so that aggregated watermarks are still generated. Note that this mechanism unblocks downstream, but at the risk of advancing the event-time clock faster, so events from the lagging tasks will become late arrivals.
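
A minimal sketch of the proposed behavior, for illustration only. The class name `IdleAwareWatermarkAggregator`, its methods, and the `idleTimeoutMs` parameter are hypothetical and are not taken from the Samza code base. The sketch also assumes that the emitted watermark never moves backwards when an excluded task resumes, which the description does not specify.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: min-of-upstream watermark aggregation that skips idle tasks. */
final class IdleAwareWatermarkAggregator {
    static final long NO_WATERMARK = -1L;

    private final long idleTimeoutMs;                                 // assumed config value
    private final Map<String, Long> watermarks = new HashMap<>();     // latest watermark per task
    private final Map<String, Long> lastUpdateMs = new HashMap<>();   // wall-clock time of last update
    private long emitted = NO_WATERMARK;                              // last aggregated watermark

    IdleAwareWatermarkAggregator(Iterable<String> upstreamTasks, long idleTimeoutMs, long nowMs) {
        this.idleTimeoutMs = idleTimeoutMs;
        for (String task : upstreamTasks) {
            watermarks.put(task, NO_WATERMARK);
            lastUpdateMs.put(task, nowMs);
        }
    }

    /** Records a watermark from one upstream task and refreshes its idle timer. */
    void update(String task, long watermark, long nowMs) {
        watermarks.merge(task, watermark, Math::max);
        lastUpdateMs.put(task, nowMs);
    }

    /** Returns the min watermark over non-idle tasks, never lower than the last emitted value. */
    long aggregate(long nowMs) {
        long min = Long.MAX_VALUE;
        for (Map.Entry<String, Long> e : watermarks.entrySet()) {
            boolean idle = nowMs - lastUpdateMs.get(e.getKey()) > idleTimeoutMs;
            if (idle) {
                continue; // exclude tasks that have not reported within the timeout
            }
            min = Math.min(min, e.getValue());
        }
        // If every task is idle, min stays at MAX_VALUE and the watermark is left unchanged.
        if (min != Long.MAX_VALUE && min > emitted) {
            emitted = min;
        }
        return emitted;
    }
}
```

In this sketch, once a lagging task is excluded and the aggregated watermark advances, any later events from that task with timestamps below the emitted watermark would be treated as late arrivals, which is the trade-off noted above.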

          People

            xinyu Xinyu Liu
            Votes: 0
            Watchers: 1

            Dates

              Created:
              Updated:

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 2h 50m