Uploaded image for project: 'Samza'
  1. Samza
  2. SAMZA-2303

Exclude side inputs when handling end-of-stream and watermarks for high-level

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 1.7
    • None
    • None

    Description

      OperatorImplGraph builds EndOfStreamStates and WatermarkStates objects with all of the input SSPs from the job model. That includes side-input SSPs. However, high-level operator tasks aren't given messages from side-input SSPs, so high-level operators should not need to include handling for end-of-stream and watermarks.

      The result of this issue is that end-of-stream and watermark handling tries to include side-inputs but never updates those states, which can result in not exiting properly (end-of-stream) and not correctly calculating watermarks.

      We currently have tests which use partitionBy and side-inputs, but they only use a single partition, so RunLoop is able to shutdown the task (RunLoop doesn't check side inputs when determining if the task is at the end of all streams). Normally, OperatorImpl will shut down the task when using high-level, and I think changing OperatorImpl to do ignore side input SSPs so that it does shut down the task is the fix.

      Attachments

        Activity

          People

            cameronlee Cameron Lee
            cameronlee Cameron Lee
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - 0h
                0h
                Logged:
                Time Spent - 2h
                2h