Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-2390

Replace iteration timeout with algorithm for detecting termination

    XMLWordPrintableJSON

Details

    • New Feature
    • Status: Closed
    • Major
    • Resolution: Won't Fix
    • None
    • None
    • API / DataStream
    • None

    Description

      Currently the user can set a timeout which will shut down the iteration source/sink nodes if no new data is received during that time to allow program termination in iterative streaming jobs.

      This method is used due to the non-trivial nature of termination in iterative streaming jobs. While termination is not a main concern in long running streaming jobs, this behaviour makes iterative tests non-deterministic and they often fail on travis due to the timeout. Also setting a timeout can cause jobs to terminate prematurely.

      I propose to remove iteration timeouts and replace it with the following algorithm for detecting termination:

      -We first identify loop edges in the jobgraph (the channels from the iteration sources to the head operators)
      -Once the head operators (the ones with loop input) finish with all their non-loop inputs they broadcast a marker to their outputs.
      -Each operator will broadcast a marker once it received a marker from all its non-finished inputs
      -Iteration sources are terminated when they receive 2 consecutive markers without receiving any record in-between

      The idea behind the algorithm is to find out when no more outputs are generated from the operators inside an iteration after their normal inputs are finished.

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              gyfora Gyula Fora
              Votes:
              2 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: