FLINK-10644

Batch Job: Speculative execution

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Runtime / Coordination
    • Labels:
      None

      Description

      Stragglers (outliers) are tasks that run slower than most of the other tasks in a batch job. They increase job latency because a straggler usually sits on the critical path of the job and becomes the bottleneck.

      Tasks may be slow for various reasons, including hardware degradation, software misconfiguration, or noisy neighbors. It is hard for the JobManager (JM) to predict a task's runtime in advance.

      To reduce the impact of stragglers, other systems such as Hadoop/Tez and Spark provide speculative execution. Speculative execution is a health-check procedure that looks for tasks to speculate, i.e. tasks in an ExecutionJobVertex (EJV) that are running slower than the median of all successfully completed tasks in that EJV. Such slow tasks are re-submitted to another TaskManager (TM). The slow task is not stopped; a new copy runs in parallel with it. As soon as one of the copies completes, the others are killed.
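
      The check described above can be sketched roughly as follows. This is only an illustration of the idea, not Flink code: the names Attempt, find_stragglers and resolve_copies are hypothetical, and the slowdown_factor and min_finished_ratio parameters are assumptions added here to avoid speculating on too little data (the description itself only mentions comparing against the median).

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Attempt:
    """One task attempt within a single job vertex (hypothetical model)."""
    task_id: str
    elapsed: float          # seconds run so far, or total runtime if finished
    finished: bool = False


def find_stragglers(attempts, slowdown_factor=1.5, min_finished_ratio=0.5):
    """Return ids of running tasks that are slower than the median of
    the successfully finished tasks in the same vertex.

    slowdown_factor and min_finished_ratio are assumed tuning knobs,
    not something specified in this issue.
    """
    finished = [a.elapsed for a in attempts if a.finished]
    # Without enough finished tasks the median is not a reliable baseline.
    if not attempts or len(finished) / len(attempts) < min_finished_ratio:
        return []
    baseline = median(finished)
    return [
        a.task_id
        for a in attempts
        if not a.finished and a.elapsed > slowdown_factor * baseline
    ]


def resolve_copies(finish_times):
    """Given {attempt_id: finish_time} for all copies of one task,
    keep the copy that finished first and report the rest for cancellation.
    """
    winner = min(finish_times, key=finish_times.get)
    cancelled = sorted(a for a in finish_times if a != winner)
    return winner, cancelled
```

      For example, with three finished tasks that took 10, 11 and 12 seconds, a task that has already been running for 40 seconds would be flagged, while one that has been running for 9 seconds would not. Once a speculative copy is started, resolve_copies models the "first copy to complete wins, the others are killed" rule.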

      This JIRA is an umbrella issue for applying this idea in Flink. Details will be appended later.

            People

            • Assignee:
              eaglewatcher BoWang
              Reporter:
              isunjin JIN SUN
            • Votes:
              4
              Watchers:
              14

              Dates

              • Created:
                Updated:

                Time Tracking

                Estimated:
                Not Specified
                Remaining:
                0h
                Logged:
                1h