Flink / FLINK-4256 Fine-grained recovery / FLINK-10205

Batch Job: InputSplit fault tolerance for DataSourceTask

    Details

      Description

      Today, a DataSourceTask pulls InputSplits from the JobManager to achieve better performance. However, when a DataSourceTask fails and is rerun, it does not get the same splits as its previous execution. This can produce inconsistent results or even data corruption.

      Furthermore, if two executions of the same task run at the same time (in the batch scenario), both executions should process the same splits.

      We need to fix this issue to make the inputs of a DataSourceTask deterministic. The proposal is to save all splits in the ExecutionVertex and have the DataSourceTask pull its splits from there.
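
      The proposal can be illustrated with a minimal sketch. It does not use Flink's actual classes: SharedSplitSource, VertexSplitLog, and ExecutionAttempt are hypothetical names introduced only for illustration, and plain strings stand in for InputSplits. The assumption is that each vertex records the splits it has been handed, in order, and that every execution attempt reads from that record by position, so a rerun or a concurrent attempt sees the same sequence.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the JobManager-side assigner: hands out each split once.
final class SharedSplitSource {
    private final Iterator<String> remaining;

    SharedSplitSource(List<String> splits) {
        this.remaining = new ArrayList<>(splits).iterator();
    }

    // Returns the next unassigned split, or null when none are left.
    synchronized String next() {
        return remaining.hasNext() ? remaining.next() : null;
    }
}

// Hypothetical per-vertex record of the splits handed to that vertex, in order.
final class VertexSplitLog {
    private final SharedSplitSource source;
    private final List<String> assigned = new ArrayList<>();

    VertexSplitLog(SharedSplitSource source) {
        this.source = source;
    }

    // Positions already served are replayed from the log; only a position
    // beyond the end of the log pulls a fresh split from the shared source.
    synchronized String getSplit(int index) {
        while (index >= assigned.size()) {
            String split = source.next();
            if (split == null) {
                return null; // no more splits for anyone
            }
            assigned.add(split);
        }
        return assigned.get(index);
    }
}

// One execution attempt of a vertex; it only keeps a cursor into the vertex log.
final class ExecutionAttempt {
    private final VertexSplitLog log;
    private int cursor = 0;

    ExecutionAttempt(VertexSplitLog log) {
        this.log = log;
    }

    String nextSplit() {
        String split = log.getSplit(cursor);
        if (split != null) {
            cursor++;
        }
        return split;
    }
}

final class SplitReplayDemo {
    public static void main(String[] args) {
        SharedSplitSource source = new SharedSplitSource(Arrays.asList("s0", "s1", "s2", "s3"));
        VertexSplitLog vertex = new VertexSplitLog(source);

        ExecutionAttempt failed = new ExecutionAttempt(vertex);
        System.out.println("attempt 1: " + failed.nextSplit() + ", " + failed.nextSplit());

        // A rerun starts with a fresh cursor and replays the recorded splits first.
        ExecutionAttempt retry = new ExecutionAttempt(vertex);
        System.out.println("attempt 2: " + retry.nextSplit() + ", " + retry.nextSplit());
    }
}
```

      In this sketch a retried attempt starts with a fresh cursor, so it first replays the splits its predecessor consumed and only then pulls new ones. Two attempts running at the same time share the same log and therefore read the same sequence.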

      Document:

      https://docs.google.com/document/d/1FdZdcA63tPUEewcCimTFy9Iz2jlVlMRANZkO4RngIuk/edit?usp=sharing

              People

              • Assignee: ryantaocer
              • Reporter: JIN SUN (isunjin)
              • Votes: 0
              • Watchers: 11

                  Time Tracking

                  • Original Estimate: 168h
                  • Remaining Estimate: 167.5h
                  • Logged: 0.5h