  1. Flink
  2. FLINK-4256 Fine-grained recovery
  3. FLINK-10205

Batch Job: InputSplit fault tolerance for DataSourceTask

Details

    Description

      Today, a DataSourceTask pulls InputSplits from the JobManager to achieve better performance. However, when a DataSourceTask fails and is rerun, it does not get the same splits as its previous attempt. This can lead to inconsistent results or even data corruption.

      Furthermore, if two executions of the same task run at the same time (in the batch scenario), both executions should process the same splits.

      We need to fix this so that the inputs of a DataSourceTask are deterministic. The proposal is to save all assigned splits in the ExecutionVertex and have the DataSourceTask pull its splits from there.
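
      To illustrate the proposal, here is a minimal sketch, not Flink's actual implementation. The class names GlobalSplitSource and VertexSplitLog, the use of plain strings as splits, and the integer attempt number are all assumptions made for this example. Each vertex remembers the splits it has been handed, so a restarted or concurrent attempt replays the same sequence before any new split is pulled from the shared source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical stand-in for the JobManager-side split assigner (not a Flink class). */
final class GlobalSplitSource {
    private final Iterator<String> remaining;

    GlobalSplitSource(List<String> splits) {
        this.remaining = new ArrayList<>(splits).iterator();
    }

    /** Hands out each split at most once, in order. */
    synchronized Optional<String> next() {
        return remaining.hasNext() ? Optional.of(remaining.next()) : Optional.empty();
    }
}

/** Hypothetical per-vertex record of assigned splits (the role proposed for ExecutionVertex). */
final class VertexSplitLog {
    private final GlobalSplitSource source;
    private final List<String> assigned = new ArrayList<>();       // splits ever given to this vertex
    private final Map<Integer, Integer> cursors = new HashMap<>(); // attempt number -> replay position

    VertexSplitLog(GlobalSplitSource source) {
        this.source = source;
    }

    /** Returns the next split for the given attempt, replaying recorded splits first. */
    synchronized Optional<String> nextSplit(int attempt) {
        int pos = cursors.getOrDefault(attempt, 0);
        if (pos == assigned.size()) {
            // This attempt has caught up with the log: pull a fresh split and record it.
            Optional<String> fresh = source.next();
            if (!fresh.isPresent()) {
                return Optional.empty();
            }
            assigned.add(fresh.get());
        }
        cursors.put(attempt, pos + 1);
        return Optional.of(assigned.get(pos));
    }
}

final class SplitReplayDemo {
    public static void main(String[] args) {
        GlobalSplitSource source = new GlobalSplitSource(Arrays.asList("s0", "s1", "s2", "s3"));
        VertexSplitLog vertexA = new VertexSplitLog(source);
        VertexSplitLog vertexB = new VertexSplitLog(source);

        System.out.println("A attempt 0: " + vertexA.nextSplit(0));
        System.out.println("B attempt 0: " + vertexB.nextSplit(0));
        System.out.println("A attempt 0: " + vertexA.nextSplit(0));

        // Vertex A fails and is rerun as attempt 1: it replays its recorded splits first.
        System.out.println("A attempt 1: " + vertexA.nextSplit(1));
        System.out.println("A attempt 1: " + vertexA.nextSplit(1));
        System.out.println("A attempt 1: " + vertexA.nextSplit(1));
    }
}
```

      Keeping one cursor per attempt means the same log serves both cases described above: a rerun after failure starts from position zero, and two attempts running at the same time each walk the same recorded sequence independently.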

      Design document:

      https://docs.google.com/document/d/1FdZdcA63tPUEewcCimTFy9Iz2jlVlMRANZkO4RngIuk/edit?usp=sharing

            People

              Ryantaocer ryantaocer
              isunjin JIN SUN
              Votes: 0
              Watchers: 11

                Time Tracking

                  Estimated: 168h
                  Remaining: 167.5h
                  Logged: 0.5h