The new ResultPartitionType : BLOCKING_PERSISTENT is similar to BLOCKING except it might be consumed for several times and will be released after TM shutdown or ResultPartition removal request.
This is the basis for Interactive Programming.
Here is the brief changes:
- Introduce ResultPartitionType called BLOCKING_PERSISTENT
- Introduce BlockingShuffleOutputFormat which contains a user specified IntermediateDataSetID(passed from TableAPI in later PR)
- when JobGraphGenerator sees a GenericDataSinkBase with BlockingShuffleOutputFormat, it creates a IntermediateDataSet with this id, then add it to its predecessor, the OutputFormatVertex for this GenericDataSinkBase will be excluded in JobGraph
- So the JobGraph may contains some JobVertex which has more IntermediateDataSet than its downstream consumers.
Here are some design notes:
- Why modify DataSet and JobGraphGenerator
Since Blink Planner is not ready yet, and Batch Table is running on Flink Planner(based on DataSet).
There will be another implementation once Blink Planner is ready.
- Why use a special OutputFormat as placeholder
We could add a cache() method for DataSet, but we do not want to change DataSet API any more. so a special OutputFormat as placeholder seems reasonable.