Details
Type: Sub-task
Status: Resolved
Priority: Blocker
Resolution: Fixed
Description
The DataStream API provides a `cache` method that caches the result of a DataStream so that later jobs can reuse it in batch execution mode.
I think we should verify:
- Follow the documentation to write a Flink job that produces the cache and a job that consumes it, then submit them to a session cluster (standalone or YARN).
- Physically remove the source data after the cache-producing job finishes to verify that the cache-consuming job does not read from the source. For example, delete the file from the filesystem if you are using a file source.
- Restart the TaskManager after the cache-producing job finishes to verify that the cache-consuming job recomputes the result.
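
The checks above could be driven by a program along the following lines. This is only a minimal sketch, not the job used in the actual verification: the class name `CacheVerificationJob`, the command-line arguments, the job names, and the pause between the two jobs are all assumptions made for illustration. It also assumes Flink 1.16 or later with the `flink-connector-files` dependency, and an attached submission so that each `execute()` call blocks until its job finishes.

```java
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.connector.file.src.reader.TextLineInputFormat;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.CachedDataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Hypothetical test driver: runs a cache-producing job, pauses, then runs a
// cache-consuming job from the same StreamExecutionEnvironment.
public class CacheVerificationJob {

    public static void main(String[] args) throws Exception {
        // Assumed arguments: args[0] = input file path, args[1] = pause in seconds.
        String inputPath = args[0];
        long pauseSeconds = args.length > 1 ? Long.parseLong(args[1]) : 60L;

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // The cache is only supported in batch execution mode.
        env.setRuntimeMode(RuntimeExecutionMode.BATCH);

        FileSource<String> source = FileSource
                .forRecordStreamFormat(new TextLineInputFormat(), new Path(inputPath))
                .build();

        // Mark the mapped stream as cached; the cache is created lazily
        // the first time a job computes this stream.
        CachedDataStream<String> cached = env
                .fromSource(source, WatermarkStrategy.noWatermarks(), "file-source")
                .map(line -> line.toUpperCase())
                .cache();

        // Job 1: reads the file and produces the cache.
        cached.print();
        env.execute("cache-producing job");

        // Pause so the tester can delete the input file or restart the TaskManager.
        Thread.sleep(pauseSeconds * 1000L);

        // Job 2: should read from the cache, or recompute if the cache was lost.
        cached.print();
        env.execute("cache-consuming job");
    }
}
```

During the pause, delete the input file to check that the second job does not touch the source, or restart the TaskManager to check that the second job falls back to recomputation. For the restart check the input file has to remain in place, since recomputation reads the source again.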