Details
- Improvement
- Status: Closed
- Major
- Resolution: Won't Do
- 0.9
- None
- None
Description
Users find it counter-intuitive that calling print() on a DataSet internally calls collect() and fully materializes the data set. This leads to out-of-memory errors on the client. It also leaves users with the impression that Flink cannot handle large amounts of data and that it fails frequently.
Improving this situation requires some major architectural changes in Flink. The easiest solution would probably be to transfer the data from the job manager to the client via the BlobManager. Alternatively, the client could connect directly to the task managers and fetch the results.
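To make the memory concern concrete, here is a minimal, self-contained Java sketch of the difference between materializing a full result on the client and consuming it in bounded batches. It does not use Flink's API: the class `BoundedPrintSketch` and the methods `collectAll` and `printInBatches` are hypothetical names chosen for illustration, and the `Iterator` merely stands in for whatever channel (BlobManager transfer or direct task manager connection) would deliver the records.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.IntStream;

// Hypothetical illustration only; none of these names exist in Flink.
final class BoundedPrintSketch {

    // Mimics the collect()-style behaviour: every record is held in
    // client memory at the same time before anything is printed.
    static <T> List<T> collectAll(Iterator<T> results) {
        List<T> all = new ArrayList<>();
        while (results.hasNext()) {
            all.add(results.next());
        }
        return all;
    }

    // Alternative: buffer at most batchSize records, hand them to the sink,
    // then discard them. Returns the largest number of records that were
    // buffered at any one time.
    static <T> int printInBatches(Iterator<T> results, int batchSize, Consumer<T> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<T> buffer = new ArrayList<>(batchSize);
        int peak = 0;
        while (results.hasNext()) {
            buffer.add(results.next());
            // Flush when the buffer is full or the source is exhausted.
            if (buffer.size() == batchSize || !results.hasNext()) {
                peak = Math.max(peak, buffer.size());
                buffer.forEach(sink);
                buffer.clear();
            }
        }
        return peak;
    }

    public static void main(String[] args) {
        // Stand-in for a remote result stream of ten records.
        Iterator<Integer> source = IntStream.range(0, 10).boxed().iterator();
        int peak = printInBatches(source, 4, System.out::println);
        System.out.println("peak buffered records: " + peak);
    }
}
```

With the batched variant, client memory is bounded by the batch size rather than by the size of the result, which is the property either proposed transfer mechanism would need to preserve.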
Issue Links
- is related to: FLINK-1418 Make 'print()' output on the client command line, rather than on the task manager sysout (Closed)