Flink / FLINK-2239

print() on DataSet: stream results and print incrementally


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Do
    • Affects Version/s: 0.9
    • Fix Version/s: None
    • Component/s: Runtime / Coordination
    • Labels: None

    Description

      Users find it counter-intuitive that print() on a DataSet internally calls collect() and fully materializes the set. This leads to out-of-memory errors on the client. It also leaves users with the impression that Flink cannot handle large amounts of data and that it fails frequently.

      Improving this situation requires some major architectural changes in Flink. The easiest solution would probably be to transfer the data from the job manager to the client via the BlobManager. Alternatively, the client could connect directly to the task managers and fetch the results.
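
      The difference between the current and the requested behavior can be illustrated with a minimal sketch in plain Java. It does not use any Flink API; the class IncrementalPrintSketch and the methods collectAll and printIncrementally are hypothetical names chosen for illustration, and the Iterator merely stands in for whatever transport (BlobManager or direct task manager connection) would deliver the records to the client.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.IntStream;

// Hypothetical illustration only: none of these names exist in Flink.
final class IncrementalPrintSketch {

    // Eager variant: the whole result is materialized on the client
    // before anything is printed, so memory use grows with the result size.
    public static <T> List<T> collectAll(Iterator<T> results) {
        List<T> all = new ArrayList<>();
        while (results.hasNext()) {
            all.add(results.next());
        }
        return all;
    }

    // Incremental variant: records are handed to the sink in batches of at
    // most batchSize, so the client never buffers more than one batch.
    // Returns the largest number of records buffered at any one time.
    public static <T> int printIncrementally(
            Iterator<T> results, int batchSize, Consumer<? super T> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<T> buffer = new ArrayList<>(batchSize);
        int maxBuffered = 0;
        while (results.hasNext()) {
            buffer.add(results.next());
            // Flush when the batch is full or the input is exhausted.
            if (buffer.size() == batchSize || !results.hasNext()) {
                maxBuffered = Math.max(maxBuffered, buffer.size());
                buffer.forEach(sink);
                buffer.clear();
            }
        }
        return maxBuffered;
    }

    public static void main(String[] args) {
        // Stand-in for a result stream arriving from the cluster.
        Iterator<Integer> results = IntStream.range(0, 10).boxed().iterator();
        int maxBuffered = printIncrementally(results, 4, System.out::println);
        System.out.println("max records buffered: " + maxBuffered);
    }
}
```

      In this sketch the client-side memory footprint is bounded by the batch size rather than by the size of the DataSet, which is the property the issue asks for. How the records would actually reach the client is left open here, as it is in the description above.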


            People

              Assignee: Unassigned
              Reporter: Maximilian Michels (mxm)
              Votes: 0
              Watchers: 7
