Details
-
New Feature
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
Description
When doing machine learning using DataStream, we found that DataStream lacks withBroadcast() function, which could be useful in machine learning.
A DataSet-based demo is like:
DataSet<?> d1 = ...; DataSet<?> d2 = ...; d1.map(new RichMapFunction <?, ?>() { @Override public Object map(Object aLong) throws Exception{ List<?> elements = getRuntimeContext().getBroadcastVariable("d2"); ...; } }).withBroadcastSet(d2, "d2");
The withBroadcast() function incurs priority-base data-consuming. For example in the above code snippet, we cannot consume any element from d1 before we consumed all of elements in d2.
Thus when supporting withBroadcast() in DataStream, we also need priority-base data-consuming. This could probably lead to deadlock and DataStream does not provide a solution for deadlock.