Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-24279

Support withBroadcast with DataStream API in Flink ML Library

    XMLWordPrintableJSON

Details

    Description

      When doing machine learning using DataStream, we found that DataStream lacks withBroadcast() function, which could be useful in machine learning.

       

      A DataSet-based demo is like:

      DataSet<?> d1 = ...;
      DataSet<?> d2 = ...;
      d1.map(new RichMapFunction <?, ?>() {
             @Override
             public Object map(Object aLong) throws Exception{
                  List<?> elements = getRuntimeContext().getBroadcastVariable("d2");
                  ...;           
             }
      }).withBroadcastSet(d2, "d2");
      

       

      The withBroadcast() function incurs priority-base data-consuming. For example in the above code snippet, we cannot consume any element from d1 before we consumed all of elements in d2.
       
      Thus when supporting withBroadcast() in DataStream, we also need priority-base data-consuming. This could probably lead to deadlock and DataStream does not provide a solution for deadlock.
       

      Attachments

        Activity

          People

            ghandzhipeng Zhipeng Zhang
            ghandzhipeng Zhipeng Zhang
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: