Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-1267

Add crossGroup operator

    XMLWordPrintableJSON

Details

    • New Feature
    • Status: Reopened
    • Not a Priority
    • Resolution: Unresolved
    • 0.7.0-incubating
    • None

    Description

      A common operator is to pair-wise compare or combine all elements of a group (there were two questions about this on the user mailing list, recently). Right now, this can be done in two ways:

      1. groupReduce: consume and store the complete iterator in memory and build all pairs
      2. do a self-Join: the engine builds all pairs of the full symmetric Cartesian product.

      Both approaches have drawbacks. The groupReduce variant requires that the full group fits into memory and is more cumbersome to implement for the user, but pairs can be arbitrarily built. The self-Join approach pushes most of the work into the system, but the execution strategy does not treat the self-join different from a regular join (both identical inputs are shuffled, etc.) and always builds the full symmetric Cartesian product.

      I propose to add a dedicated crossGroup() operator, that offers this functionality in a proper way.

      Attachments

        Issue Links

          Activity

            People

              pietropinoli pietro pinoli
              fhueske Fabian Hueske
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated: