Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-9819

reduceBy(KeyAnd)Window should specify which is the accumulator argument in invReduceFunc

    XMLWordPrintableJSON

    Details

      Description

      reduceByWindow has an optional invReduceFunc argument which allows the reduction to be performed incrementally.

      The incremental reduction performed in ReducedWindowedDStream only depends on the reduction being associative (as shown by the reduce applied to oldValues), but does not require those functions to be commutative.

      In particular, if the inverse reduction is the non-commutative, non-associative substraction (e.g. what you're computing is a running sum), it's necessary to know that the intermediate result (to be substracted from) is the first argument of invReduceFunc and that the second argument is the old value to substract.

      It's only in the commutative case that we don't care which is which.

      The Scaladoc for the various overloads of reduceByWindow should let the user know which is the accumulator, and which is the old value. A concise, unambiguous way to state this is to write an inversion law in the Scaladoc:

      invReduceFunc(reduceFunc(x, y), y) = x

        Attachments

          Activity

            People

            • Assignee:
              huitseeker Fran├žois Garillot
              Reporter:
              huitseeker Fran├žois Garillot
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: