Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-2883

Add documentation to forbid key-modifying ReduceFunction

    Details

      Description

      If one uses a combinable reduce operation which also changes the key value of the underlying data element, then the results of the reduce operation can become wrong. The reason is that after the combine phase, another reduce operator is executed which will then reduce the elements based on the new key values. This might be not so surprising if one explicitly defined ones GroupReduceOperation as combinable. However, the ReduceFunction conceals the fact that a combiner is used implicitly. Furthermore, the API does not prevent the user from changing the key fields which could solve the problem.

      The following example program illustrates the problem

      val env = ExecutionEnvironment.getExecutionEnvironment
      env.setParallelism(1)
      
      val input = env.fromElements((1,2), (1,3), (2,3), (3,3), (3,4))
      
      val result = input.groupBy(0).reduce{
        (left, right) =>
          (left._1 + right._1, left._2 + right._2)
      }
      
      result.output(new PrintingOutputFormat[Int]())
      
      env.execute()
      

      The expected output is

      (2, 5)
      (2, 3)
      (6, 7)
      

      However, the actual output is

      (4, 8)
      (6, 7)
      

      I think that the underlying problem is that associativity and commutativity is not sufficient for a combinable reduce operation. Additionally we also need to make sure that the key stays the same.

        Issue Links

          Activity

          Hide
          aljoscha Aljoscha Krettek added a comment -

          I don't think this can be solved in our API because we don't have a separation between what is key and what is not key in the user data-type. (Spark, for example can do it because they have a clear key/value model). The users just have to follow the rules if they want correct results.

          Show
          aljoscha Aljoscha Krettek added a comment - I don't think this can be solved in our API because we don't have a separation between what is key and what is not key in the user data-type. (Spark, for example can do it because they have a clear key/value model). The users just have to follow the rules if they want correct results.
          Hide
          fhueske Fabian Hueske added a comment -

          I agree with Aljoscha Krettek. However, we need to make sure the rules are documented are easy to understand.

          Show
          fhueske Fabian Hueske added a comment - I agree with Aljoscha Krettek . However, we need to make sure the rules are documented are easy to understand.
          Hide
          till.rohrmann Till Rohrmann added a comment -

          Yes, documenting it sounds like a reasonable solution. This is probably also more of a corner case.

          Show
          till.rohrmann Till Rohrmann added a comment - Yes, documenting it sounds like a reasonable solution. This is probably also more of a corner case.
          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user greghogan opened a pull request:

          https://github.com/apache/flink/pull/3256

          FLINK-2883 [docs] Add documentation to forbid key-modifying ReduceFunction

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/greghogan/flink 2883_add_documentation_to_forbid_keymodifying_reducefunction

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flink/pull/3256.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #3256


          commit 2a858819e2163a4c935521670d766ebaaba5b99d
          Author: Greg Hogan <code@greghogan.com>
          Date: 2017-02-02T21:15:52Z

          FLINK-2883 [docs] Add documentation to forbid key-modifying ReduceFunction


          Show
          githubbot ASF GitHub Bot added a comment - GitHub user greghogan opened a pull request: https://github.com/apache/flink/pull/3256 FLINK-2883 [docs] Add documentation to forbid key-modifying ReduceFunction You can merge this pull request into a Git repository by running: $ git pull https://github.com/greghogan/flink 2883_add_documentation_to_forbid_keymodifying_reducefunction Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/3256.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3256 commit 2a858819e2163a4c935521670d766ebaaba5b99d Author: Greg Hogan <code@greghogan.com> Date: 2017-02-02T21:15:52Z FLINK-2883 [docs] Add documentation to forbid key-modifying ReduceFunction
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user StephanEwen commented on the issue:

          https://github.com/apache/flink/pull/3256

          I think this is important to add.

          +1

          Show
          githubbot ASF GitHub Bot added a comment - Github user StephanEwen commented on the issue: https://github.com/apache/flink/pull/3256 I think this is important to add. +1
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user greghogan closed the pull request at:

          https://github.com/apache/flink/pull/3256

          Show
          githubbot ASF GitHub Bot added a comment - Github user greghogan closed the pull request at: https://github.com/apache/flink/pull/3256
          Hide
          greghogan Greg Hogan added a comment -

          1.2: 37d6f8196f12714491c95adb73319bb36e7a0d34
          master: 11ebf484280314231d146dcfb0b973934448f00b

          Show
          greghogan Greg Hogan added a comment - 1.2: 37d6f8196f12714491c95adb73319bb36e7a0d34 master: 11ebf484280314231d146dcfb0b973934448f00b

            People

            • Assignee:
              greghogan Greg Hogan
              Reporter:
              till.rohrmann Till Rohrmann
            • Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development