Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-6394

GroupCombine reuses instances even though object reuse is disabled

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Critical
    • Resolution: Fixed
    • 1.2.0
    • 1.2.2, 1.3.0
    • API / DataSet
    • None

    Description

      I am using group combiner in DataSet API with disabled object reuse.

      In code it may be expressed as follows:

      tuples.groupBy(1)
            .combineGroup((it, collector) -> {
               // store first item for future use
               Pojo stored = it.next();
               while (it.hasNext()) {
                 ....
               }
            })
      

      It seems even the object reuse feature is disabled, my instance is actually replaced when .next() is called on the iterator. It leads to very confusing and wrong results.

      I checked the Flink codebase and it seems CombiningUnilateralSortMerger is actually reusing object instances even though object reuse is explicitly disabled.
      In spilling phase user's combiner is called with instance of CombineValueIterator that actually reuses instances without any warning.

      See https://github.com/apache/flink/blob/d7b59d761601baba6765bb4fc407bcd9fd6a9387/flink-runtime/src/main/java/org/apache/flink/runtime/operators/sort/CombiningUnilateralSortMerger.java#L550

      When I disable combiner and use groupReduce only with the same reduce function, results are fine.

      Please let me know if you can confirm this as a bug. From my point of view it's highly critical as I am getting unpredictable results.

      Attachments

        Issue Links

          Activity

            People

              ykt836 Kurt Young
              vanekjar Jaromir Vanek
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: