[FLINK-9031] DataSet Job result changes when adding rebalance after union - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: 1.3.1
Fix Version/s: 1.3.4, 1.4.3, 1.5.0
Component/s: API / DataSet, Runtime / Task
Labels: None

Description

A user reported this issue on the user mailing list.

I am using Flink 1.3.1 and I have found strange behavior when running the following logic:

Read data from file and store into DataSet<POJO>

Split the dataset in two by checking whether "field1" of the POJOs is empty, so that the first dataset contains only elements with non-empty "field1" and the second dataset contains the other elements.

Each dataset is then grouped, one by "field1" and the other by another field, and subsequently reduced.

The 2 datasets are merged together by union.

The final dataset is written as JSON.

What I expected from the output was to find only one element for each specific value of "field1", because:

Reducing the first dataset grouped by "field1" should generate only one element with a specific value of "field1".

The second dataset should contain only elements with empty "field1".

Making a union of them should not duplicate any record.

This does not happen. When I read the generated JSON files I see some duplicate (non-empty) values of "field1".
Strangely, this does not happen when the union between the two datasets is not computed. In this case the first dataset produces elements only with distinct values of "field1", while the second dataset produces only records with empty field "value1".

The user has not enabled object reuse.

Later he reported that the problem disappears when he injects a rebalance() after the union. I had a look at the execution plans for both cases (attached to this issue) but could not identify a problem.

Hence, I assume this might be an issue with the runtime code, but we need to look deeper into this. The user also provided an example program consisting of two classes, which are attached to the issue as well.
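
The user's expectation can be made concrete with a small plain-Java model of the pipeline. This is only a sketch and not Flink code: the `Person` class here is a hypothetical stand-in for the attached Person.java (whose real fields are not shown in this ticket), `field2` stands in for the unnamed "another field", and summing a `count` is an assumed reduce function. It shows that filter, group, reduce, and union on their own cannot yield two records with the same non-empty "field1", which is why duplicates in the job output point at the runtime.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Plain-Java model of the reported pipeline (assumption: not the user's code).
final class UnionExpectation {

    // Hypothetical stand-in for the attached Person POJO.
    static final class Person {
        final String field1;
        final String field2;
        final int count;

        Person(String field1, String field2, int count) {
            this.field1 = field1;
            this.field2 = field2;
            this.count = count;
        }
    }

    // Groups by the given key and reduces each group to a single element
    // (assumed reduce: keep the first element's fields and sum the counts).
    static List<Person> groupAndReduce(List<Person> in, Function<Person, String> key) {
        Map<String, Person> reduced = new LinkedHashMap<>();
        for (Person p : in) {
            reduced.merge(key.apply(p), p,
                (a, b) -> new Person(a.field1, a.field2, a.count + b.count));
        }
        return new ArrayList<>(reduced.values());
    }

    // Split on empty "field1", group and reduce each half, then union the results.
    static List<Person> pipeline(List<Person> input) {
        List<Person> nonEmpty = input.stream()
            .filter(p -> !p.field1.isEmpty())
            .collect(Collectors.toList());
        List<Person> empty = input.stream()
            .filter(p -> p.field1.isEmpty())
            .collect(Collectors.toList());

        List<Person> union = new ArrayList<>(groupAndReduce(nonEmpty, p -> p.field1));
        union.addAll(groupAndReduce(empty, p -> p.field2));
        return union;
    }

    // Returns every non-empty "field1" value that occurs more than once.
    static List<String> duplicatedField1(List<Person> out) {
        Map<String, Long> counts = out.stream()
            .filter(p -> !p.field1.isEmpty())
            .collect(Collectors.groupingBy(p -> p.field1, LinkedHashMap::new, Collectors.counting()));
        return counts.entrySet().stream()
            .filter(e -> e.getValue() > 1)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }
}
```

Running a check like `duplicatedField1` over the JSON written by the real job, once with and once without the rebalance(), would be one way to confirm the symptom on a given input.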

Attachments

- oldplan.txt, 20/Mar/18 14:32, 114 kB, Fabian Hueske
- newplan.txt, 20/Mar/18 14:32, 114 kB, Fabian Hueske
- Person.java, 20/Mar/18 14:33, 0.9 kB, Fabian Hueske
- RunAll.java, 20/Mar/18 14:33, 5 kB, Fabian Hueske

Issue Links

links to

GitHub Pull Request #5742

People

Assignee: Fabian Hueske
Reporter: Fabian Hueske
Votes: 1
Watchers: 7

Dates

Created: 20/Mar/18 14:33
Updated: 02/Oct/19 17:43
Resolved: 04/Apr/18 16:07