Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-18830

JoinCoGroupFunction and FlatJoinCoGroupFunction work incorrectly for outer join when one side of coGroup is empty

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Not A Problem
    • 1.11.1
    • None
    • API / DataStream
    • None

    Description

      Currently, The JoinCoGroupFunction and FlatJoinCoGroupFunction in JoinedStreams doesn't respect the join type, it's been implemented as doing join within a two-level loop. However, this is incorrect for outer join when one side of the coGroup is empty.

      	public void coGroup(Iterable<T1> first, Iterable<T2> second, Collector<T> out) throws Exception {
      			for (T1 val1: first) {
      				for (T2 val2: second) {
      					wrappedFunction.join(val1, val2, out);
      				}
      			}
      		}
      

      The above code is the current implementation, suppose the first input is non-empty, and the second input is an empty iterator, then the join function(`wrappedFunction`) will never be called. This will cause no data to be emitted for a left outer join.

      So I propose to consider join type here, and handle this case, e.g., for left outer join, we can emit record with right side set to null here if the right side is empty or can not find any match in the right side.

      Attachments

        Activity

          People

            Unassigned Unassigned
            liupengcheng liupengcheng
            Votes:
            0 Vote for this issue
            Watchers:
            8 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: