FLINK-3385

Fix outer join skipping unprobed partitions


Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.0.0
    • Component/s: Runtime / Coordination
    • Labels: None

    Description

      MutableHashTable.nextRecord performs three steps for a build-side outer join:

      	public boolean nextRecord() throws IOException {
      		if (buildSideOuterJoin) {
      			return processProbeIter() || processUnmatchedBuildIter() || prepareNextPartition();
      		} else {
      			return processProbeIter() || prepareNextPartition();
      		}
      	}
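
The ordering matters because `||` short-circuits: a later step runs only when every earlier step returned false. The following is a minimal sketch of that control flow, not Flink code. The class `NextRecordOrderSketch` and its `trace` helper are hypothetical, and the three boolean arguments stand in for the results of the real methods.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

public class NextRecordOrderSketch {

    /**
     * Records which steps are invoked when the three steps are chained with ||,
     * given the result each step would return. Hypothetical helper for illustration.
     */
    static List<String> trace(boolean probeResult, boolean unmatchedResult, boolean nextPartitionResult) {
        List<String> calls = new ArrayList<>();
        BooleanSupplier probe = () -> { calls.add("processProbeIter"); return probeResult; };
        BooleanSupplier unmatched = () -> { calls.add("processUnmatchedBuildIter"); return unmatchedResult; };
        BooleanSupplier next = () -> { calls.add("prepareNextPartition"); return nextPartitionResult; };

        // Same shape as the build-side outer join branch quoted above.
        boolean hasNext = probe.getAsBoolean() || unmatched.getAsBoolean() || next.getAsBoolean();
        calls.add("returned " + hasNext);
        return calls;
    }

    public static void main(String[] args) {
        // Probe input exhausted, unmatched build records still available.
        System.out.println(trace(false, true, true));
        // prints [processProbeIter, processUnmatchedBuildIter, returned true]
    }
}
```

In this model, `prepareNextPartition` is reached only after the unmatched build-side scan reports that it has nothing more to emit, which is why a scan that gives up early on spilled partitions hands control straight to the partition cleanup.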
      

      MutableHashTable.processUnmatchedBuildIter eventually calls through to MutableHashTable.moveToNextBucket, which cannot process spilled partitions:

      			if (p.isInMemory()) {
      				...
      			} else {
      				return false;
      			}
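
To illustrate the effect, here is a toy model of a scan for unmatched build-side records that visits only in-memory partitions. It is a sketch under simplifying assumptions: `Partition` is a hypothetical stand-in rather than Flink's HashPartition, and records are plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UnmatchedBuildScanSketch {

    /** Hypothetical stand-in for a hash partition; not Flink's HashPartition. */
    static final class Partition {
        final boolean inMemory;
        final List<String> unmatchedBuildRecords;

        Partition(boolean inMemory, List<String> unmatchedBuildRecords) {
            this.inMemory = inMemory;
            this.unmatchedBuildRecords = unmatchedBuildRecords;
        }
    }

    /** Collects unmatched build-side records, visiting only in-memory partitions. */
    static List<String> scanUnmatched(List<Partition> partitions) {
        List<String> emitted = new ArrayList<>();
        for (Partition p : partitions) {
            if (p.inMemory) {
                emitted.addAll(p.unmatchedBuildRecords);
            }
            // A spilled partition contributes nothing, mirroring the
            // "return false" branch in the excerpt above.
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<Partition> partitions = Arrays.asList(
            new Partition(true, Arrays.asList("a", "b")),
            new Partition(false, Arrays.asList("c")));
        System.out.println(scanUnmatched(partitions)); // prints [a, b]
    }
}
```

The record "c" lives in a spilled partition and is never emitted by this scan, so it has to be picked up later or it is lost from the outer join result.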
      

      MutableHashTable.prepareNextPartition calls HashPartition.finalizeProbePhase, which registers a spilled partition as pending (to be read and processed in the next instantiation of MutableHashTable) only if probe-side records were spilled to it. For an inner join this is fine, but for a build-side outer join the unmatched build-side records must still be retained and emitted. No further probing is necessary for such a partition, so could this be short-circuited when it is loaded by the next MutableHashTable?

      		if (isInMemory()) {
      			...
      		}
      		else if (this.probeSideRecordCounter == 0) {
      			// partition is empty, no spilled buffers
      			// return the memory buffer
      			freeMemory.add(this.probeSideBuffer.getCurrentSegment());
      
      			// delete the spill files
      			this.probeSideChannel.close();
      			this.buildSideChannel.deleteChannel();
      			this.probeSideChannel.deleteChannel();
      			return 0;
      		}
      		else {
      			// flush the last probe side buffer and register this partition as pending
      			this.probeSideBuffer.close();
      			this.probeSideChannel.close();
      			spilledPartitions.add(this);
      			return 1;
      		}
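
One way to picture the problem is to isolate the decision made for a spilled partition. The sketch below contrasts the decision as quoted above with a hypothetical outer-join-aware variant. The class, the `Action` enum, and the `buildSideOuterJoin` parameter are illustrative assumptions, not the HashPartition API and not necessarily the fix that was committed.

```java
public class FinalizeProbePhaseSketch {

    /** Outcome for one spilled partition at the end of the probe phase. */
    enum Action { DISCARD, KEEP_FOR_NEXT_PASS }

    /** Decision as in the excerpt above: only the probe-side record count matters. */
    static Action currentDecision(long probeSideRecordCounter) {
        return probeSideRecordCounter == 0 ? Action.DISCARD : Action.KEEP_FOR_NEXT_PASS;
    }

    /**
     * Hypothetical variant: an unprobed spilled partition is also kept when the
     * join must still emit its unmatched build-side records.
     */
    static Action outerJoinAwareDecision(long probeSideRecordCounter, boolean buildSideOuterJoin) {
        if (probeSideRecordCounter == 0 && !buildSideOuterJoin) {
            return Action.DISCARD;
        }
        return Action.KEEP_FOR_NEXT_PASS;
    }

    public static void main(String[] args) {
        System.out.println(currentDecision(0));               // prints DISCARD
        System.out.println(outerJoinAwareDecision(0, true));  // prints KEEP_FOR_NEXT_PASS
        System.out.println(outerJoinAwareDecision(0, false)); // prints DISCARD
    }
}
```

With the quoted logic, an unprobed spilled partition is discarded together with its build-side spill file. The variant keeps it whenever a build-side outer join is in progress, and leaves the inner join behavior unchanged.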
      


          People

            Assignee: Greg Hogan (greghogan)
            Reporter: Greg Hogan (greghogan)