  1. Calcite
  2. CALCITE-3829

MergeJoinEnumerator should not use inputs enumerators until it is really required


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.21.0
    • Fix Version/s: 1.23.0
    • Component/s: core

    Description

      EnumerableDefaults#MergeJoinEnumerator provides an Enumerator that performs a merge join between two sorted inputs. Sorting those inputs can be very expensive, so we should skip it whenever possible. Currently, the enumerators of both merge join inputs are created when MergeJoinEnumerator is constructed. However, in some cases the creation of one input's enumerator can be skipped: if the outer (i.e. left) enumerator returns no results, there is no need to access (and sort) the inner (i.e. right) input. For this reason, we should delay creating the inner enumerator until we are sure it is really required, i.e. when the first element of the outer enumerator is fetched. This strategy is already in place in other join algorithms in EnumerableDefaults (e.g. nestedLoopJoinOptimized, semiEquiJoin), and it should be straightforward to apply to MergeJoinEnumerator.
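
      The lazy-creation idea described above can be illustrated with a minimal, standalone sketch. It does not use Calcite's classes and is not the actual patch: the class LazyMergeJoinSketch, the helper sortedInput and the counter innerOpened are hypothetical names, plain java.util.Iterator stands in for Calcite's Enumerator, and each input is assumed to contain distinct integer keys. The point is only that the inner supplier, which does the sorting, is never invoked when the outer input is empty.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical illustration only; not Calcite's MergeJoinEnumerator. */
final class LazyMergeJoinSketch {
    /** Counts how often an input built by sortedInput was opened (and sorted). */
    static int innerOpened = 0;

    /** Returns a supplier that sorts the given keys only when it is invoked. */
    static Supplier<Iterator<Integer>> sortedInput(Integer... keys) {
        return () -> {
            innerOpened++;
            List<Integer> sorted = new ArrayList<>(Arrays.asList(keys));
            Collections.sort(sorted); // the potentially expensive step
            return sorted.iterator();
        };
    }

    /**
     * Merge-joins two ascending inputs on key equality and returns the matching keys.
     * Assumes keys are distinct within each input.
     */
    static List<Integer> mergeJoin(Supplier<Iterator<Integer>> outerSupplier,
                                   Supplier<Iterator<Integer>> innerSupplier) {
        List<Integer> result = new ArrayList<>();
        Iterator<Integer> outer = outerSupplier.get();
        if (!outer.hasNext()) {
            return result; // outer is empty: the inner input is never opened
        }
        // Only now, with an outer row available, is the inner input created.
        Iterator<Integer> inner = innerSupplier.get();
        if (!inner.hasNext()) {
            return result;
        }
        int left = outer.next();
        int right = inner.next();
        while (true) {
            if (left == right) {
                result.add(left);
                if (!outer.hasNext() || !inner.hasNext()) {
                    break;
                }
                left = outer.next();
                right = inner.next();
            } else if (left < right) {
                if (!outer.hasNext()) {
                    break;
                }
                left = outer.next();
            } else {
                if (!inner.hasNext()) {
                    break;
                }
                right = inner.next();
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Empty outer input: the inner supplier is not invoked.
        List<Integer> none = mergeJoin(
            () -> Collections.<Integer>emptyIterator(), sortedInput(5, 3, 1));
        System.out.println(none + " innerOpened=" + innerOpened);

        // Non-empty outer input: the inner supplier is invoked exactly once.
        List<Integer> some = mergeJoin(
            () -> Arrays.asList(1, 2, 3).iterator(), sortedInput(5, 3, 1));
        System.out.println(some + " innerOpened=" + innerOpened);
    }
}
```

      Passing the inner input as a Supplier rather than as an already opened iterator is what makes the deferral possible; an eager version would pay for the sort in both calls.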

          People

            Assignee: Ruben Q L (rubenql)
            Reporter: Ruben Q L (rubenql)
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 20m
