Project: Flink · Parent issue: FLINK-4860 Sort performance · Issue: FLINK-4705

Instrument FixedLengthRecordSorter


Details

    Description

      The NormalizedKeySorter sorts on the concatenation of (potentially partial) keys plus an 8-byte pointer to the record. After sorting, each pointer must be dereferenced to read its record, which is not cache-friendly.
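      A minimal, self-contained Java sketch of this pointer-based layout, not Flink code: records of a long key plus a long payload are appended to a ByteBuffer, and only a separate index of (4-byte partial normalized key, 8-byte pointer) entries is sorted. The names IndirectSortSketch, IndexEntry and sort, and the 16-byte record format, are assumptions for illustration. Both a tie on the partial key and the final read-out require following a pointer into the record buffer.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical, simplified model of a pointer-based sort (not Flink's NormalizedKeySorter).
final class IndirectSortSketch {
    static final int RECORD_BYTES = 16; // 8-byte long key + 8-byte long payload

    static final class IndexEntry {
        final int keyPrefix; // first 4 bytes of the normalized key: a partial key
        final long pointer;  // 8-byte offset of the full record in the record buffer

        IndexEntry(int keyPrefix, long pointer) {
            this.keyPrefix = keyPrefix;
            this.pointer = pointer;
        }
    }

    // Sorts (key, payload) pairs by key and returns them in sorted order.
    static long[][] sort(long[] keys, long[] payloads) {
        int n = keys.length;
        ByteBuffer records = ByteBuffer.allocate(n * RECORD_BYTES);
        List<IndexEntry> index = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            long pointer = records.position();
            records.putLong(keys[i]).putLong(payloads[i]);
            // Flipping the sign bit makes unsigned order match signed order.
            long normalized = keys[i] ^ Long.MIN_VALUE;
            index.add(new IndexEntry((int) (normalized >>> 32), pointer));
        }
        // Only the index is sorted; the records themselves never move.
        Comparator<IndexEntry> byKey = (a, b) -> {
            int c = Integer.compareUnsigned(a.keyPrefix, b.keyPrefix);
            if (c != 0) {
                return c;
            }
            // The partial keys tie, so dereference both pointers to compare full keys.
            return Long.compare(records.getLong((int) a.pointer), records.getLong((int) b.pointer));
        };
        index.sort(byKey);

        long[][] out = new long[n][2];
        for (int i = 0; i < n; i++) {
            // Each dereference can jump anywhere in the record buffer: random access.
            int p = (int) index.get(i).pointer;
            out[i][0] = records.getLong(p);
            out[i][1] = records.getLong(p + Long.BYTES);
        }
        return out;
    }
}
```

      With keys 5 and 7 the 4-byte prefixes are identical, so the comparator has to fetch both full records; the read-out loop then visits the buffer in key order rather than in memory order.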

      The FixedLengthRecordSorter sorts on the concatenation of full keys followed by the remainder of the record. The records can then be deserialized in sequence.
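      For contrast, a hypothetical sketch of the fixed-length approach, again not Flink code: each 16-byte record stores its full normalized key (the long with its sign bit flipped, written big-endian) followed by an 8-byte payload. Whole records are sorted in place by unsigned byte-wise comparison of the key bytes, and the result is read back with a single linear scan. FixedLengthSortSketch, the record format, and the insertion sort (chosen only for brevity) are assumptions.

```java
import java.nio.ByteBuffer;

// Hypothetical model of sorting fixed-length records that begin with a full normalized key.
final class FixedLengthSortSketch {
    static final int KEY_BYTES = 8;
    static final int RECORD_BYTES = 16; // full 8-byte key followed by the 8-byte remainder

    // Normalized full key first (sign bit flipped, big-endian), then the rest of the record.
    static byte[] serialize(long[] keys, long[] payloads) {
        ByteBuffer buf = ByteBuffer.allocate(keys.length * RECORD_BYTES); // big-endian by default
        for (int i = 0; i < keys.length; i++) {
            buf.putLong(keys[i] ^ Long.MIN_VALUE).putLong(payloads[i]);
        }
        return buf.array();
    }

    // Unsigned byte-wise comparison of the key bytes of records i and j.
    static int compareKeys(byte[] data, int i, int j) {
        int a = i * RECORD_BYTES;
        int b = j * RECORD_BYTES;
        for (int k = 0; k < KEY_BYTES; k++) {
            int c = Integer.compare(data[a + k] & 0xFF, data[b + k] & 0xFF);
            if (c != 0) {
                return c;
            }
        }
        return 0;
    }

    // Swaps two whole records, key and remainder together.
    static void swap(byte[] data, int i, int j) {
        byte[] tmp = new byte[RECORD_BYTES];
        System.arraycopy(data, i * RECORD_BYTES, tmp, 0, RECORD_BYTES);
        System.arraycopy(data, j * RECORD_BYTES, data, i * RECORD_BYTES, RECORD_BYTES);
        System.arraycopy(tmp, 0, data, j * RECORD_BYTES, RECORD_BYTES);
    }

    // Insertion sort over whole records, in place (a real sorter would use a faster algorithm).
    static void sort(byte[] data) {
        int n = data.length / RECORD_BYTES;
        for (int i = 1; i < n; i++) {
            for (int j = i; j > 0 && compareKeys(data, j - 1, j) > 0; j--) {
                swap(data, j - 1, j);
            }
        }
    }

    // Sequential deserialization: no pointers to follow, just a linear scan of the buffer.
    static long[][] readInOrder(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        long[][] out = new long[data.length / RECORD_BYTES][2];
        for (int i = 0; i < out.length; i++) {
            out[i][0] = buf.getLong() ^ Long.MIN_VALUE; // undo the sign-bit flip
            out[i][1] = buf.getLong();
        }
        return out;
    }
}
```

      Because the key is stored in full, no comparison ever needs to leave the record being compared, and the sorted buffer is consumed front to back.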

      Instrumenting the FixedLengthRecordSorter requires implementing the comparator methods writeWithKeyNormalization and readWithKeyDenormalization.
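      The job of this method pair can be sketched for a hypothetical record of an int key followed by a long value: write the key so that unsigned byte-wise order matches signed int order (flip the sign bit, write big-endian), write the remainder unchanged, and invert both steps on read. In Flink these are the TypeComparator methods writeWithKeyNormalization and readWithKeyDenormalization; the static methods below borrow those names but use java.io.DataOutput/DataInput and a made-up IntLongRecord type, so they are not Flink's signatures.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-ins for the comparator methods; not Flink's API.
final class KeyNormalizationSketch {
    // Hypothetical fixed-length record: a 4-byte int key followed by an 8-byte long value.
    static final class IntLongRecord {
        final int key;
        final long value;

        IntLongRecord(int key, long value) {
            this.key = key;
            this.value = value;
        }
    }

    // Writes the key normalized (sign bit flipped, big-endian), then the remainder as-is.
    static void writeWithKeyNormalization(IntLongRecord record, DataOutput out) throws IOException {
        out.writeInt(record.key ^ Integer.MIN_VALUE);
        out.writeLong(record.value);
    }

    // Inverse of the above: denormalize the key, then read the remainder.
    static IntLongRecord readWithKeyDenormalization(DataInput in) throws IOException {
        int key = in.readInt() ^ Integer.MIN_VALUE;
        long value = in.readLong();
        return new IntLongRecord(key, value);
    }

    static byte[] toBytes(IntLongRecord record) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            writeWithKeyNormalization(record, new DataOutputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
        return bytes.toByteArray();
    }

    static IntLongRecord fromBytes(byte[] data) {
        try {
            return readWithKeyDenormalization(new DataInputStream(new ByteArrayInputStream(data)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Unsigned lexicographic comparison of the first len bytes.
    static int compareUnsigned(byte[] a, byte[] b, int len) {
        for (int i = 0; i < len; i++) {
            int c = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (c != 0) {
                return c;
            }
        }
        return 0;
    }
}
```

      Since the key is normalized in full and the record has a fixed length, a sorter can compare and move serialized records directly, deserializing them only when they are read back.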

      In JaccardIndex tests on an m4.16xlarge, the scale 18 runtime dropped from 71.8 s to 68.8 s (4.3% faster) and the scale 20 runtime dropped from 546.1 s to 501.8 s (8.8% faster).


          People

            Assignee: Unassigned
            Reporter: Greg Hogan (greghogan)


              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 10m
