Details
Type: Sub-task
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.2.0
Fix Version/s: None
Description
The NormalizedKeySorter sorts on the concatenation of (potentially partial) keys plus an 8-byte pointer to the record. After sorting, each pointer must be dereferenced to reach its record, which is not cache-friendly.
The FixedLengthRecordSorter sorts on the concatenation of full keys followed by the remainder of the record. The records can then be deserialized in sequence.
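The difference between the two layouts can be illustrated with a minimal, self-contained sketch. This is not Flink code: the class SortLayoutSketch, its methods, and the two-slot record (one long key, one long payload) are hypothetical, and insertion sort is used only to keep the example short. The first method sorts (key, pointer) pairs and then dereferences each pointer into a separate record store; the second sorts whole fixed-length records in place and reads them back sequentially.

```java
public class SortLayoutSketch {

    // Sorts fixed-length entries stored back to back in a flat array,
    // ordering them by the first slot of each entry. Insertion sort is
    // used for brevity only; it is not the algorithm Flink uses.
    static void sortByFirstSlot(long[] buf, int stride) {
        int n = buf.length / stride;
        long[] tmp = new long[stride];
        for (int i = 1; i < n; i++) {
            System.arraycopy(buf, i * stride, tmp, 0, stride);
            int j = i - 1;
            while (j >= 0 && buf[j * stride] > tmp[0]) {
                // Shift the whole entry one position to the right.
                System.arraycopy(buf, j * stride, buf, (j + 1) * stride, stride);
                j--;
            }
            System.arraycopy(tmp, 0, buf, (j + 1) * stride, stride);
        }
    }

    // Pointer layout: the sort buffer holds (key, pointer) pairs and the
    // records live elsewhere, so reading the sorted output jumps around
    // the record store.
    static long[] sortViaPointers(long[] keys, long[] payloads) {
        long[] index = new long[keys.length * 2];
        for (int i = 0; i < keys.length; i++) {
            index[2 * i] = keys[i];
            index[2 * i + 1] = i; // "pointer" to the record
        }
        sortByFirstSlot(index, 2);
        long[] out = new long[keys.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = payloads[(int) index[2 * i + 1]]; // dereference
        }
        return out;
    }

    // Inline layout: the key is followed directly by the rest of the
    // record, so the sorted buffer is read strictly front to back.
    static long[] sortInline(long[] keys, long[] payloads) {
        long[] buf = new long[keys.length * 2];
        for (int i = 0; i < keys.length; i++) {
            buf[2 * i] = keys[i];
            buf[2 * i + 1] = payloads[i];
        }
        sortByFirstSlot(buf, 2);
        long[] out = new long[keys.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = buf[2 * i + 1]; // sequential read
        }
        return out;
    }

    public static void main(String[] args) {
        long[] keys = {3L, 1L, 2L};
        long[] payloads = {30L, 10L, 20L};
        System.out.println(java.util.Arrays.toString(sortViaPointers(keys, payloads)));
        System.out.println(java.util.Arrays.toString(sortInline(keys, payloads)));
    }
}
```

Both methods return the payloads in key order. The inline variant moves more bytes per swap but avoids the scattered reads of the pointer variant, which is the trade-off this issue is about.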
Instrumenting the FixedLengthRecordSorter requires implementing the comparator methods writeWithKeyNormalization and readWithKeyNormalization.
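As a rough illustration of what such a pair of methods does, the sketch below writes a record with its key in normalized form first, followed by the remainder of the record, and reads it back. It is an assumption-laden stand-in rather than the Flink implementation: the Rec type, the class KeyNormalizationSketch, and the helper methods are hypothetical, and java.io.DataOutput and DataInput are used in place of Flink's view types. The key is a signed int normalized by flipping its sign bit and writing it big-endian, so that unsigned byte-wise comparison matches signed integer order.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class KeyNormalizationSketch {

    /** Hypothetical fixed-length record: a 4-byte key and an 8-byte payload. */
    static final class Rec {
        int key;
        long payload;
        Rec(int key, long payload) { this.key = key; this.payload = payload; }
    }

    static final int KEY_LENGTH = 4;

    // Writes the normalized key first, then the remainder of the record.
    static void writeWithKeyNormalization(Rec record, DataOutput target) throws IOException {
        target.writeInt(record.key ^ Integer.MIN_VALUE); // flip sign bit, big-endian
        target.writeLong(record.payload);
    }

    // Reverses the normalization and restores the record into 'reuse'.
    static Rec readWithKeyNormalization(Rec reuse, DataInput source) throws IOException {
        reuse.key = source.readInt() ^ Integer.MIN_VALUE;
        reuse.payload = source.readLong();
        return reuse;
    }

    static byte[] serialize(Rec record) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            writeWithKeyNormalization(record, new DataOutputStream(bytes));
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Rec deserialize(byte[] data) {
        try {
            return readWithKeyNormalization(
                new Rec(0, 0L), new DataInputStream(new ByteArrayInputStream(data)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Compares the key prefixes of two serialized records as unsigned bytes.
    static int compareKeys(byte[] a, byte[] b) {
        for (int i = 0; i < KEY_LENGTH; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        byte[] negative = serialize(new Rec(-5, 1L));
        byte[] positive = serialize(new Rec(7, 2L));
        System.out.println(compareKeys(negative, positive) < 0);
        System.out.println(deserialize(negative).key);
    }
}
```

Because the full key is stored in normalized form, the sorter can compare and swap serialized records directly, and the original key is recovered only when the record is read back.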
In tests of JaccardIndex on an m4.16xlarge instance, the scale 18 runtime dropped from 71.8 s to 68.8 s (4.3% faster) and the scale 20 runtime dropped from 546.1 s to 501.8 s (8.8% faster).