Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Description
The method putRowKeyToHLL() in FactDistinctColumnsMapper can be very slow when the sampling rate is high. After careful profiling, we believe that its performance can be improved by modifying its hash method. We also found an algorithm that can accurately estimate the row count of each cuboid at a lower sampling rate. I will share more test results and details of the algorithm once this issue is done.
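To illustrate the kind of change to the hash method that could help, here is a minimal sketch. It is not the actual code of putRowKeyToHLL(). The class CuboidHashSketch, both method names, the FNV-1a hash, and the multiply-by-31 combine step are all assumptions made for illustration. The idea shown is to hash each column value once per row and then combine those cached hashes for every cuboid, instead of re-hashing the column bytes inside the per-cuboid loop.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the FactDistinctColumnsMapper implementation.
final class CuboidHashSketch {

    // 64-bit FNV-1a over the bytes, seeded with the column index (an assumed choice of hash).
    static long fnv1a64(byte[] data, int seed) {
        long h = 0xcbf29ce484222325L ^ seed;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        return h;
    }

    // Baseline shape: every cuboid re-hashes the bytes of each column it contains.
    static long[] hashPerCuboid(String[] row, int[][] cuboids) {
        long[] out = new long[cuboids.length];
        for (int i = 0; i < cuboids.length; i++) {
            long h = 0;
            for (int col : cuboids[i]) {
                byte[] bytes = row[col].getBytes(StandardCharsets.UTF_8);
                h = h * 31 + fnv1a64(bytes, col);
            }
            out[i] = h;
        }
        return out;
    }

    // Cheaper shape: hash each column once, then only combine longs per cuboid.
    static long[] hashColumnsOnce(String[] row, int[][] cuboids) {
        long[] colHash = new long[row.length];
        for (int c = 0; c < row.length; c++) {
            colHash[c] = fnv1a64(row[c].getBytes(StandardCharsets.UTF_8), c);
        }
        long[] out = new long[cuboids.length];
        for (int i = 0; i < cuboids.length; i++) {
            long h = 0;
            for (int col : cuboids[i]) {
                h = h * 31 + colHash[col];
            }
            // In a real mapper, h would be fed to that cuboid's HLL counter here.
            out[i] = h;
        }
        return out;
    }
}
```

Both methods return the same hash per cuboid. The second performs one byte-level hash per column rather than one per column per cuboid, which matters when a row is sampled into many cuboids.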