  1. Kylin
  2. KYLIN-2518

Improve the sampling performance of FactDistinctColumns step

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: v2.0.0
    • Component/s: None
    • Labels: None

      Description

      The method putRowKeyToHLL() in FactDistinctColumnsMapper can be very slow when the sampling rate is high. After careful profiling, we believe that its performance can be improved by modifying its hash method. We also found an algorithm that can accurately estimate the row count of each cuboid at a lower sampling rate. I will share more test results and details of the algorithm once this issue is done.
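
To illustrate why the hash method matters for per-cuboid sampling, here is a minimal sketch of one possible approach. It is an assumption for illustration, not necessarily the change made in this issue. Each column value is hashed once per row, and the hash for a cuboid is derived by combining the cached per-column hashes instead of rehashing the raw bytes for every cuboid. The class name CuboidHashSketch, its methods, and the sample row are all hypothetical. FNV-1a and the MurmurHash3 finalizer are stand-ins for whatever hash function the mapper really uses.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: not the actual FactDistinctColumnsMapper code.
final class CuboidHashSketch {
    private static final long FNV_OFFSET = 0xcbf29ce484222325L;
    private static final long FNV_PRIME = 0x100000001b3L;

    /** 64-bit FNV-1a hash of one column value, salted with its column index. */
    static long hashColumn(int columnIndex, String value) {
        long h = (FNV_OFFSET ^ columnIndex) * FNV_PRIME;
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            h = (h ^ (b & 0xff)) * FNV_PRIME;
        }
        return h;
    }

    /** Hashes every column of a row exactly once. */
    static long[] hashRow(String[] row) {
        long[] hashes = new long[row.length];
        for (int i = 0; i < row.length; i++) {
            hashes[i] = hashColumn(i, row[i]);
        }
        return hashes;
    }

    /** Combines the cached hashes of the columns that belong to one cuboid. */
    static long cuboidHash(long[] columnHashes, int[] cuboidColumns) {
        long sum = 0;
        for (int c : cuboidColumns) {
            sum += columnHashes[c]; // cheap: one addition per column
        }
        // MurmurHash3 64-bit finalizer, used here to spread the bits of the sum.
        sum ^= sum >>> 33;
        sum *= 0xff51afd7ed558ccdL;
        sum ^= sum >>> 33;
        sum *= 0xc4ceb9fe1a85ec53L;
        sum ^= sum >>> 33;
        return sum;
    }

    public static void main(String[] args) {
        String[] row = {"2017-03", "CN", "phone"};
        long[] hashes = hashRow(row); // computed once per row
        int[][] cuboids = {{0}, {0, 1}, {0, 1, 2}};
        for (int[] cuboid : cuboids) {
            // In a real mapper this value would be added to the cuboid's HLL counter.
            System.out.println(Long.toHexString(cuboidHash(hashes, cuboid)));
        }
    }
}
```

With this layout the cost of hashing raw bytes grows with the number of columns rather than with the number of cuboids times their width, which is where a high sampling rate would otherwise hurt most.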

            People

            • Assignee: XIE FAN (xiefan46)
            • Reporter: XIE FAN (xiefan46)
            • Votes: 0
            • Watchers: 2

              Dates

              • Created:
                Updated:
                Resolved: