  1. Kylin
  2. KYLIN-2518

Improve the sampling performance of FactDistinctColumns step


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: v2.0.0
    • Component/s: None
    • Labels: None

    Description

      The method putRowKeyToHLL() in FactDistinctColumnsMapper can be very slow when the sampling rate is high. After careful profiling, we believe that its performance can be improved by changing its hash method. We have also found an algorithm that can accurately estimate the row count of each cuboid at a lower sampling rate. I will share more test results and details of the algorithm once this issue is done.
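
      For illustration only, here is a minimal Java sketch of one way the per-row hashing cost could be reduced: hash each column value once per row, then derive each cuboid's hash by combining the cached column hashes, instead of re-hashing the concatenated row key for every cuboid. This is an assumption about the general direction, not the code from the actual patch. The class RowHashSketch, its methods, and the FNV-1a and mixing functions are hypothetical and were chosen only to keep the example free of dependencies. The resulting 64-bit value is what would be offered to each cuboid's HyperLogLog counter.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not Kylin code: cache one hash per column value,
// then build every cuboid's hash from the cached values.
public class RowHashSketch {

    // FNV-1a over the UTF-8 bytes, salted with the column index so that the
    // same value appearing in different columns hashes differently.
    static long hashColumn(int columnIndex, String value) {
        long h = 0xcbf29ce484222325L ^ columnIndex;
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        return mix(h);
    }

    // 64-bit finalizer (the splitmix64 mixing step) to spread the bits.
    static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    // Hash every column of the row exactly once.
    static long[] hashRow(String[] row) {
        long[] hashes = new long[row.length];
        for (int i = 0; i < row.length; i++) {
            hashes[i] = hashColumn(i, row[i]);
        }
        return hashes;
    }

    // Combine the cached hashes of the columns that belong to one cuboid.
    // Addition is cheap and order-independent; the final mix re-scrambles the sum.
    static long cuboidHash(long[] columnHashes, int[] cuboidColumns) {
        long sum = 0L;
        for (int c : cuboidColumns) {
            sum += columnHashes[c];
        }
        return mix(sum);
    }

    public static void main(String[] args) {
        String[] row = {"2017-03-21", "CN", "mobile"};
        int[][] cuboids = {{0, 1, 2}, {0, 1}, {0, 2}, {1, 2}, {0}, {1}, {2}};

        long[] columnHashes = hashRow(row); // one pass over the row's bytes
        for (int[] cuboid : cuboids) {
            // In a real mapper this value would be added to the cuboid's HLL counter.
            System.out.println(Long.toHexString(cuboidHash(columnHashes, cuboid)));
        }
    }
}
```

      With this layout the cost of reading the row's bytes is paid once per row, and each additional cuboid costs only a few additions, which matters when the number of cuboids is large and the sampling rate is high.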


          People

            Assignee: xiefan46 XIE FAN
            Reporter: xiefan46 XIE FAN