Kylin / KYLIN-1237

Revisit cube size estimation

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: v1.4.0, v1.5.0
    • Fix Version/s: v1.4.0, v1.5.0
    • Component/s: None
    • Labels: None

      Description

      Currently CreateHTableJob.estimateCuboidStorageSize does not take HBase encoding and compression into account. In real cubes we have observed that the estimate can be tens of times larger than the actual size.

      Here are some stats (estimated size => real size):

      Cube 1 (w/o HLL, holistic distinct count)
      1051G => 161G

      Cube 2 (w/o HLL)
      2118G => 504G

      Cube 3 (w/o HLL)
      3507G => 791G

      Cube 4 (w/ 2 HLL15)
      188T => 2T

      Cube 5 (w/ 2 HLL15)
      28T => 0.7T

      Cube 6 (w/ 1 HLL16)
      172G => 30G

      From the stats we can see that for cubes without HLL the estimate is roughly 4 to 6.5 times larger than the real size, while for cubes with HLL it can be 40 to more than 90 times larger (cubes 4 and 5). It is worth studying why cube 6 is overestimated by only about 6 times: it may be related to the HLL precision level, or it may be due to the data.
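      The ratios behind this observation can be recomputed directly from the figures above. The following is only an illustrative sketch: the class name OverestimationRatios and the method ratio are hypothetical and are not part of Kylin.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Kylin: recomputes estimated/real ratios
// from the stats quoted in this issue.
public class OverestimationRatios {

    /** Ratio of estimated size to real size; both values must use the same unit. */
    public static double ratio(double estimated, double real) {
        if (real <= 0) {
            throw new IllegalArgumentException("real size must be positive");
        }
        return estimated / real;
    }

    public static void main(String[] args) {
        // {estimated, real} pairs copied from the issue; G or T, same unit within a pair.
        Map<String, double[]> cubes = new LinkedHashMap<>();
        cubes.put("cube 1 (w/o hll)", new double[] {1051, 161});
        cubes.put("cube 2 (w/o hll)", new double[] {2118, 504});
        cubes.put("cube 3 (w/o hll)", new double[] {3507, 791});
        cubes.put("cube 4 (2 hll15)", new double[] {188, 2});
        cubes.put("cube 5 (2 hll15)", new double[] {28, 0.7});
        cubes.put("cube 6 (1 hll16)", new double[] {172, 30});

        for (Map.Entry<String, double[]> e : cubes.entrySet()) {
            double[] s = e.getValue();
            System.out.printf("%-18s overestimated %.1fx%n", e.getKey(), ratio(s[0], s[1]));
        }
    }
}
```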

      To reduce region counts, we will apply a discount to the estimate as follows:

      if (isMemoryHungry) {
          logger.info("Cube is memory hungry, storage size multiply 0.05");
          ret *= 0.05;
      } else {
          logger.info("Cube is not memory hungry, storage size multiply 0.25");
          ret *= 0.25;
      }
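      To get a feel for what these factors do to the six cubes listed earlier, here is a minimal standalone sketch. The class StorageSizeDiscount and the method applyDiscount are hypothetical names chosen for illustration; only the two factors come from this issue. It also assumes that the HLL cubes are the ones flagged as memory hungry, which the description does not state explicitly.

```java
// Hypothetical standalone sketch, not the actual CreateHTableJob code.
public class StorageSizeDiscount {

    // Factors quoted in this issue.
    public static final double MEMORY_HUNGRY_FACTOR = 0.05;
    public static final double DEFAULT_FACTOR = 0.25;

    /** Applies the discount to a raw size estimate (any unit). */
    public static double applyDiscount(double estimatedSize, boolean isMemoryHungry) {
        return estimatedSize * (isMemoryHungry ? MEMORY_HUNGRY_FACTOR : DEFAULT_FACTOR);
    }

    public static void main(String[] args) {
        // {estimated, real, memoryHungry (1 = yes)}; sizes copied from the issue.
        // Assumption: cubes with HLL measures are treated as memory hungry.
        double[][] cubes = {
            {1051, 161, 0}, {2118, 504, 0}, {3507, 791, 0},
            {188, 2, 1}, {28, 0.7, 1}, {172, 30, 1},
        };
        for (int i = 0; i < cubes.length; i++) {
            double discounted = applyDiscount(cubes[i][0], cubes[i][2] == 1);
            System.out.printf("cube %d: discounted estimate %.2f, real %.2f%n",
                    i + 1, discounted, cubes[i][1]);
        }
    }
}
```

      Under that assumption, the discounted estimate stays above the real size for cubes 1 to 5 (for example 1051G * 0.25 = 262.75G against 161G real), while cube 6 would drop to about 8.6G against 30G real, which is another reason to look into that cube.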

      Then we will see how this works in practice.

            People

            • Assignee:
              mahongbin Hongbin Ma
              Reporter:
              mahongbin Hongbin Ma