  1. Kylin
  2. KYLIN-4322

Cost–benefit of compressing HBase results


Details

    • Bug
    • Status: Reopened
    • Major
    • Resolution: Unresolved

    Description

      kylin.storage.hbase.endpoint-compress-result defaults to TRUE.

      In our production environment, when the HBase scan result is larger than 200M, compressing the data takes more than 10s.

      This can be seen in the HBase logs:

      Size          avg rate   min rate   avg time   max time
      <1M           0.12       0.25       0.18ms     0.7s
      1M ~ 10M      0.39       0.97       0.2s       0.6s
      10M ~ 100M    0.47       0.81       2s         6.3s
      >100M         0.95       0.96       15.7s      24.8s

      Notes:

      1. rate: compressed data size / original data size
      2. When the source data is smaller than 1M, the compressed data may be larger than the source data, so row 1 of the table only counts cases where the compressed data is smaller than the source data.
      3. In our environment, 65% of the compressed results under 1M are larger than their source data.
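The rate defined in note 1, and the small-payload effect from notes 2 and 3, can be reproduced with a minimal sketch. It uses Python's `zlib` as a stand-in codec, not the compression code that Kylin or HBase actually runs, and the payloads and the `compression_stats` helper are hypothetical.

```python
import os
import time
import zlib


def compression_stats(payload):
    """Return (rate, seconds), where rate = compressed size / original size."""
    start = time.perf_counter()
    compressed = zlib.compress(payload)
    elapsed = time.perf_counter() - start
    return len(compressed) / len(payload), elapsed


# Small, high-entropy payload: codec overhead makes the output larger than the input.
small_random = os.urandom(512)
# Larger, repetitive payload: compresses well, but each call costs more time.
large_repetitive = b"kylin-row-" * 1_000_000

for name, payload in [("small random", small_random),
                      ("large repetitive", large_repetitive)]:
    rate, seconds = compression_stats(payload)
    print(f"{name}: {len(payload)} bytes, rate={rate:.2f}, "
          f"time={seconds * 1000:.2f} ms")
```

A rate above 1.0 means compression made the result larger, which is the case note 2 excludes from row 1 of the table. Absolute timings depend on the codec and hardware and are not comparable to the figures above.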

      When the source data is smaller than 10M, the data transmission latency is acceptable. When the data is larger than 100M, compressing it takes a long time.

       

      So I think kylin.storage.hbase.endpoint-compress-result should be FALSE by default.
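For deployments affected in the meantime, the property named above can presumably be overridden locally. A minimal sketch, assuming the override goes in the deployment's `kylin.properties` file and that the affected services are restarted afterwards:

```properties
# Assumption: set in kylin.properties; disables compression of HBase endpoint results
kylin.storage.hbase.endpoint-compress-result=false
```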

       


            People

              zhoukangcn ZhouKang