HBase / HBASE-14468

Compaction improvements: FIFO compaction policy


    Details

    • Hadoop Flags:
      Reviewed
    • Release Note:
      The FIFO compaction policy selects only those files whose cells have all expired. The column family MUST have a non-default TTL.
      Essentially, the FIFO compactor does only one job: it collects expired store files.

      Because no real compaction is performed, we use no extra CPU or I/O (disk and network) and do not evict hot data from the block cache. The result: improved throughput and latency for both writes and reads.
      See: https://github.com/facebook/rocksdb/wiki/FIFO-compaction-style

      Description

      FIFO Compaction

      Introduction

      The FIFO compaction policy selects only those files whose cells have all expired. The column family MUST have a non-default TTL.
      Essentially, the FIFO compactor does only one job: it collects expired store files. These are some applications which could benefit the most:

      1. Use it for very high-volume raw data with a low TTL which is the source for other data (after additional processing). Example: raw time series vs. time-based rollup aggregates and compacted time series. We collect raw time series and store them in a CF with the FIFO compaction policy; periodically we run a task which creates rollup aggregates and compacts the time series, and the original raw data can be discarded after that.
      2. Use it for data which can be kept entirely in a block cache (RAM/SSD). Say we have a local SSD (1 TB) which we can use as a block cache. No need to compact the raw data at all.

      Because no real compaction is performed, we use no extra CPU or I/O (disk and network) and do not evict hot data from the block cache. The result: improved throughput and latency for both writes and reads.
      See: https://github.com/facebook/rocksdb/wiki/FIFO-compaction-style
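
      The selection rule above can be sketched in plain Java (class and field names here are illustrative, not HBase's actual internals): a store file is eligible for collection only when even its newest cell has outlived the TTL.

      ```java
      import java.util.ArrayList;
      import java.util.List;

      // Minimal sketch of the FIFO selection rule, assuming each store file
      // tracks the max timestamp of any cell it contains (names are hypothetical).
      public class FifoSelectionSketch {
          static class StoreFile {
              final String name;
              final long maxTimestampMs; // newest cell in the file
              StoreFile(String name, long maxTimestampMs) {
                  this.name = name;
                  this.maxTimestampMs = maxTimestampMs;
              }
          }

          // Select only files in which ALL cells have expired: even the newest
          // cell is older than the TTL. No merging or rewriting is performed.
          static List<StoreFile> selectExpired(List<StoreFile> files, long ttlMs, long nowMs) {
              List<StoreFile> expired = new ArrayList<>();
              for (StoreFile f : files) {
                  if (f.maxTimestampMs + ttlMs < nowMs) {
                      expired.add(f);
                  }
              }
              return expired;
          }

          public static void main(String[] args) {
              long now = 100_000L;
              long ttl = 30_000L; // 30-second TTL
              List<StoreFile> files = new ArrayList<>();
              files.add(new StoreFile("old.hfile", 50_000L));   // 50s old: all cells expired
              files.add(new StoreFile("fresh.hfile", 90_000L)); // 10s old: kept
              List<StoreFile> expired = selectExpired(files, ttl, now);
              System.out.println(expired.size());      // prints 1
              System.out.println(expired.get(0).name); // prints old.hfile
          }
      }
      ```

      Note the file is dropped as a whole; a file containing even one live cell is left untouched, which is why the policy is so cheap.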

      To enable the FIFO compaction policy

      For table:

      HTableDescriptor desc = new HTableDescriptor(tableName);
      desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY,
          FIFOCompactionPolicy.class.getName());

      For CF:

      HColumnDescriptor desc = new HColumnDescriptor(family);
      desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY,
          FIFOCompactionPolicy.class.getName());

      From HBase shell:

      create 'x',{NAME=>'y', TTL=>'30'}, {CONFIGURATION => {'hbase.hstore.defaultengine.compactionpolicy.class' => 'org.apache.hadoop.hbase.regionserver.compactions.FIFOCompactionPolicy', 'hbase.hstore.blockingStoreFiles' => 1000}}
      

      Although region splitting is supported, for optimal performance it should be disabled, either by explicitly setting DisabledRegionSplitPolicy, or by setting ConstantSizeRegionSplitPolicy with a very large maximum region size. You will also have to raise the store's blocking-file limit, hbase.hstore.blockingStoreFiles, to a very large number (there is a sanity check on the table/column family configuration when FIFO compaction is enabled, and the minimum value for the number of blocking files is 1000).
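
      Putting those recommendations together in Java (a sketch in the same style as the snippets above; setRegionSplitPolicyClassName and setConfiguration are existing HTableDescriptor methods, the exact values are illustrative):

      ```java
      HTableDescriptor desc = new HTableDescriptor(tableName);
      // Disable region splitting for best FIFO-compaction performance.
      desc.setRegionSplitPolicyClassName(
          "org.apache.hadoop.hbase.regionserver.DisabledRegionSplitPolicy");
      // Raise the blocking-file limit; the sanity check requires at least 1000.
      desc.setConfiguration("hbase.hstore.blockingStoreFiles", "1000");
      desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY,
          FIFOCompactionPolicy.class.getName());
      ```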

      Limitations

      Do not use FIFO compaction if:

      • The table/CF has MIN_VERSIONS > 0
      • The table/CF has TTL = FOREVER (HColumnDescriptor.DEFAULT_TTL)
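
      These two limitations mirror the sanity check mentioned above. A minimal plain-Java sketch of that check (in HBase, HConstants.FOREVER is Integer.MAX_VALUE and is the default TTL; the method name here is hypothetical):

      ```java
      // Plain-Java sketch of the FIFO precondition check; FOREVER mirrors
      // HBase's HConstants.FOREVER (Integer.MAX_VALUE), i.e. the default TTL.
      public class FifoSanityCheckSketch {
          static final int FOREVER = Integer.MAX_VALUE;

          // FIFO compaction only deletes whole expired files, so it must not be
          // combined with MIN_VERSIONS > 0 (versions must be retained) or with
          // the default TTL (nothing would ever expire).
          static boolean isFifoAllowed(int minVersions, int ttlSeconds) {
              return minVersions == 0 && ttlSeconds != FOREVER;
          }

          public static void main(String[] args) {
              System.out.println(isFifoAllowed(0, 30));                // true
              System.out.println(isFifoAllowed(1, 30));                // false: MIN_VERSIONS > 0
              System.out.println(isFifoAllowed(0, Integer.MAX_VALUE)); // false: TTL = FOREVER
          }
      }
      ```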

        Attachments

        1. HBASE-14468-v1.patch
          34 kB
          Vladimir Rodionov
        2. HBASE-14468-v2.patch
          36 kB
          Vladimir Rodionov
        3. HBASE-14468-v3.patch
          35 kB
          Vladimir Rodionov
        4. HBASE-14468-v4.patch
          47 kB
          Vladimir Rodionov
        5. HBASE-14468-v5.patch
          10 kB
          Vladimir Rodionov
        6. HBASE-14468-v6.patch
          12 kB
          Vladimir Rodionov
        7. HBASE-14468-v7.patch
          16 kB
          Vladimir Rodionov
        8. HBASE-14468-v8.patch
          23 kB
          Vladimir Rodionov
        9. HBASE-14468-v9.patch
          23 kB
          Vladimir Rodionov
        10. HBASE-14468-v10.patch
          23 kB
          Vladimir Rodionov
        11. 14468-0.98.txt
          21 kB
          Lars Hofhansl
        12. 14468-0.98-v2.txt
          21 kB
          Lars Hofhansl
        13. HBASE-14468.add.patch
          2 kB
          Vladimir Rodionov

          People
              • Assignee:
                vrodionov Vladimir Rodionov
                Reporter:
                vrodionov Vladimir Rodionov
              • Votes:
                0
                Watchers:
                16
