  1. HBase
  2. HBASE-14477

Compaction improvements: Date tiered compaction policy



    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: 2.0.0
    • Component/s: None
    • Labels:


      For immutable and mostly immutable data, the current size-tiered compaction policy is inefficient.

      1. There is no need to compact all files into one because the data is (mostly) immutable and there is no garbage to collect. (The performance reasons will be discussed later.)
      2. Size-tiered compaction is not suitable for applications where the most recent data is the most important, and it prevents efficient caching of that data.
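
To make the two points above concrete, here is a minimal sketch of grouping store files by age rather than by size, so that only files within the same time window are candidates for compaction together. It assumes fixed-size windows for simplicity, which is a simplification and not necessarily how Cassandra or HBase size their tiers. The class `DateTieredSketch`, its methods, and the file names are hypothetical and are not part of the HBase API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not the HBase implementation.
public class DateTieredSketch {

    // Start of the fixed-size window that contains the given timestamp.
    public static long windowStart(long timestampMs, long windowMs) {
        return Math.floorDiv(timestampMs, windowMs) * windowMs;
    }

    // Groups file names by the window their newest timestamp falls into.
    // Files in the same window could be compacted together; files in
    // different windows are left alone, so old data is never rewritten
    // just because new data arrived.
    public static Map<Long, List<String>> groupByWindow(
            Map<String, Long> newestTimestampByFile, long windowMs) {
        Map<Long, List<String>> windows = new TreeMap<>();
        for (Map.Entry<String, Long> e : newestTimestampByFile.entrySet()) {
            long start = windowStart(e.getValue(), windowMs);
            windows.computeIfAbsent(start, k -> new ArrayList<>()).add(e.getKey());
        }
        return windows;
    }

    public static void main(String[] args) {
        // Assumed input: file name mapped to the newest cell timestamp in that file.
        Map<String, Long> files = new LinkedHashMap<>();
        files.put("hfile-a", 1_000L);
        files.put("hfile-b", 59_000L);
        files.put("hfile-c", 61_000L);
        long oneMinuteMs = 60_000L;
        System.out.println(groupByWindow(files, oneMinuteMs));
    }
}
```

With a one-minute window, `hfile-a` and `hfile-b` land in the window starting at 0 and `hfile-c` lands in the window starting at 60000, so the most recent data stays in its own small set of files.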

      The idea is very similar to date-tiered compaction (DateTieredCompactionStrategy, DTCS) in Cassandra.


      From Cassandra's own blog:

      Since DTCS can be used with any table, it is important to know when it is a good idea, and when it is not. I’ll try to explain the spectrum and trade-offs here:

      1. Perfect Fit: Time Series Fact Data, Deletes by Default TTL: When you ingest fact data that is ordered in time, with no deletes or overwrites. This is the standard “time series” use case.

      2. OK Fit: Time-Ordered, with limited updates across whole data set, or only updates to recent data: When you ingest data that is (mostly) ordered in time, but revise or delete a very small proportion of the overall data across the whole timeline.

      3. Not a Good Fit: many partial row updates or deletions over time: When you need to partially revise or delete fields for rows that you read together. Also, when you revise or delete rows within clustered reads.





              • Assignee:
                vrodionov Vladimir Rodionov


                • Created: