Details
- Type: New Feature
- Status: Closed
- Priority: Major
- Resolution: Duplicate
Description
For immutable and mostly immutable data, the current size-tiered compaction policy is inefficient:
- There is no need to compact all files into one, because the data is (mostly) immutable and there is little or no garbage to collect. (The performance reasons will be discussed later.)
- Size-tiered compaction is not suitable for applications where the most recent data is the most important, because it prevents efficient caching of that data.
The idea is quite similar to DateTieredCompactionStrategy (DTCS) in Cassandra:
http://www.datastax.com/dev/blog/datetieredcompactionstrategy
http://www.datastax.com/dev/blog/dtcs-notes-from-the-field
From the DataStax blog:
Since DTCS can be used with any table, it is important to know when it is a good idea, and when it is not. I’ll try to explain the spectrum and trade-offs here:
1. Perfect Fit: Time Series Fact Data, Deletes by Default TTL: When you ingest fact data that is ordered in time, with no deletes or overwrites. This is the standard “time series” use case.
2. OK Fit: Time-Ordered, with limited updates across whole data set, or only updates to recent data: When you ingest data that is (mostly) ordered in time, but revise or delete a very small proportion of the overall data across the whole timeline.
3. Not a Good Fit: many partial row updates or deletions over time: When you need to partially revise or delete fields for rows that you read together. Also, when you revise or delete rows within clustered reads.
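To make the idea above more concrete, here is a minimal sketch of time-window grouping: files are bucketed by the window containing their newest timestamp, and only files that share a window are proposed for compaction together, so old data is never merged with recent data. This is an illustration only, not the HBase or Cassandra implementation. `StoreFile`, `group_by_window`, `select_compactions`, and the `min_files` threshold are hypothetical names, and the fixed-size windows are a simplifying assumption (DTCS itself uses windows that grow with age).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class StoreFile:
    """Hypothetical stand-in for a store file; not an HBase class."""
    name: str
    max_timestamp_ms: int  # newest cell timestamp contained in the file
    size_bytes: int


def group_by_window(files, window_ms):
    """Bucket files by the fixed-size time window holding their newest timestamp."""
    windows = defaultdict(list)
    for f in files:
        windows[f.max_timestamp_ms // window_ms].append(f)
    return dict(windows)


def select_compactions(files, window_ms, min_files=2):
    """Return one candidate compaction per window that holds at least
    `min_files` files, newest window first. Files from different windows
    are never combined, so no single all-in-one compaction is produced."""
    windows = group_by_window(files, window_ms)
    return [
        sorted(group, key=lambda f: f.max_timestamp_ms)
        for _, group in sorted(windows.items(), reverse=True)
        if len(group) >= min_files
    ]


if __name__ == "__main__":
    HOUR_MS = 3_600_000
    files = [
        StoreFile("a", 0 * HOUR_MS + 10, 100),
        StoreFile("b", 0 * HOUR_MS + 20, 5),
        StoreFile("c", 1 * HOUR_MS + 5, 50),
        StoreFile("d", 2 * HOUR_MS + 1, 7),
        StoreFile("e", 2 * HOUR_MS + 2, 9),
    ]
    for group in select_compactions(files, HOUR_MS):
        print([f.name for f in group])
```

In this sketch, file `c` is left alone because it is the only file in its window. Files in older windows are rewritten only with their contemporaries, which fits the "perfect fit" and "OK fit" cases quoted above, where older data is rarely revised.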
Issue Links
- duplicates
  - HBASE-15181 A simple implementation of date based tiered compaction (Closed)
- is part of
  - HBASE-14383 Compaction improvements (Closed)
- relates to
  - HBASE-9260 Timestamp Compactions (Closed)
  - HBASE-7055 port HBASE-6371 tier-based compaction from 0.89-fb to trunk (with changes) (Closed)
- requires
  - HBASE-14511 StoreFile.Writer Meta Plugin (Closed)
1. Implement StoreFile Writer Plugin (HBASE-14511) for Gen Compaction Policy | Closed | Unassigned