Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Duplicate
0.92.0
None
None
None
Description
In some applications, a common access pattern is to frequently scan tables with a time-range predicate restricted to a fairly recent window. For example, you may want to run an incremental aggregation or indexing step only on rows that changed in the last hour. We make this efficient by tracking the minimum and maximum timestamp of each HFile, so that HFiles whose time range lies entirely outside the scan's range do not have to be read.
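The per-file skipping described above amounts to an interval-overlap check. This is a minimal sketch, not HBase's implementation: `HFileMeta` and `files_to_scan` are hypothetical names, and the scan range is assumed to be half-open, `[min, max)`, as in HBase's `TimeRange`.

```python
from dataclasses import dataclass

HOUR_MS = 60 * 60 * 1000


@dataclass(frozen=True)
class HFileMeta:
    """Hypothetical per-file metadata: the min/max cell timestamps of one HFile."""
    name: str
    min_ts: int  # smallest cell timestamp in the file (ms)
    max_ts: int  # largest cell timestamp in the file (ms)


def files_to_scan(files, scan_min_ts, scan_max_ts):
    """Keep only files whose [min_ts, max_ts] range overlaps the scan's
    half-open range [scan_min_ts, scan_max_ts); all others are skipped unread."""
    return [f for f in files if f.max_ts >= scan_min_ts and f.min_ts < scan_max_ts]


now = 10 * HOUR_MS
files = [
    HFileMeta("a", 0, 5 * HOUR_MS),
    HFileMeta("b", 5 * HOUR_MS, 8 * HOUR_MS),
    HFileMeta("c", 9 * HOUR_MS + 1, now),
]

# Incremental scan over the last hour: only "c" has to be opened.
recent = files_to_scan(files, now - HOUR_MS, now + 1)
print([f.name for f in recent])  # ['c']

# After a major compaction, all data lives in one file spanning [0, now],
# so the same scan can no longer skip anything.
compacted = [HFileMeta("major", 0, now)]
print([f.name for f in files_to_scan(compacted, now - HOUR_MS, now + 1)])  # ['major']
```

The last call also shows why a major compaction hurts this pattern: the single merged file's time range always overlaps a recent-window scan, so it must be read in full.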
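After a major compaction, however, all of a store's data is rewritten into a single HFile whose time range spans from the oldest to the newest cell, so every time-range scan has to read the entire dataset. This hurts the performance of this access pattern.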
We should add a column family attribute that specifies a policy such as: "When major compacting, never include an HFile that contains data with a timestamp in the last 4 hours." This way, recently flushed HFiles always remain uncompacted and keep providing the scan performance these applications require.
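The proposed policy can be sketched as a filter over the candidate files for a major compaction. This is an illustration under assumptions: the issue does not name the column family attribute, so `exclude_window_ms`, `select_for_major_compaction`, and `HFileMeta` are all hypothetical.

```python
from dataclasses import dataclass

HOUR_MS = 60 * 60 * 1000


@dataclass(frozen=True)
class HFileMeta:
    """Hypothetical per-file metadata: the min/max cell timestamps of one HFile."""
    name: str
    min_ts: int
    max_ts: int


def select_for_major_compaction(files, now_ms, exclude_window_ms):
    """Hypothetical policy: leave out every HFile that holds any cell newer than
    now_ms - exclude_window_ms; only files entirely older than the cutoff are compacted."""
    cutoff = now_ms - exclude_window_ms
    return [f for f in files if f.max_ts < cutoff]


now = 100 * HOUR_MS
files = [
    HFileMeta("old-1", 0, 50 * HOUR_MS),
    HFileMeta("old-2", 50 * HOUR_MS, 95 * HOUR_MS),
    HFileMeta("fresh", 97 * HOUR_MS, 99 * HOUR_MS),
]

# With a 4-hour window (cutoff = 96h), "fresh" stays out of the major compaction.
selected = select_for_major_compaction(files, now, 4 * HOUR_MS)
print([f.name for f in selected])  # ['old-1', 'old-2']
```

Because the check uses each file's maximum timestamp, a file is excluded if even one of its cells falls inside the window, matching the "never include an HFile that contains data ... in the last 4 hours" wording. The compacted output then holds only older data, so a recent-window scan can still skip it.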
Issue Links
- is related to
  - HBASE-6428 Pluggable Compaction policies (Closed)
- relates to
  - HBASE-3842 Refactor Coprocessor Compaction API (Closed)