[HBASE-5479] Postpone CompactionSelection to compaction execution time - ASF JIRA

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Implemented
Affects Version/s: None
Fix Version/s: None
Component/s: io, Performance, regionserver
Labels:
None

Description

Regionservers commonly develop long compaction queues, meaning a CompactionRequest may execute hours after it was created. The CompactionRequest holds a CompactionSelection that was chosen at request time and may no longer be the optimal selection by the time it runs. The CompactionSelection should be created at compaction execution time rather than compaction request time.
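A minimal sketch of the idea, not HBase code: the request keeps a handle to the store's live file list instead of a precomputed selection, and only picks files when a compaction thread runs it. `DeferredCompactionRequest`, `selectAtExecutionTime`, and the "take the first N files" policy are all hypothetical placeholders for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for a compaction request; not the HBase CompactionRequest class.
final class DeferredCompactionRequest {
    // Supplies the store's file list as it exists when called, not when the request was created.
    private final Supplier<List<String>> currentStoreFiles;
    private final int maxFilesToCompact;

    DeferredCompactionRequest(Supplier<List<String>> currentStoreFiles, int maxFilesToCompact) {
        this.currentStoreFiles = currentStoreFiles;
        this.maxFilesToCompact = maxFilesToCompact;
    }

    // Called by the compaction thread when the request reaches the head of the queue.
    // The selection policy here (first N files) is only a placeholder assumption.
    List<String> selectAtExecutionTime() {
        List<String> files = currentStoreFiles.get();
        int n = Math.min(files.size(), maxFilesToCompact);
        return new ArrayList<>(files.subList(0, n));
    }
}
```

With this shape, files flushed between request time and execution time are visible to the selection, so a request that waited hours in the queue still sees the store's current state.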

The current mechanism breaks down during high volume insertion. The inefficiency is clearest when the inserts are finished. Inserting for 5 hours may build up 50 storefiles and a 40 element compaction queue. When finished inserting, you would prefer that the next compaction merges all 50 files (or some large subset), but the current system will churn through each of the 40 compaction requests, the first of which may be hours old. This ends up re-compacting the same data many times.

The current system is especially inefficient when dealing with time series data where the data in the storefiles has minimal overlap. With time series data, there is even less benefit to intermediate merges because most storefiles can be eliminated based on their key range during a read, even without bloomfilters. The only goal should be to reduce file count, not to minimize the number of files merged for each read.
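To illustrate why key ranges alone eliminate most files for time series data, here is a rough sketch. `FileKeyRange` and `KeyRangePruner` are hypothetical names, and keys are modeled as plain strings compared lexicographically, which simplifies HBase's byte-array row keys.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical description of one storefile's key range; keys are simplified to strings.
final class FileKeyRange {
    final String name;
    final String firstKey;
    final String lastKey;

    FileKeyRange(String name, String firstKey, String lastKey) {
        this.name = name;
        this.firstKey = firstKey;
        this.lastKey = lastKey;
    }

    // A file can only hold the row if the row falls inside [firstKey, lastKey].
    boolean mayContain(String rowKey) {
        return firstKey.compareTo(rowKey) <= 0 && rowKey.compareTo(lastKey) <= 0;
    }
}

final class KeyRangePruner {
    // Returns only the files whose key range could contain the requested row.
    static List<FileKeyRange> candidates(List<FileKeyRange> files, String rowKey) {
        List<FileKeyRange> result = new ArrayList<>();
        for (FileKeyRange f : files) {
            if (f.mayContain(rowKey)) {
                result.add(f);
            }
        }
        return result;
    }
}
```

When files cover disjoint time windows, a point read touches one file regardless of how many files the store holds, which is why intermediate merges buy little here.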

There are other aspects to the current queuing mechanism that would need to be looked at. You would want to avoid having the same Store in the queue multiple times. And you would want the completion of one compaction to possibly queue another compaction request for the store.
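The two requirements above could look roughly like this. It is a sketch under assumptions, not the HBase queue: `DedupingCompactionQueue` and its methods are hypothetical, stores are identified by name, and the "still needs compaction" check is passed in as a predicate.

```java
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.function.Predicate;

// Hypothetical FIFO queue that holds each store at most once.
final class DedupingCompactionQueue {
    // LinkedHashSet keeps insertion order and rejects duplicates.
    private final LinkedHashSet<String> queuedStores = new LinkedHashSet<>();

    // Returns false if the store is already waiting in the queue.
    synchronized boolean request(String storeName) {
        return queuedStores.add(storeName);
    }

    // Removes and returns the oldest queued store, or null if the queue is empty.
    synchronized String poll() {
        Iterator<String> it = queuedStores.iterator();
        if (!it.hasNext()) {
            return null;
        }
        String head = it.next();
        it.remove();
        return head;
    }

    // After a compaction completes, re-queue the store if it still needs work.
    // Returns true only if the store was actually added.
    synchronized boolean onCompactionFinished(String storeName, Predicate<String> stillNeedsCompaction) {
        return stillNeedsCompaction.test(storeName) && queuedStores.add(storeName);
    }

    synchronized int size() {
        return queuedStores.size();
    }
}
```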

An alternative architecture to the current style of queues would be to have each Store (all open in memory) keep a compactionPriority score up to date after events like flushes, compactions, schema changes, etc. Then you create a "CompactionPriorityComparator implements Comparator<Store>" and stick all the Stores into a PriorityQueue (synchronized remove/add from the queue when the value changes). The async compaction threads would keep pulling off the head of that queue as long as the head has compactionPriority > X.
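A rough sketch of that architecture follows. `Store` here is a hypothetical stand-in and not the HBase class, `StoreCompactionScheduler` is an invented name, and the integer score and threshold are assumptions; how the score would be computed is left open, as in the description.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical stand-in for a Store; only carries a name and a priority score.
final class Store {
    final String name;
    private int compactionPriority;

    Store(String name, int compactionPriority) {
        this.name = name;
        this.compactionPriority = compactionPriority;
    }

    int getCompactionPriority() { return compactionPriority; }
    void setCompactionPriority(int p) { compactionPriority = p; }
}

// Orders stores so the highest compactionPriority is at the head of the queue.
final class CompactionPriorityComparator implements Comparator<Store> {
    @Override
    public int compare(Store a, Store b) {
        return Integer.compare(b.getCompactionPriority(), a.getCompactionPriority());
    }
}

final class StoreCompactionScheduler {
    private final PriorityQueue<Store> queue = new PriorityQueue<>(new CompactionPriorityComparator());
    private final int threshold; // the "X" from the description

    StoreCompactionScheduler(int threshold) {
        this.threshold = threshold;
    }

    // Called after flushes, compactions, schema changes, etc.
    // The store is removed before its score changes and re-added afterwards,
    // because PriorityQueue does not reorder an element that is mutated in place.
    synchronized void update(Store store, int newPriority) {
        queue.remove(store);
        store.setCompactionPriority(newPriority);
        queue.add(store);
    }

    // Compaction threads call this; returns null when nothing exceeds the threshold.
    synchronized Store pollIfAboveThreshold() {
        Store head = queue.peek();
        if (head == null || head.getCompactionPriority() <= threshold) {
            return null;
        }
        return queue.poll();
    }
}
```

A fuller version would also need to track stores that are currently being compacted, so that an `update` during a running compaction does not hand the same store to a second thread.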

Issue Links

is related to

HBASE-6361 Change the compaction queue to a round robin scheduler

Closed

HBASE-7672 Merging compaction requests in the queue for same store

Closed

relates to

HBASE-5334 Pluggable Compaction Algorithms

Closed

People

Assignee: Unassigned

Reporter: Matt Corgan

Votes: 0

Watchers: 16

Dates

Created: 26/Feb/12 00:50

Updated: 13/Jun/22 15:33

Resolved: 09/Jul/13 22:04
