[HBASE-3242] HLog Compactions - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Not A Problem
Affects Version/s: None
Fix Version/s: None
Component/s: regionserver
Labels: None

Description

Currently, our memstore flush algorithm is pretty trivial. We either let a region's memstore grow to the flush size and flush that region, or let the HLog count grow to a limit and then flush everything below a seqid. In certain situations, we can get big wins from being more intelligent with our memstore flush algorithm. I suggest we look into algorithms to intelligently handle HLog compactions. By compaction, I mean replacing existing HLogs with new HLogs created from the contents of a memstore snapshot. Situations where we can get huge wins:

1. In the incrementColumnValue case, N HLog entries often correspond to a single memstore entry. Although we may have large HLog files, our memstore could be relatively small.
2. If we have a hot region, the majority of the HLog consists of that one region and other region edits would be minuscule.
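
To make the first case concrete, here is a minimal sketch of what an HLog compaction could look like. It assumes each logged edit carries the full new cell value, so only the edit with the highest seqid per key needs to survive. The `Edit` class and `compact` method are hypothetical names for illustration, not HBase classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: these types are hypothetical and are not HBase classes.
class HLogCompactionSketch {

    /** One logged edit: sequence id, row/column key, and the full cell value written. */
    static final class Edit {
        final long seqId;
        final String key;
        final long value;

        Edit(long seqId, String key, long value) {
            this.seqId = seqId;
            this.key = key;
            this.value = value;
        }
    }

    /**
     * Builds the contents of a replacement log from the old log.
     * Like a memstore snapshot, it keeps one edit per key: the one with the highest seqid.
     */
    static List<Edit> compact(List<Edit> log) {
        Map<String, Edit> snapshot = new LinkedHashMap<>();
        for (Edit e : log) {
            Edit prev = snapshot.get(e.key);
            if (prev == null || e.seqId >= prev.seqId) {
                snapshot.put(e.key, e);
            }
        }
        return new ArrayList<>(snapshot.values());
    }

    public static void main(String[] args) {
        List<Edit> log = new ArrayList<>();
        // A hot counter: every increment logs a new value for the same key.
        for (long seq = 1; seq <= 1000; seq++) {
            log.add(new Edit(seq, "row1/cf:hits", seq));
        }
        log.add(new Edit(1001, "row2/cf:hits", 5L));

        List<Edit> compacted = compact(log);
        System.out.println(log.size() + " log edits -> " + compacted.size() + " compacted edits");
        for (Edit e : compacted) {
            System.out.println(e.key + " = " + e.value + " (seqid " + e.seqId + ")");
        }
    }
}
```

The old log files could then be dropped once the compacted log is durable, without flushing the small memstore to an HFile.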

In both cases, we are forced to flush a bunch of very small stores. It's really hard for a compaction algorithm to be efficient when it has no guarantees about the approximate size of a new StoreFile, so it currently does unconditional, inefficient compactions. Additionally, compactions and flushes hurt because they invalidate cache entries, whether in the memstore or the LRU cache. If we can limit flushes to cases where we will have significant HFile output on a per-Store basis, we can get improved performance, stability, and reduced failover time.
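
One possible shape for such a per-Store decision is sketched below. It assumes a hypothetical `minHFileBytes` threshold: when the log count limit forces action, stores above the threshold are flushed, and the rest keep their memstore and rely on log compaction instead. None of these names are HBase APIs or configuration keys.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: names and thresholds are hypothetical, not HBase configuration.
class FlushPolicySketch {

    enum Action { FLUSH_TO_HFILE, KEEP_AND_COMPACT_LOG }

    /**
     * Called when the log count limit is reached.
     * Flushes only stores whose memstore would produce a worthwhile HFile;
     * smaller stores stay in memory and have their edits rewritten into a compacted log.
     */
    static Map<String, Action> onLogLimitReached(Map<String, Long> memstoreBytesByStore,
                                                 long minHFileBytes) {
        Map<String, Action> plan = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : memstoreBytesByStore.entrySet()) {
            boolean bigEnough = e.getValue() >= minHFileBytes;
            plan.put(e.getKey(), bigEnough ? Action.FLUSH_TO_HFILE : Action.KEEP_AND_COMPACT_LOG);
        }
        return plan;
    }

    public static void main(String[] args) {
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("hotRegion/cf", 60L * 1024 * 1024);   // hot region: large memstore
        sizes.put("coldRegion/cf", 200L * 1024);        // cold region: a few small edits
        sizes.put("counters/cf", 1L * 1024 * 1024);     // many increments, few distinct cells

        long minHFileBytes = 16L * 1024 * 1024;         // assumed threshold for this example
        onLogLimitReached(sizes, minHFileBytes)
            .forEach((store, action) -> System.out.println(store + " -> " + action));
    }
}
```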

People

Assignee: Unassigned
Reporter: Nicolas Spiegelberg
Votes: 1
Watchers: 9

Dates

Created: 17/Nov/10 01:18
Updated: 12/Jun/22 00:36
Resolved: 19/Jul/14 00:33