TSCompactions

The issue

One of the biggest issues I currently deal with is compacting big
stores: when an HBase cluster is 80% full on 4 TB nodes (let's say
with a single big table), compactions can take several hours (15 to
20 in my case).

In 'time series' workloads, we could avoid compacting everything
every time. Think about OpenTSDB-like systems, or write-heavy,
TTL-based workloads where you want to free space every day by
deleting the oldest data, and where you're not concerned about read
latency (i.e. you don't need reads to hit a single bigger StoreFile).

> Note: in this draft, I currently consider that we get free space from
> the TTL behavior only, not really from the Delete operations.

Proposal and benefits

For such cases, StoreFiles could be organized and managed in a way
that would compact:

recent StoreFiles with recent data
oldest StoreFiles that are candidates for TTL eviction

As a side benefit, this would also help scans that filter on a
timestamp range.
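
To illustrate the scan benefit, here is a minimal Python sketch of
pruning StoreFiles by time range. It assumes each StoreFile exposes
its minimum and maximum cell timestamps; `StoreFileInfo`,
`files_for_time_range`, and the half-open range convention are
illustrative choices for this sketch, not HBase APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoreFileInfo:
    """Hypothetical StoreFile metadata; timestamps are epoch milliseconds."""
    name: str
    min_ts: int  # oldest cell timestamp in the file
    max_ts: int  # newest cell timestamp in the file


def files_for_time_range(files, start_ts, end_ts):
    """Keep only files whose [min_ts, max_ts] overlaps the scan range.

    The scan range is treated as half-open: [start_ts, end_ts).
    """
    return [f for f in files if f.min_ts < end_ts and f.max_ts >= start_ts]


# Three time-bucketed files, newest first.
store = [
    StoreFileInfo("SF1", 200, 299),
    StoreFileInfo("SF2", 100, 199),
    StoreFileInfo("SF3", 0, 99),
]
# A scan over [120, 150) only needs to open SF2.
needed = files_for_time_range(store, 120, 150)
```

The tighter the time range covered by each StoreFile, the more files
such a check can skip.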

Configuration

hbase.hstore.compaction.sortByTS (boolean, default=false)
Enables the new behavior. When set to false, compactions behave
exactly as they do today.

hbase.hstore.compaction.ts.bucketSize (integer)
If `sortByTS` is enabled, tells HBase the target size of the
buckets. Lower values produce more StoreFiles but should save more
IO. Higher values produce fewer StoreFiles, but these will be
bigger, so compactions could generate more IO.

Examples

Here is what a typical store could look like after some flushes and
perhaps some minor compactions:

       ,---, ,---,       ,---,
       |   | |   | ,---, |   |
       |   | |   | |   | |   |
       `---' `---' `---' `---'
        SF1   SF2   SF3   SF4

       \__________ __________/
                  V

   for all of these StoreFiles,
   let's say minimum TS is 01/01/2013
       and maximum TS is 31/03/2013

Set the bucket size to 1 month, and this is what we have after
compaction:

                ,---, ,---,
                |   | |   |
          ,---, |   | |   |
          |   | |   | |   |
          `---' `---' `---'
           SF1   SF2   SF3

       ,-----------------------------,
       |  minimum TS  |  maximum TS  |
 ,-----------------------------------'
 | SF1 |  03/03/2013  |  31/03/2013  | most recent, growing
 | SF2 |  31/01/2013  |  02/03/2013  | old data, "sealed"
 | SF3 |  01/01/2013  |  30/01/2013  | oldest data, "sealed"
 '-----------------------------------'
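
A rough sketch of how cells might be routed to time buckets during
such a compaction. It assumes `bucketSize` is a duration in
milliseconds and that buckets are aligned to the epoch, so the
boundaries will not match the calendar dates in the table above.
`bucket_index` and `group_by_bucket` are hypothetical helpers, not
part of HBase.

```python
from collections import defaultdict

DAY_MS = 24 * 60 * 60 * 1000


def bucket_index(ts_ms, bucket_size_ms):
    """Index of the epoch-aligned time bucket that contains ts_ms."""
    return ts_ms // bucket_size_ms


def group_by_bucket(timestamps_ms, bucket_size_ms):
    """Group cell timestamps by bucket; each group would become one StoreFile."""
    buckets = defaultdict(list)
    for ts in timestamps_ms:
        buckets[bucket_index(ts, bucket_size_ms)].append(ts)
    return dict(buckets)


# Cells spread over about two months, with a 30-day bucket size.
cells = [0, DAY_MS, 30 * DAY_MS, 31 * DAY_MS, 61 * DAY_MS]
grouped = group_by_bucket(cells, 30 * DAY_MS)
```

Only the bucket with the highest index keeps growing; lower-indexed
buckets correspond to the "sealed" files in the table.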

StoreFile selection

For minor compactions, the current algorithm should already do the
right job: pick the `n` oldest files that are small enough and
write a bigger file. Remember, TSCompactions are designed for time
series, so this 'minor selection' should leave big, old, "sealed"
files as they are.

For major compactions, once all the StoreFiles have been selected,
apply the TTL first. StoreFiles that are entirely past the TTL
don't need to be rewritten: they can be deleted in one go,
avoiding lots of IO.
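
The two selection rules above could be sketched as follows. This is
a simplification of the description in this proposal, not HBase's
actual compaction policy: `StoreFileInfo`, `select_minor`, and
`split_for_major` are hypothetical names, and file age is
approximated by the newest timestamp in each file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoreFileInfo:
    """Hypothetical StoreFile metadata; timestamps are epoch milliseconds."""
    name: str
    size_bytes: int
    max_ts: int  # newest cell timestamp in the file


def select_minor(files, n, max_size_bytes):
    """Pick up to n of the oldest files that are small enough to merge.

    Big "sealed" files exceed max_size_bytes and are left untouched.
    """
    small = [f for f in files if f.size_bytes <= max_size_bytes]
    return sorted(small, key=lambda f: f.max_ts)[:n]


def split_for_major(files, now_ms, ttl_ms):
    """Apply the TTL first: return (files to drop whole, files to rewrite)."""
    cutoff = now_ms - ttl_ms
    expired = [f for f in files if f.max_ts < cutoff]
    live = [f for f in files if f.max_ts >= cutoff]
    return expired, live


store = [
    StoreFileInfo("old_sealed", 10_000, 100),
    StoreFileInfo("a", 10, 500),
    StoreFileInfo("b", 20, 400),
    StoreFileInfo("c", 30, 600),
]
minor = select_minor(store, n=2, max_size_bytes=1_000)
expired, live = split_for_major(store, now_ms=1_000, ttl_ms=700)
```

A file is dropped only when its newest cell is already past the TTL,
so a partially expired file still goes through the normal rewrite.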

New issues and trade-offs

1. In that case (bucketSize=1 month), after 1+ year we'll have lots
of StoreFiles if there is no TTL eviction (more generally, about `n`
StoreFiles after `n * bucketSize` seconds). In any case, a sensible
threshold should be implemented to limit the maximum number of
StoreFiles.

2. If we later add old data that falls within the time range of a
StoreFile that has already been compacted, this could generate lots
of IO to rebuild the single StoreFile for that time bucket, perhaps
just to merge a few rows.
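
For the first trade-off, a back-of-the-envelope bound on the number
of bucketed StoreFiles can help when choosing a threshold. The
sketch below assumes epoch-aligned buckets and durations in
milliseconds; `max_live_buckets` is a hypothetical helper, and the
window is either the TTL or, without TTL, the age of the oldest data.

```python
DAY_MS = 24 * 60 * 60 * 1000


def max_live_buckets(window_ms, bucket_size_ms):
    """Upper bound on the aligned buckets that overlap a window of this length."""
    if window_ms <= 0 or bucket_size_ms <= 0:
        raise ValueError("window and bucket size must be positive")
    full = -(-window_ms // bucket_size_ms)  # ceiling division
    # The window may straddle one extra bucket boundary, and a file is
    # only dropped once it is entirely outside the window.
    return full + 1


one_year = max_live_buckets(365 * DAY_MS, 30 * DAY_MS)
```

With a one-year window and 30-day buckets this gives at most 14
bucketed StoreFiles per store, before counting freshly flushed files.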

Issue Links

is related to

HBASE-14477 Compaction improvements: Date tiered compaction policy

Closed

People

Assignee:: Unassigned

Reporter:: Adrien Mogenet

Dates

Created:: 18/Aug/13 12:40

Updated:: 16/Jun/22 17:56

Resolved:: 16/Jun/22 17:56