Details

    • Type: New Feature
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.94.10
    • Fix Version/s: None
    • Component/s: Compaction
    • Tags:
      compaction, ttl

      Description

      TSCompactions

      The issue

      One of the biggest issues I currently deal with is compacting big
      stores: when an HBase cluster is 80% full on 4 TB nodes (say, with
      a single big table), compactions can take several hours (from
      15 to 20 in my case).

      In 'time series' workloads, we could avoid compacting everything
      every time. Think about OpenTSDB-like systems, or write-heavy,
      TTL-based workloads where you want to free space every day by
      deleting the oldest data, and you're not concerned about read
      latency (i.e. reading from a single bigger StoreFile).

      > Note: in this draft, I currently assume that we reclaim space
      > through TTL eviction only, not through Delete operations.

      Proposal and benefits

      For such cases, StoreFiles could be organized and managed in a way
      that would compact:

      • recent StoreFiles containing recent data
      • the oldest StoreFiles, which are affected by TTL eviction

      As a side benefit, it would also speed up scans that use a
      timestamp criterion.

      Configuration

      • hbase.hstore.compaction.sortByTS (boolean, default=false)
        Indicates whether the new behavior is enabled. Set it to
        false and compactions behave exactly as they do today.
      • hbase.hstore.compaction.ts.bucketSize (integer)
        If `sortByTS` is enabled, tells HBase the target time span of
        each bucket. The lower the value, the more StoreFiles you'll
        get, but you should save more IO's. Higher values will generate
        fewer StoreFiles, but these will be bigger and thus compactions
        could generate more IO's.
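      To make the bucketing concrete, here is a minimal sketch of how the
      proposed `ts.bucketSize` setting could map a cell timestamp to a
      fixed time bucket. Only the property names above come from this
      proposal; the class and method names below are hypothetical
      illustrations, not an existing HBase API.

```java
// Hypothetical helper for the proposed time-bucketed compactions:
// every cell timestamp falls into exactly one fixed, non-moving bucket
// of `bucketSizeMs` milliseconds.
public final class TsBuckets {
    private TsBuckets() {}

    // Index of the fixed time bucket a timestamp falls into.
    public static long bucketIndexFor(long timestampMs, long bucketSizeMs) {
        return timestampMs / bucketSizeMs;
    }

    // Inclusive lower bound of the bucket covering this timestamp,
    // useful when deciding which StoreFile covers a given time range.
    public static long bucketStart(long timestampMs, long bucketSizeMs) {
        return bucketIndexFor(timestampMs, bucketSizeMs) * bucketSizeMs;
    }
}
```

      Because the buckets are anchored at fixed boundaries rather than at
      "now", a cell's bucket never changes as time passes.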

      Examples

      Here is what a typical store could look like after some flushes and
      perhaps a few minor compactions:

             ,---, ,---,       ,---,
             |   | |   | ,---, |   |
             |   | |   | |   | |   |
             `---' `---' `---' `---'
              SF1   SF2   SF3   SF4
      
             \__________ __________/
                        V
      
         for all of these StoreFiles,
         say the minimum TS is 01/01/2013
         and the maximum TS is 31/03/2013
      

      Set the bucket size to one month, and this is what we have after
      compaction:

                      ,---, ,---,
                      |   | |   |
                ,---, |   | |   |
                |   | |   | |   |
                `---' `---' `---'
                 SF1   SF2   SF3
      
             ,-----------------------------,
             |  minimum TS  |  maximum TS  |
       ,-----------------------------------'
       | SF1 |  03/03/2013  |  31/03/2013  | most recent, growing
       | SF2 |  31/01/2013  |  02/03/2013  | old data, "sealed"
       | SF3 |  01/01/2013  |  30/01/2013  | oldest data, "sealed"
       '-----------------------------------'
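      The grouping in the example above could be sketched as routing each
      cell to a per-bucket output during compaction, so every resulting
      StoreFile covers one fixed time range. This is an illustrative
      sketch under the proposal's assumptions; `TsBucketRouter` is a
      hypothetical name, not HBase code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: with sortByTS enabled, a compaction routes each
// cell to an output bucket chosen by its timestamp, producing one
// StoreFile per populated time bucket (SF1..SF3 in the example).
public class TsBucketRouter {
    private final long bucketSizeMs;
    // bucket start timestamp -> cell timestamps routed to that bucket
    private final Map<Long, List<Long>> buckets = new TreeMap<>();

    public TsBucketRouter(long bucketSizeMs) {
        this.bucketSizeMs = bucketSizeMs;
    }

    public void route(long cellTimestampMs) {
        long bucketStart = (cellTimestampMs / bucketSizeMs) * bucketSizeMs;
        buckets.computeIfAbsent(bucketStart, k -> new ArrayList<>())
               .add(cellTimestampMs);
    }

    // Number of StoreFiles this compaction would emit.
    public int bucketCount() {
        return buckets.size();
    }
}
```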
      

      StoreFile selection

      • for minor compactions, the current algorithm should already do
        the right job: pick the `n` oldest files that are small enough,
        and write one bigger file. Remember, TSCompactions are designed
        for time series, so this 'minor selection' should leave the
        "sealed" big old files as they are.
      • for major compactions, once all the StoreFiles have been
        selected, apply the TTL first. StoreFiles that are entirely out
        of the time range don't need to be rewritten at all; they can be
        deleted in a single operation, avoiding lots of IO's.
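      The major-compaction rule above (drop whole files that are entirely
      past the TTL instead of rewriting them) might look like the
      following sketch. `FileSummary` is a hypothetical stand-in for
      HBase's per-StoreFile metadata; the point is only that knowing each
      file's maximum timestamp is enough to delete it in one operation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed major-compaction selection:
// files whose newest cell is already past the TTL are deleted whole,
// and only the remaining files are rewritten.
public class TtlSelection {
    public static class FileSummary {
        public final String name;
        public final long maxTimestampMs; // newest cell in the file
        public FileSummary(String name, long maxTimestampMs) {
            this.name = name;
            this.maxTimestampMs = maxTimestampMs;
        }
    }

    // Returns the candidates that are entirely expired: every cell in
    // such a file is older than (now - ttl), so the file can simply be
    // deleted instead of being rewritten.
    public static List<FileSummary> entirelyExpired(
            List<FileSummary> candidates, long nowMs, long ttlMs) {
        List<FileSummary> expired = new ArrayList<>();
        for (FileSummary f : candidates) {
            if (f.maxTimestampMs < nowMs - ttlMs) {
                expired.add(f);
            }
        }
        return expired;
    }
}
```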

      New issues and trade-offs

      1. In this case (bucketSize=1 month), after a year or more (and
      more generally after `n * bucketSize` seconds) we'll have lots of
      StoreFiles if there is no TTL eviction. In any case, a sensible
      threshold should be implemented to cap the maximum number of
      StoreFiles.

      2. If we later write old data that falls into the time range of a
      StoreFile which has already been compacted, reconstructing a single
      StoreFile for that time bucket could generate lots of IO's, perhaps
      just to merge a few rows.

        Issue Links

          Activity

          cf357 Adrien Mogenet added a comment -

          As I wrote in the ticket, it's currently just a draft to get comments and advice.
          I began writing some code and doing some tests to see whether it's relevant (or not).

          mcorgan Matt Corgan added a comment -

          Have you seen HBASE-7667? It can help timeseries data by only compacting the tail of a region, leaving older data untouched. The older "stripes" in a region are inherently "sealed" because your application stops writing to them, and they therefore don't need to be compacted.

          ndimiduk Nick Dimiduk added a comment -

          Sergey Shelukhin might be interested in this ticket

          sershe Sergey Shelukhin added a comment -

          Sorry for kind of abandoning HBASE-7667. It needs a final push to commit. Let me try to squeeze that in this week. It will probably come in 0.96.1.
          The proposal as such sounds reasonable to me. Is TS part of the key, or actual HBase TS? If the latter, there is no guarantee that old data ends up in old files.

          cf357 Adrien Mogenet added a comment -

          Yep, I've seen HBASE-7667, and this ticket was hugely inspired by those stripe compactions.

          Sergey, it would be based on the HBase TS, and the idea here is exactly to guarantee that old files contain old data, so you can ensure they will be efficiently evicted by the TTL mechanism in a single IO (just deleting the old file).

          stack stack added a comment -

          Lars Hofhansl Didn't you used to talk about something like this?

          lhofhansl Lars Hofhansl added a comment -

          Yeah, we have been discussing exactly the same thing. Our plan is to enhance the coprocessor framework slightly so that this can be done via coprocessor hooks. I forget the details - too much stuff going on - I'll find out tomorrow at work.

          lhofhansl Lars Hofhansl added a comment -

          Warming this up again. I called this time-tiered compactions before. The work I mentioned above was abandoned.

          I think there is a lot of value in this. What I had in mind earlier was something along the lines of "keep the last week separate from older stuff". That clearly does not work, since "the last week" is a moving target. What is described here is much better: we just group data by a fixed timerange (every month, every year, every week, every 10 days, or similar), which can work since it's not a moving target. We could also have policies that group by week and eventually by month, etc., although then we'd need to force major compactions just to shift the files around. The value of data decays with age, and somehow we should capture that.

          anoop.hbase Anoop Sam John added a comment -

          Is this similar to the idea in HBASE-14477?

          lhofhansl Lars Hofhansl added a comment -

          Sounds very similar to me.

          cf357 Adrien Mogenet added a comment -

          Think so!


            People

            • Assignee:
              Unassigned
              Reporter:
              cf357 Adrien Mogenet
            • Votes:
              0
            • Watchers:
              15
