HBase / HBASE-14477

Compaction improvements: Date tiered compaction policy

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: 2.0.0
    • Component/s: None
    • Labels:
      None

      Description

      For immutable and mostly immutable data, the current SizeTiered-based compaction policy is not efficient:

      1. There is no need to compact all files into one, because the data is (mostly) immutable and we do not need to collect garbage. (The performance reasons will be discussed later.)
      2. Size-tiered compaction is not suitable for applications where the most recent data is the most important, and it prevents efficient caching of that data.

      The idea is pretty similar to DateTieredCompaction in Cassandra:

      http://www.datastax.com/dev/blog/datetieredcompactionstrategy
      http://www.datastax.com/dev/blog/dtcs-notes-from-the-field

      From Cassandra's own blog:

      Since DTCS can be used with any table, it is important to know when it is a good idea, and when it is not. I’ll try to explain the spectrum and trade-offs here:

      1. Perfect Fit: Time Series Fact Data, Deletes by Default TTL: When you ingest fact data that is ordered in time, with no deletes or overwrites. This is the standard “time series” use case.

      2. OK Fit: Time-Ordered, with limited updates across whole data set, or only updates to recent data: When you ingest data that is (mostly) ordered in time, but revise or delete a very small proportion of the overall data across the whole timeline.

      3. Not a Good Fit: many partial row updates or deletions over time: When you need to partially revise or delete fields for rows that you read together. Also, when you revise or delete rows within clustered reads.
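      To make the idea in this description concrete, here is a minimal sketch of date-based file grouping. It is not the HBase or Cassandra implementation: the class `DateTieredSketch`, the `StoreFile` stand-in, and the parameters `windowMillis` and `minFiles` are all hypothetical names chosen for illustration. It also assumes fixed-size windows, whereas DTCS uses windows that grow with data age. The point it shows is that files are only ever merged with other files from the same time window, so old data is left alone and recent data stays in its own files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DateTieredSketch {
    /** Hypothetical stand-in for a store file: a name plus the newest cell timestamp it holds. */
    public static final class StoreFile {
        public final String name;
        public final long maxTimestampMillis;

        public StoreFile(String name, long maxTimestampMillis) {
            this.name = name;
            this.maxTimestampMillis = maxTimestampMillis;
        }
    }

    /** Start of the fixed-size window that contains the given timestamp. */
    public static long windowStart(long timestampMillis, long windowMillis) {
        return Math.floorDiv(timestampMillis, windowMillis) * windowMillis;
    }

    /** Groups files by window start (oldest window first); files in different windows never mix. */
    public static Map<Long, List<StoreFile>> groupByWindow(List<StoreFile> files, long windowMillis) {
        Map<Long, List<StoreFile>> windows = new TreeMap<>();
        for (StoreFile f : files) {
            long start = windowStart(f.maxTimestampMillis, windowMillis);
            windows.computeIfAbsent(start, k -> new ArrayList<>()).add(f);
        }
        return windows;
    }

    /** Returns one compaction candidate per window that holds at least minFiles files. */
    public static List<List<StoreFile>> selectCompactions(
            List<StoreFile> files, long windowMillis, int minFiles) {
        List<List<StoreFile>> candidates = new ArrayList<>();
        for (List<StoreFile> group : groupByWindow(files, windowMillis).values()) {
            if (group.size() >= minFiles) {
                candidates.add(group);
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        long hour = 3_600_000L;
        List<StoreFile> files = Arrays.asList(
                new StoreFile("f1", 10L),
                new StoreFile("f2", hour - 1),
                new StoreFile("f3", hour),
                new StoreFile("f4", 2 * hour + 5));
        // Only windows with at least two files are proposed for compaction.
        for (List<StoreFile> group : selectCompactions(files, hour, 2)) {
            StringBuilder names = new StringBuilder();
            for (StoreFile f : group) {
                names.append(f.name).append(' ');
            }
            System.out.println("compact together: " + names.toString().trim());
        }
    }
}
```

      In this sketch a window that never reaches `minFiles` files is simply left as is, which matches point 1 above: for (mostly) immutable data there is no garbage to collect, so there is no need to keep rewriting old files into one.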

          Activity

          vrodionov Vladimir Rodionov added a comment -

          After internal discussion with peers, we agreed that users could be confused when configuring Generational compaction, and that the similar DateTieredCompaction is a better alternative. So I renamed the JIRA.

          anoop.hbase Anoop Sam John added a comment -

          HBASE-9260 is a similar thought. Maybe we can close one of them as a dup. This will be a good feature. Thanks V!

          sershe Sergey Shelukhin added a comment -

          Stripe compactions have a variant that is good for time-sorted data where only the recent stripe is compacted; however, it still stripes by key, not by timestamp. Does it solve this use case?

          davelatham Dave Latham added a comment -

          Vladimir, this looks great - would love to be able to have it. Do you have intent to backport to 1 or 0.98?

          vrodionov Vladimir Rodionov added a comment -

          Dave Latham

          Do you have intent to backport to 1 or 0.98?

          Short answer is - yes.

          anoop.hbase Anoop Sam John added a comment -

          Are you working on this now V?

          vrodionov Vladimir Rodionov added a comment -

          It's #2 in my pipeline (#1 is HBASE-10390). Be patient, Anoop Sam John.

          anoop.hbase Anoop Sam John added a comment -

          We wanted to work on a project after this policy is in. That is why I pinged. I can also give you a helping hand if you want.

          enis Enis Soztutar added a comment -

          Should we close this as a duplicate of HBASE-15181?

          vrodionov Vladimir Rodionov added a comment - edited

          Sure, Enis Soztutar. Duplicate of HBASE-15181


            People

            • Assignee:
              vrodionov Vladimir Rodionov
              Reporter:
              vrodionov Vladimir Rodionov
            • Votes:
              0
              Watchers:
              16
