HBASE-3745

Add the ability to restrict major-compactible files by timestamp

Details

    • Improvement
    • Status: Closed
    • Major
    • Resolution: Duplicate
    • 0.92.0
    • None
    • None
    • None

    Description

      In some applications, a common access pattern is to frequently scan tables with a time range predicate restricted to a fairly recent time window. For example, you may want to run an incremental aggregation or indexing step only on rows that have changed in the last hour. We do this efficiently by tracking the min and max timestamp at the HFile level, so that old HFiles don't have to be read.
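
      As a rough illustration of the pruning described above, the sketch below keeps only the files whose tracked timestamp range overlaps the scan's time range. The class `TimeRangePruningSketch`, the `FileInfo` holder, and the `filesToRead` method are hypothetical names made up for this example, not HBase classes, and the half-open scan range `[scanMin, scanMax)` is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of timestamp-based file pruning; not HBase's actual code. */
final class TimeRangePruningSketch {

    /** Stand-in for per-HFile metadata: the min and max cell timestamps in the file. */
    static final class FileInfo {
        final String name;
        final long minTs;
        final long maxTs;

        FileInfo(String name, long minTs, long maxTs) {
            this.name = name;
            this.minTs = minTs;
            this.maxTs = maxTs;
        }
    }

    /**
     * Returns the files whose [minTs, maxTs] range overlaps the scan range
     * [scanMin, scanMax). Files entirely outside the range are skipped unread.
     */
    static List<FileInfo> filesToRead(List<FileInfo> files, long scanMin, long scanMax) {
        List<FileInfo> selected = new ArrayList<>();
        for (FileInfo f : files) {
            boolean overlaps = f.maxTs >= scanMin && f.minTs < scanMax;
            if (overlaps) {
                selected.add(f);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<FileInfo> files = Arrays.asList(
                new FileInfo("old-hfile", 0L, 1_000L),
                new FileInfo("recent-hfile", 5_000L, 9_000L));
        // Scan only the most recent window; the old file should be pruned.
        for (FileInfo f : filesToRead(files, 8_000L, 10_000L)) {
            System.out.println("would read: " + f.name);
        }
    }
}
```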

      After a major compaction, however, all of a store's data sits in a single HFile whose timestamp range spans the whole dataset, so the entire dataset has to be read, which can hurt the performance of this access pattern.

      We should add a column family attribute that can specify a policy such as: when major compacting, never include an HFile that contains data with a timestamp in the last 4 hours. Thus, recently flushed HFiles will always remain uncompacted and provide the good scan performance these applications require.
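
      One way the proposed policy could look, as a minimal sketch only: before a major compaction, drop every candidate file whose newest timestamp falls inside the exclusion window. The names `MajorCompactionFilterSketch`, `Candidate`, and `selectForMajorCompaction`, and the idea of passing the window in milliseconds, are assumptions for illustration; the issue does not specify an attribute name or an implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the proposed policy; not an HBase API. */
final class MajorCompactionFilterSketch {

    /** Stand-in for a compaction candidate with its newest cell timestamp. */
    static final class Candidate {
        final String name;
        final long maxTs;

        Candidate(String name, long maxTs) {
            this.name = name;
            this.maxTs = maxTs;
        }
    }

    /**
     * Keeps only files whose newest timestamp is older than the exclusion
     * window. A file holding any data from the last excludeWindowMs
     * milliseconds is left out of the major compaction.
     */
    static List<Candidate> selectForMajorCompaction(
            List<Candidate> files, long nowMs, long excludeWindowMs) {
        long cutoff = nowMs - excludeWindowMs;
        List<Candidate> selected = new ArrayList<>();
        for (Candidate f : files) {
            if (f.maxTs < cutoff) {
                selected.add(f);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        long hourMs = 60L * 60L * 1000L;
        long now = System.currentTimeMillis();
        List<Candidate> files = Arrays.asList(
                new Candidate("flushed-yesterday", now - 24 * hourMs),
                new Candidate("flushed-an-hour-ago", now - hourMs));
        // With a 4-hour window, only the older file is eligible.
        for (Candidate f : selectForMajorCompaction(files, now, 4 * hourMs)) {
            System.out.println("eligible for major compaction: " + f.name);
        }
    }
}
```

      With a filter like this, the recently flushed files keep their narrow timestamp ranges, so time-range scans over the recent window can still skip the large compacted file.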

            People

              Assignee: Unassigned
              Reporter: Todd Lipcon (tlipcon)