HBASE-9815

Add Histogram representative of row key distribution inside a region.


Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 0.89-fb
    • Fix Version/s: 0.89-fb
    • Component/s: HFile
    • Labels: None

    Description

      With histogram information, users can split a scan workload into roughly equal-sized scans, based on the sizes estimated from the histogram. This lets systems that run queries on top of HBase apply cost-based optimization when scanning. The idea is to store the histogram in the HFile trailer and populate it on compaction and flush.
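
      As an illustration of how a client could turn such a histogram into equal-sized scans, here is a minimal sketch. It is not taken from the attached patch: the `ScanSplitter` and `Bucket` names are hypothetical, row keys are assumed to have already been mapped to numeric values, and keys are assumed to be spread uniformly within each bucket.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: derives scan boundaries from a bucketed histogram. */
final class ScanSplitter {

    /** One bucket: keys in [start, end) with an estimated size (rows or bytes). */
    static final class Bucket {
        final double start;
        final double end;
        final double size;

        Bucket(double start, double end, double size) {
            this.start = start;
            this.end = end;
            this.size = size;
        }
    }

    /**
     * Returns numScans - 1 interior split points so that each resulting range
     * covers roughly total / numScans of the estimated size.
     * Buckets must be sorted by start and must not overlap.
     */
    static List<Double> splitPoints(List<Bucket> buckets, int numScans) {
        if (numScans < 1) {
            throw new IllegalArgumentException("numScans must be >= 1");
        }
        double total = 0;
        for (Bucket b : buckets) {
            total += b.size;
        }
        List<Double> splits = new ArrayList<>();
        if (total <= 0) {
            return splits; // nothing to split
        }
        double target = total / numScans; // desired size per scan
        double consumed = 0;              // size of all buckets before the current one
        int next = 1;                     // index of the next split point to emit
        for (Bucket b : buckets) {
            // Emit every split point that falls inside this bucket.
            while (next < numScans && b.size > 0
                    && consumed + b.size >= next * target) {
                // Linear interpolation: assumes uniform density inside the bucket.
                double fraction = (next * target - consumed) / b.size;
                splits.add(b.start + fraction * (b.end - b.start));
                next++;
            }
            consumed += b.size;
        }
        return splits;
    }
}
```

      Each pair of consecutive split points, together with the region's start and end keys, would then bound one scan. The interpolation step is the simplest choice, not necessarily the one a production implementation would use.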

      The HRegionInterface can expose an API that returns the histogram for a region, generated by merging the histograms of all of the region's HFiles.

      The histogram implementation is based on:
      http://jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf
      http://dl.acm.org/citation.cfm?id=1951376
      and on NumericHistogram from Hive.
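
      For readers unfamiliar with the first reference, the following is a minimal sketch of a Ben-Haim and Tom-Tov style streaming histogram, including the merge step that combining per-HFile histograms into a region-level histogram would rely on. It is an illustration only, not the code from the attached patch: the `StreamingHistogram` class and its methods are hypothetical, and values are assumed to be numeric (how row keys would be mapped to numbers is not specified in this issue).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical fixed-size histogram in the style of Ben-Haim and Tom-Tov. */
final class StreamingHistogram {

    /** A bin is a (centroid, count) pair. */
    private static final class Bin {
        double centroid;
        long count;

        Bin(double centroid, long count) {
            this.centroid = centroid;
            this.count = count;
        }
    }

    private final int maxBins;
    private final List<Bin> bins = new ArrayList<>(); // kept sorted by centroid

    StreamingHistogram(int maxBins) {
        if (maxBins < 1) {
            throw new IllegalArgumentException("maxBins must be >= 1");
        }
        this.maxBins = maxBins;
    }

    /** Update step: add one observation, then shrink back to maxBins. */
    void add(double value) {
        insert(value, 1);
        trim();
    }

    /** Merge step: take the union of both bin sets, then shrink back to maxBins. */
    void merge(StreamingHistogram other) {
        for (Bin b : other.bins) {
            insert(b.centroid, b.count);
        }
        trim();
    }

    /** Inserts a bin at its sorted position, or adds to an existing equal centroid. */
    private void insert(double centroid, long count) {
        int i = 0;
        while (i < bins.size() && bins.get(i).centroid < centroid) {
            i++;
        }
        if (i < bins.size() && bins.get(i).centroid == centroid) {
            bins.get(i).count += count;
        } else {
            bins.add(i, new Bin(centroid, count));
        }
    }

    /** Repeatedly replaces the two closest bins with their weighted average. */
    private void trim() {
        while (bins.size() > maxBins) {
            int closest = 0;
            double minGap = Double.POSITIVE_INFINITY;
            for (int i = 0; i + 1 < bins.size(); i++) {
                double gap = bins.get(i + 1).centroid - bins.get(i).centroid;
                if (gap < minGap) {
                    minGap = gap;
                    closest = i;
                }
            }
            Bin left = bins.get(closest);
            Bin right = bins.remove(closest + 1);
            long total = left.count + right.count;
            left.centroid =
                (left.centroid * left.count + right.centroid * right.count) / total;
            left.count = total;
        }
    }

    int binCount() { return bins.size(); }

    double centroid(int i) { return bins.get(i).centroid; }

    long count(int i) { return bins.get(i).count; }

    long totalCount() {
        long sum = 0;
        for (Bin b : bins) {
            sum += b.count;
        }
        return sum;
    }
}
```

      Because merging only needs the bins of both inputs, a region-level histogram could in principle be built from per-file histograms without rereading the underlying data.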

      Attachments

        1. Histogram-9815.diff
          68 kB
          Manukranth Kolloju


          People

            Assignee: Unassigned
            Reporter: Manukranth Kolloju (manukranthk)
            Votes: 0
            Watchers: 11
