  1. HBase
  2. HBASE-6922 HBase scanner performance improvements
  3. HBASE-6066

some low hanging read path improvement ideas


Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.95.0
    • Component/s: Performance
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      I was running some single-threaded scan performance tests against a fully cached table with small rows. Some observations:

      We seem to iterate over temporary lists, and build them, more often than necessary.

      1) One such case is the following code in HRegionServer.next():

         boolean moreRows = s.next(values, HRegion.METRIC_NEXTSIZE);
         if (!values.isEmpty()) {
           for (KeyValue kv : values) {              // wasteful in most cases
             currentScanResultSize += kv.heapSize();
           }
           results.add(new Result(values));
         }
      

      By default the "maxScannerResultSize" is Long.MAX_VALUE. In those cases,
      we can avoid the unnecessary iteration to compute currentScanResultSize.
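
A minimal sketch of that idea follows. It is not the HBase code: `Cell` is a hypothetical stand-in for KeyValue, `scan` and `row` are invented names, and checking the limit at the top of the loop is an assumption. The point is only that the per-KeyValue heapSize() walk runs when a finite limit is configured and is skipped when the limit is Long.MAX_VALUE.

```java
import java.util.ArrayList;
import java.util.List;

public class ScanSizeSketch {
    /** Hypothetical stand-in for KeyValue; only heapSize() matters here. */
    public static final class Cell {
        private final long heapSize;
        public Cell(long heapSize) { this.heapSize = heapSize; }
        public long heapSize() { return heapSize; }
    }

    /** Helper (assumption) that builds one row from a list of cell sizes. */
    public static List<Cell> row(long... sizes) {
        List<Cell> cells = new ArrayList<>();
        for (long s : sizes) {
            cells.add(new Cell(s));
        }
        return cells;
    }

    /**
     * Returns {rowsCollected, currentScanResultSize, heapSizeCalls}.
     * Size accounting happens only when a finite limit is configured.
     */
    public static long[] scan(List<List<Cell>> rows, long maxScannerResultSize) {
        boolean trackSize = maxScannerResultSize < Long.MAX_VALUE;
        long currentScanResultSize = 0;
        long heapSizeCalls = 0;
        long rowsCollected = 0;
        for (List<Cell> values : rows) {
            if (trackSize && currentScanResultSize >= maxScannerResultSize) {
                break; // size limit reached
            }
            if (values.isEmpty()) {
                continue;
            }
            if (trackSize) {
                // Only pay for the extra iteration when the size is actually needed.
                for (Cell kv : values) {
                    currentScanResultSize += kv.heapSize();
                    heapSizeCalls++;
                }
            }
            rowsCollected++;
        }
        return new long[] {rowsCollected, currentScanResultSize, heapSizeCalls};
    }
}
```

With the default limit, the sketch never calls heapSize(); with a finite limit it behaves like the original loop.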

      2) An example of a wasteful temporary list is "results" in
      RegionScanner.next().

            results.clear();
            boolean returnResult = nextInternal(limit, metric);
      
            outResults.addAll(results);
      

      results then gets copied over to outResults via an addAll(). Not sure why we cannot collect the results directly in outResults.
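
The two variants can be compared in a small sketch. Everything here is hypothetical: strings stand in for KeyValues, and `fill` stands in for nextInternal(). It shows that both approaches return the same data, while the direct variant needs no scratch list and no addAll().

```java
import java.util.ArrayList;
import java.util.List;

public class DirectCollectSketch {
    private final String[] source;
    private int cursor = 0;
    // Scratch list used only by the copying variant.
    private final List<String> results = new ArrayList<>();

    public DirectCollectSketch(String... source) {
        this.source = source;
    }

    /** Hypothetical stand-in for nextInternal(): appends up to limit items to sink. */
    private boolean fill(List<String> sink, int limit) {
        int added = 0;
        while (cursor < source.length && added < limit) {
            sink.add(source[cursor++]);
            added++;
        }
        return cursor < source.length; // true if more data remains
    }

    /** Current pattern: collect into a scratch list, then copy. */
    public boolean nextWithCopy(List<String> outResults, int limit) {
        results.clear();
        boolean more = fill(results, limit);
        outResults.addAll(results);
        return more;
    }

    /** Proposed pattern: collect straight into the caller's list. */
    public boolean nextDirect(List<String> outResults, int limit) {
        return fill(outResults, limit);
    }
}
```

One design caveat: if the internal method can ever discard a partially collected row, the direct variant would have to remember the size of outResults on entry and truncate back to it, instead of clearing a private list.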

      3) Another, very similar example of a wasteful list is "results" in StoreScanner.next(), which eventually also copies its results into "outResults".

      4) Reduce overhead of "size metric" maintained in StoreScanner.next().

        if (metric != null) {
           HRegion.incrNumericMetric(this.metricNamePrefix + metric,
                                     copyKv.getLength());
        }
        results.add(copyKv);
      

      A single call to next() might fetch a lot of KVs. We can first add up the sizes of those KVs in a local variable and then increment the metric once in a finally clause, rather than updating AtomicLongs for each KV.
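
A sketch of the batching, under stated assumptions: a plain AtomicLong stands in for the metric behind HRegion.incrNumericMetric(), and KV lengths are passed in as ints. The class and method names are invented for illustration.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class BatchedMetricSketch {
    /**
     * Adds every length to results and updates the metric once per call.
     * The metric may be null, mirroring the "metric != null" check in the snippet above.
     */
    public static int collect(int[] kvLengths, List<Integer> results, AtomicLong metric) {
        long batchedSize = 0; // plain local, no atomic traffic inside the loop
        try {
            for (int length : kvLengths) {
                batchedSize += length;
                results.add(length);
            }
            return results.size();
        } finally {
            // A single atomic update covers the whole batch, even on an early exit.
            if (metric != null && batchedSize > 0) {
                metric.addAndGet(batchedSize);
            }
        }
    }
}
```

The finally clause keeps the metric accurate if the loop exits early through an exception, which is the reason the description suggests it.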

      5) RegionScanner.next() calls a helper RegionScanner.next() on the same object. Both are synchronized methods. Synchronized methods calling nested synchronized methods on the same object probably add some small overhead. The inner next() calls isFilterDone(), which is also a synchronized method. We should refactor the code to avoid these nested synchronized methods.
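
One way to do that refactoring is sketched below, as an illustration only. The public entry points acquire the monitor once and delegate to private unsynchronized helpers. The names `nextUnlocked` and `isFilterDoneUnlocked` are hypothetical, ints stand in for rows, and the "filter done" flag is simplified to mean "no more rows".

```java
import java.util.List;

public class ScannerLockingSketch {
    private final int[] rows;
    private int position = 0;
    private boolean filterDone = false;

    public ScannerLockingSketch(int... rows) {
        this.rows = rows;
    }

    // Public entry points: each takes the lock exactly once.
    public synchronized boolean isFilterDone() {
        return isFilterDoneUnlocked();
    }

    public synchronized boolean next(List<Integer> outResults) {
        return nextUnlocked(outResults, Integer.MAX_VALUE);
    }

    public synchronized boolean next(List<Integer> outResults, int limit) {
        return nextUnlocked(outResults, limit);
    }

    // Private helpers: callers must already hold this object's monitor.
    private boolean isFilterDoneUnlocked() {
        return filterDone;
    }

    private boolean nextUnlocked(List<Integer> outResults, int limit) {
        if (isFilterDoneUnlocked()) {
            return false;
        }
        int added = 0;
        while (position < rows.length && added < limit) {
            outResults.add(rows[position++]);
            added++;
        }
        if (position >= rows.length) {
            filterDone = true;
        }
        return !filterDone;
    }
}
```

Java monitors are reentrant, so the nested version is correct as written; the refactoring only removes the repeated lock acquisitions on the hot path.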

      Attachments

        1. metric-stringbuilder-fix.patch
          2 kB
          Todd Lipcon
        2. 0001-jira-HBASE-6066-89-fb-Some-read-performance-improvem.patch
          11 kB
          Michael Stack
        3. 6066-rebased-1.patch
          8 kB
          Devaraj Das
        4. 6066-rebased-1.patch
          8 kB
          Ted Yu

          People

            ddas Devaraj Das
            kannanm Kannan Muthukkaruppan
            Votes: 0
            Watchers: 24
