Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-4823

long running scans lose benefit of bloomfilters and timerange hints

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 0.89-fb
    • None
    • None
    • Reviewed

    Description

      When you have a long running scan due to say an MR job, you can lose the benefit of timerange hints & bloom filters midway if your scanner gets reset. [Note: The scanners can get reset say due to a flush or compaction].

      In one of our workloads, we periodically want to do rollups on recent 15 minutes of data in a column family... but the timerange hint benefit is lost midway when this resetScannerStack (shown below) happens. And end result-- we end up reading all the old HFiles rather than just the recent HFiles.

       private void resetScannerStack(KeyValue lastTopKey) throws IOException {
          if (heap != null) {
            throw new RuntimeException("StoreScanner.reseek run on an existing heap!");
          }
      
          /* When we have the scan object, should we not pass it to getScanners()
           * to get a limited set of scanners? We did so in the constructor and we
           * could have done it now by storing the scan object from the constructor */
          List<KeyValueScanner> scanners = getScanners();
      

      The comment in the code seems to be aware of this issue and even has the suggested fix!

      Attachments

        1. ASF.LICENSE.NOT.GRANTED--HBASE-4823.D519.1.patch
          10 kB
          Phabricator
        2. TestScannerResets-89fb.txt
          7 kB
          Kannan Muthukkaruppan

        Activity

          People

            amitanand Amitanand Aiyer
            kannanm Kannan Muthukkaruppan
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: