HBase
  1. HBase
  2. HBASE-3583

Coprocessors: RegionObserver: ScannerNext and ScannerClose hooks are called when get() is invoked

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.92.0
    • Fix Version/s: 0.92.0
    • Component/s: Coprocessors
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      RegionObserver upcalls are expected to be triggered by corresponding client calls.

      I found that if a HTable.get() is issued, ScannerNext, and ScannerClose hooks are also invoked.

      Here is the reason: HRegion.get() is implemented with an internal scanner:

          InternalScanner scanner = null;
          try {
            scanner = getScanner(scan);
            scanner.next(results);
          } finally {
            if (scanner != null)
              scanner.close();
          }
      

      where scanner.next, and scanner.close() are implemented with RegionObserver hooks.

      1. HBase3583.patch
        25 kB
        Mingjie Lai
      2. HBase3583-3.patch
        26 kB
        Mingjie Lai

        Activity

        Hide
        Hudson added a comment -

        Integrated in HBase-TRUNK #1807 (See https://hudson.apache.org/hudson/job/HBase-TRUNK/1807/)
        HBASE-3583 Coprocessors: scannerNext and scannerClose hooks are called when HRegionInterface#get is invoked

        Show
        Hudson added a comment - Integrated in HBase-TRUNK #1807 (See https://hudson.apache.org/hudson/job/HBase-TRUNK/1807/ ) HBASE-3583 Coprocessors: scannerNext and scannerClose hooks are called when HRegionInterface#get is invoked
        Hide
        Andrew Purtell added a comment -

        Committed to trunk.

        Show
        Andrew Purtell added a comment - Committed to trunk.
        Hide
        Mingjie Lai added a comment -

        The 3rd patch to rb. Also cc here.

        Show
        Mingjie Lai added a comment - The 3rd patch to rb. Also cc here.
        Hide
        stack added a comment -

        Mingie: I commented over on rb. I'm w/ Gary thinking #3 could work. Just keep it as InternalScanner everywhere, in particular as return out of getScanner but then internally doc. why its so ugly. Do the instanceof test and if not as expected, throw ugly exception. I think this least intrusive route and you CYA with comments on why it is the way it is and that should be changed when we have some elbow room for API changing (though, thinking on it, we have elbow room? This comes out in 0.92 – don't we break interfaces in 0.92? I haven't checked)

        Show
        stack added a comment - Mingie: I commented over on rb. I'm w/ Gary thinking #3 could work. Just keep it as InternalScanner everywhere, in particular as return out of getScanner but then internally doc. why its so ugly. Do the instanceof test and if not as expected, throw ugly exception. I think this least intrusive route and you CYA with comments on why it is the way it is and that should be changed when we have some elbow room for API changing (though, thinking on it, we have elbow room? This comes out in 0.92 – don't we break interfaces in 0.92? I haven't checked)
        Hide
        Ted Yu added a comment -

        Can we consider a simpler change - directly cast scanner to HRegion.RegionScanner ?
        In the future, if the dependency is broken (unlikely), we can change the cast to 'instanceof' check.

        Show
        Ted Yu added a comment - Can we consider a simpler change - directly cast scanner to HRegion.RegionScanner ? In the future, if the dependency is broken (unlikely), we can change the cast to 'instanceof' check.
        Hide
        Ted Yu added a comment -

        In HRegion, I didn't find caller of the following method which passes non-null additionalScanners:

          protected InternalScanner getScanner(Scan scan, List<KeyValueScanner> additionalScanners) throws IOException {
        
        Show
        Ted Yu added a comment - In HRegion, I didn't find caller of the following method which passes non-null additionalScanners: protected InternalScanner getScanner(Scan scan, List<KeyValueScanner> additionalScanners) throws IOException {
        Hide
        Mingjie Lai added a comment -

        For option 2, the impact would be wide, since client also uses the same scanner id.

        Option 3 is not feasible since changing the Map type would require changing the return type for HRegion.getScanner(). Sorry about it.

        Show
        Mingjie Lai added a comment - For option 2, the impact would be wide, since client also uses the same scanner id. Option 3 is not feasible since changing the Map type would require changing the return type for HRegion.getScanner(). Sorry about it.
        Hide
        Mingjie Lai added a comment -

        Stack. Need your feedback for your first comments:
        https://review.cloudera.org/r/1617/diff/1/?file=26436#file26436line1812

        >>>>

        My original purpose of this addition map is to have a way to obtain region name from scanner id, then get region instance, in order to call corresponding region coprocessors:

        Result[] next(final long scannerId, int nbRows)

        { // need to get to region to access coprocessors ... }

        There are several options can do it:

        1) add a map along with original scanners map (as the 1st patch)
        2) as you suggested, to change the meaning/data type of scanner id, but it will impact other components, ie. client lib, etc.
        3) instead of Map<InternalScanner>, just use Map<HRegion.RegionScanner> as scanners. There are a lot of ugly thing like:

        if (scanner instanceof HRegion.RegionScanner)

        { ... }

        so why not just using HRegion.RegionScanner directly.

        4) ugly way: don't add anything, just use existing suff

        if (scanner instanceof HRegion.RegionScanner)

        { ... HRegionInfo regionName = rs.getRegionName(); ... }

        <<<<

        What do you think?

        Show
        Mingjie Lai added a comment - Stack. Need your feedback for your first comments: https://review.cloudera.org/r/1617/diff/1/?file=26436#file26436line1812 >>>> My original purpose of this addition map is to have a way to obtain region name from scanner id, then get region instance, in order to call corresponding region coprocessors: Result[] next(final long scannerId, int nbRows) { // need to get to region to access coprocessors ... } There are several options can do it: 1) add a map along with original scanners map (as the 1st patch) 2) as you suggested, to change the meaning/data type of scanner id, but it will impact other components, ie. client lib, etc. 3) instead of Map<InternalScanner>, just use Map<HRegion.RegionScanner> as scanners. There are a lot of ugly thing like: if (scanner instanceof HRegion.RegionScanner) { ... } so why not just using HRegion.RegionScanner directly. 4) ugly way: don't add anything, just use existing suff if (scanner instanceof HRegion.RegionScanner) { ... HRegionInfo regionName = rs.getRegionName(); ... } <<<< What do you think?
        Hide
        stack added a comment -

        I added a few comments over on rb Mingjie. Good stuff.

        Show
        stack added a comment - I added a few comments over on rb Mingjie. Good stuff.
        Hide
        Mingjie Lai added a comment -

        Patch.

        Show
        Mingjie Lai added a comment - Patch.
        Hide
        Mingjie Lai added a comment -

        After reviewed the current implementation, I think it makes more sense to pull the scannerNext and scannerClose hooks from HRegion to HRegionServer, instead of using a flag to indicate whether to invoke coprocessor hooks or not. Reason 1) the internalScanner are called at several different places, there are potential misfires as this case; 2) it would be ugly to pass a boolean flag to the internalscanner(it's an interface).

        Patch posted here and reviewboard.
        https://review.cloudera.org/r/1617/

        Show
        Mingjie Lai added a comment - After reviewed the current implementation, I think it makes more sense to pull the scannerNext and scannerClose hooks from HRegion to HRegionServer, instead of using a flag to indicate whether to invoke coprocessor hooks or not. Reason 1) the internalScanner are called at several different places, there are potential misfires as this case; 2) it would be ugly to pass a boolean flag to the internalscanner(it's an interface). Patch posted here and reviewboard. https://review.cloudera.org/r/1617/
        Hide
        Andrew Purtell added a comment -

        We discussed this internally. Reasonable options are:

        1) Rework the Coprocessor RegionObserver API so "everything is a scan"

        2) Pass booleans to internal functions to distinguish between the code paths

        We are opting for #2. A feature of the RegionObserver API is it mirrors HRegionInterface so the user/implementor has precise options for overriding or augmenting client actions.

        Show
        Andrew Purtell added a comment - We discussed this internally. Reasonable options are: 1) Rework the Coprocessor RegionObserver API so "everything is a scan" 2) Pass booleans to internal functions to distinguish between the code paths We are opting for #2. A feature of the RegionObserver API is it mirrors HRegionInterface so the user/implementor has precise options for overriding or augmenting client actions.

          People

          • Assignee:
            Mingjie Lai
            Reporter:
            Mingjie Lai
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development