HBASE-21332

HBase scan with PageFilter cannot get all rows; non-edge region skipped

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Not A Problem
    • Affects Version/s: 1.1.2
    • Fix Version/s: None
    • Component/s: regionserver, Scanners
    • Labels:
      None
    • Environment:
      • Server version: 1.1.2.2.6.5.0-292, revision=897822d4dd5956ca186974c10382e9094683fa29
      • 2 region servers
      • 4 regions
      • HBase client: 1.3.1

       

      Description

      When I use a Scan with a PageFilter to read data from HBase, the scanner skips 'non-edge' regions. The code I use comes from the book HBase: The Definitive Guide, Example 4.8 (the PageFilter example). The only difference is that my scan sets startRow and stopRow.

      Say I have regions with boundary keys {'111', '222', '333', '444'}, which means I have 3 regions, {111, 222}, {222, 333} and {333, 444}, hosted on different region servers. When I scan with startRow '111' and stopRow '444', most of the data in region {222, 333} is skipped and never returned by the ResultScanner. Regions {111, 222} and {333, 444} work fine. Because region {222, 333} contains neither the start row key nor the stop row key, I call it a non-edge region.

      Below is the relevant log output, with my annotations:

       

      // Here the scanner works fine in region {111,222}: it gets exactly {pageSize} rows each time, which is 1000
      ...
      2018-10-17 21:25:57.810 INFO 213872 [ main] c.p.s.c.HBaseTest : Test: results from [2139718600001069] to [2179067497952422], sum [1000 : 64000], cost: [77ms]
      2018-10-17 21:25:57.885 INFO 213872 [ main] c.p.s.c.HBaseTest : Test: results from [2179098921079755] to [21c2879280113661], sum [1000 : 65000], cost: [75ms]
      2018-10-17 21:25:57.962 INFO 213872 [ main] c.p.s.c.HBaseTest : Test: results from [21c2899018774688] to [2203180876471552], sum [1000 : 66000], cost: [77ms]
      
      // Here the scanner moves from region {111,222} to {222,333}. This single scan returns 2405 rows, ending at row '3373621463365126'. The scanner moves on to region {333,444} too early, and most of the data in {222,333} is skipped.
      2018-10-17 21:25:58.321 INFO 213872 [ main] c.p.s.c.HBaseTest : Test: results from [2203223414254308] to [3373621463365126], sum [2405 : 68405], cost: [359ms]
      
      // Now the scanner is in region {333,444}, and everything works fine again
      2018-10-17 21:25:58.396 INFO 213872 [ main] c.p.s.c.HBaseTest : Test: results from [3373764408525604] to [33b3849714659525], sum [1000 : 69405], cost: [74ms]
      2018-10-17 21:25:58.467 INFO 213872 [ main] c.p.s.c.HBaseTest : Test: results from [33b3882378177107] to [33f5221377695765], sum [1000 : 70405], cost: [71ms]
      ...
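The PageFilter javadoc notes that the filter cannot guarantee that a client receives at most pageSize rows, because it is applied separately on each region server; only the page size within an individual region is enforced. The 2405-row scan above is consistent with that: with pageSize 1000 and at most 1000 rows per region, that one scan must have drawn rows from at least three regions. Below is a toy, in-memory model (not HBase client code) of how such a loop can skip rows. It assumes only two things: each scan returns at most pageSize rows per region, and the next scan starts just after the last row the client saw, formed by appending a zero byte as the book's PageFilter example does. The class PageFilterSim, its methods, and the row keys are hypothetical. The model does not reproduce every detail of the server's RPC behavior (for instance, why scans that stayed inside one region stopped at exactly 1000 rows). It also compares a client-side cap that keeps only the first pageSize rows of each scan.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy in-memory model (hypothetical, not HBase client code) of paging a
 * scan whose page limit is enforced per region, as PageFilter is documented to do.
 */
public class PageFilterSim {

    /** One scan over [start, stop): each region returns at most pageSize rows. */
    public static List<String> scan(List<List<String>> regions, String start,
                                    String stop, int pageSize) {
        List<String> out = new ArrayList<>();
        for (List<String> region : regions) {   // regions: sorted, disjoint, in key order
            int taken = 0;                       // fresh counter per region, like a per-region filter
            for (String row : region) {
                if (row.compareTo(start) < 0 || row.compareTo(stop) >= 0) {
                    continue;                    // outside [start, stop)
                }
                if (taken == pageSize) {
                    break;                       // this region stops; the scan moves to the next region
                }
                out.add(row);
                taken++;
            }
        }
        return out;
    }

    /**
     * Pages through [start, stop). Without the cap, every row of a scan is consumed
     * (the naive loop); with the cap, only the first pageSize rows are kept.
     */
    public static List<String> page(List<List<String>> regions, String start,
                                    String stop, int pageSize, boolean capOnClient) {
        List<String> all = new ArrayList<>();
        String next = start;
        while (true) {
            List<String> rows = scan(regions, next, stop, pageSize);
            if (capOnClient && rows.size() > pageSize) {
                rows = rows.subList(0, pageSize);  // keep a contiguous prefix only
            }
            if (rows.isEmpty()) {
                break;                             // nothing left in [next, stop)
            }
            all.addAll(rows);
            // Appending "\0" gives the smallest key greater than the last row seen.
            next = rows.get(rows.size() - 1) + "\0";
        }
        return all;
    }

    public static void main(String[] args) {
        // Hypothetical keys: regions {111,222}, {222,333}, {333,444}, 5 rows each.
        List<List<String>> regions = List.of(
                List.of("111", "130", "150", "170", "190"),
                List.of("222", "240", "260", "280", "300"),
                List.of("333", "350", "370", "390", "410"));
        System.out.println(page(regions, "111", "444", 3, false).size()); // naive loop
        System.out.println(page(regions, "111", "444", 3, true).size());  // client-side cap
    }
}
```

With pageSize 3, main prints 11 for the naive loop (rows 170, 190, 280 and 300 are never returned) and 15 for the capped loop (every row). The cap works in this model because each region returns a contiguous run of its rows, so the first pageSize rows of any scan contain no gaps. In the real loop, the analogous change would be to stop calling next() on the ResultScanner once pageSize rows have been read in the current pass, and then close it.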

       

        Attachments

        1. image-2018-10-23-17-37-22-028.png (13 kB, pddNick)
        2. HBaseTest.java (5 kB, pddNick)
        3. image-2018-10-17-21-15-23-439.png (16 kB, pddNick)
        4. image-2018-10-17-21-14-25-354.png (6 kB, pddNick)


            People

            • Assignee: Zheng Hu (openinx)
            • Reporter: pddNick (yadance)
            • Votes: 0
            • Watchers: 3
