HBase / HBASE-2885

Data which was invisible shows up after restarting HBase

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 0.20.5
    • Fix Version/s: None
    • Component/s: master
    • Labels: None

      Description

      As experienced by Karthik Kambatla, Vlad and Steve Kuo, HBase 0.20.5 exhibits inconsistent behavior when a user tries to access data in a table.

      One such case involves an offline region of the underlying table.

      See the following threads on the HBase user mailing list:
      How to delete an "non-existent" table
      Flaky tableExists()

      And this thread on the HBase dev mailing list:
      Data disappears and re-appears again after HBase cluster restart

        Activity

        Ted Yu added a comment -

        I can observe this problem on master.jsp, which shows 6 tables after 2 runs, whereas the HBase shell shows the correct number of tables: 14.

        It is clear from the master log that although RegionManager.metaScanner locates the .META. table on 10.32.56.155, the cache in HConnectionManager$TableServers is stale: its cache hit still reports .META. on 10.32.56.157.
        Refreshing master.jsp doesn't give the correct table count.

        Here is a snippet from the master log:

        2010-07-28 15:19:10,974 DEBUG org.apache.hadoop.hbase.master.RegionManager: Assigning address: 10.32.56.156:60020, startcode: 1280346234926, load: (requests=0, regions=14, usedHeap=60, maxHeap=3991) 1 regions
        2010-07-28 15:19:10,974 INFO org.apache.hadoop.hbase.master.RegionManager: Assigning region 2__HB_NOINC_ORCL_SQLLDR_0728-DIMENSIONS-1280352660291-0,,1280352695476 to sjc9-flash-grid02.carrieriq.com,60020,1280346234926
        2010-07-28 15:19:10,976 INFO org.apache.hadoop.hbase.master.ServerManager: Processing MSG_REPORT_OPEN: 2__HB_NOINC_ORCL_SQLLDR_0728-DIMENSIONS-1280352660291-0,,1280352695476 from sjc9-flash-grid02.carrieriq.com,60020,1280346234926; 1 of 1
        2010-07-28 15:19:10,976 DEBUG org.apache.hadoop.hbase.master.HMaster: Processing todo: PendingOpenOperation from sjc9-flash-grid02.carrieriq.com,60020,1280346234926
        2010-07-28 15:19:10,976 INFO org.apache.hadoop.hbase.master.RegionServerOperation: 2__HB_NOINC_ORCL_SQLLDR_0728-DIMENSIONS-1280352660291-0,,1280352695476 open on 10.32.56.156:60020
        2010-07-28 15:19:10,977 INFO org.apache.hadoop.hbase.master.RegionServerOperation: Updated row 2__HB_NOINC_ORCL_SQLLDR_0728-DIMENSIONS-1280352660291-0,,1280352695476 in region .META.,,1 with startcode=1280346234926, server=10.32.56.156:60020
        2010-07-28 15:20:08,621 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Cache hit for row <> in tableName .META.: location server 10.32.56.157:60020, location region name .META.,,1
        2010-07-28 15:20:09,875 INFO org.apache.hadoop.hbase.master.ServerManager: 3 region servers, 0 dead, average load 14.333333333333334
        2010-07-28 15:20:10,001 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scanning meta region {server: 10.32.56.155:60020, regionname: .META.,,1, startKey: <>}

        2010-07-28 15:20:10,016 DEBUG org.apache.hadoop.hbase.master.BaseScanner: Current assignment of 2__HB_NOINC_ORCL_SQLLDR_0728-DIMENSIONS-1280352660291-0,,1280352695476 is not valid; serverAddress=10.32.56.156:60020, startCode=1280346234926 unknown.
        2010-07-28 15:20:10,021 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scan of 14 row(s) of meta region {server: 10.32.56.155:60020, regionname: .META.,,1, startKey: <>} complete
        2010-07-28 15:20:10,022 INFO org.apache.hadoop.hbase.master.BaseScanner: All 1 .META. region(s) scanned
        2010-07-28 15:20:10,162 DEBUG org.apache.hadoop.hbase.master.RegionManager: Assigning for address: 10.32.56.156:60020, startcode: 1280346234926, load: (requests=0, regions=14, usedHeap=45, maxHeap=3991): total nregions to assign=1, regions to give other servers than this=0, isMetaAssign=false
        2010-07-28 15:20:10,162 DEBUG org.apache.hadoop.hbase.master.RegionManager: Assigning address: 10.32.56.156:60020, startcode: 1280346234926, load: (requests=0, regions=14, usedHeap=45, maxHeap=3991) 1 regions
        2010-07-28 15:20:10,162 INFO org.apache.hadoop.hbase.master.RegionManager: Assigning region 2__HB_NOINC_ORCL_SQLLDR_0728-DIMENSIONS-1280352660291-0,,1280352695476 to sjc9-flash-grid02.carrieriq.com,60020,1280346234926
        2010-07-28 15:20:10,164 INFO org.apache.hadoop.hbase.master.ServerManager: Processing MSG_REPORT_OPEN: 2__HB_NOINC_ORCL_SQLLDR_0728-DIMENSIONS-1280352660291-0,,1280352695476 from sjc9-flash-grid02.carrieriq.com,60020,1280346234926; 1 of 1
        2010-07-28 15:20:10,164 DEBUG org.apache.hadoop.hbase.master.HMaster: Processing todo: PendingOpenOperation from sjc9-flash-grid02.carrieriq.com,60020,1280346234926
        2010-07-28 15:20:10,164 INFO org.apache.hadoop.hbase.master.RegionServerOperation: 2__HB_NOINC_ORCL_SQLLDR_0728-DIMENSIONS-1280352660291-0,,1280352695476 open on 10.32.56.156:60020
        2010-07-28 15:20:10,166 INFO org.apache.hadoop.hbase.master.RegionServerOperation: Updated row 2__HB_NOINC_ORCL_SQLLDR_0728-DIMENSIONS-1280352660291-0,,1280352695476 in region .META.,,1 with startcode=1280346234926, server=10.32.56.156:60020
        2010-07-28 15:20:10,810 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scanning meta region {server: 10.32.56.155:60020, regionname: -ROOT-,,0, startKey: <>}

        2010-07-28 15:20:10,813 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scan of 1 row(s) of meta region {server: 10.32.56.155:60020, regionname: -ROOT-,,0, startKey: <>} complete
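
        The mismatch in the log can be checked mechanically. The following is a minimal sketch, not part of HBase: it assumes the two message formats seen in this snippet ("Cache hit for row ..." and "scanning meta region {server: ...}") and compares the server the client cache reports with the server the scanner actually uses. The function name find_location_mismatches and the regular expressions are illustrative choices, and the scanner log entry is written on one line.

```python
import re

# Two entries copied from the master log in this issue.
LOG = """\
2010-07-28 15:20:08,621 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Cache hit for row <> in tableName .META.: location server 10.32.56.157:60020, location region name .META.,,1
2010-07-28 15:20:10,001 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scanning meta region {server: 10.32.56.155:60020, regionname: .META.,,1, startKey: <>}
"""

# Assumed message shapes, based only on the lines above.
CACHE_HIT = re.compile(r"Cache hit for row .* in tableName (\S+): location server (\S+),")
SCANNING = re.compile(r"scanning meta region \{server: (\S+), regionname: ([^,]+),")


def find_location_mismatches(log_text):
    """Return {table: (cached_server, scanned_server)} where the two disagree."""
    cached = {}   # table name -> server reported by the client cache
    scanned = {}  # table name -> server the scanner connected to
    for line in log_text.splitlines():
        m = CACHE_HIT.search(line)
        if m:
            cached[m.group(1)] = m.group(2)
            continue
        m = SCANNING.search(line)
        if m:
            scanned[m.group(2)] = m.group(1)
    return {
        table: (cached[table], scanned[table])
        for table in cached
        if table in scanned and cached[table] != scanned[table]
    }


print(find_location_mismatches(LOG))
# {'.META.': ('10.32.56.157:60020', '10.32.56.155:60020')}
```

        Run against a full master log, any table listed in the result is one whose cached location differs from the location the scanner reported.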

        Ted Yu added a comment -

        clearRegionCache(), which is already in trunk, should be added to HConnectionManager.java
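
        To illustrate why a way to clear the cache helps, here is a toy model in Python. It is not HBase code and does not reproduce the trunk implementation: LocationCache, lookup and clear_region_cache are hypothetical names, and the reassignment of .META. from 10.32.56.157 to 10.32.56.155 is assumed for illustration, based on the two addresses in the log.

```python
class LocationCache:
    """Toy stand-in for a client-side region location cache (not HBase code)."""

    def __init__(self, locate):
        self._locate = locate  # callable: table name -> current server
        self._cache = {}       # table name -> server remembered from last lookup

    def lookup(self, table):
        # Only consult the authoritative source on a cache miss.
        if table not in self._cache:
            self._cache[table] = self._locate(table)
        return self._cache[table]

    def clear_region_cache(self):
        # Drop every remembered location so the next lookup asks again.
        self._cache.clear()


# A dict stands in for the authoritative catalog of region locations.
catalog = {".META.": "10.32.56.157:60020"}
cache = LocationCache(catalog.__getitem__)

first = cache.lookup(".META.")            # populates the cache
catalog[".META."] = "10.32.56.155:60020"  # assumed reassignment of the region
stale = cache.lookup(".META.")            # still the old server
cache.clear_region_cache()
fresh = cache.lookup(".META.")            # now the new server

print(first, stale, fresh)
```

        Without the clear step, every lookup after the reassignment keeps returning the old address, which matches the symptom that refreshing master.jsp does not correct the table count.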

        Andrew Purtell added a comment -

        Reopen or file a new issue if this is still relevant with modern HBase versions.


          People

          • Assignee: Unassigned
          • Reporter: Ted Yu
          • Votes: 0
          • Watchers: 1
