HBASE-2885

Data which was invisible shows up after restarting HBase

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 0.20.5
    • Fix Version/s: None
    • Component/s: master
    • Labels:
      None

      Description

      As experienced by Karthik Kambatla, Vlad and Steve Kuo, HBase 0.20.5 exhibits inconsistent behavior when a user tries to access data in a table.

      One such case involves an offline region for the underlying table.

      See the following threads in hbase user mailing list:
      How to delete an "non-existent" table
      Flaky tableExists()

      And this thread in hbase dev mailing list:
      Data disappears and re-appears again after HBase cluster restart

        Activity

        Ted Yu created issue -
        Ted Yu made changes -
        Field: Summary
        Original Value: inconsistent behavior in HBase shell commands [list, create] leads to TableExistsException
        New Value: Data which was invisible shows up after restarting HBase

        Field: Description
        Original Value (first sentence): As experienced by Karthik Kambatla, Vlad and Steve Kuo, HBase 0.20.5 exhibits inconsistent behavior in HBase shell commands [list, create] that leads to TableExistsException.
        New Value (first sentence): As experienced by Karthik Kambatla, Vlad and Steve Kuo, HBase 0.20.5 exhibits inconsistent behavior when user tries to access data in a table.
        The remaining paragraphs (the offline region case and the mailing list threads) are the same in both values.

        Ted Yu added a comment -

        I can observe this problem on master.jsp, which shows 6 tables after 2 runs,
        while the HBase shell shows the correct number of tables: 14.

        It is clear from the master log that although RegionManager.metaScanner locates the .META. table on 10.32.56.155, the cache in HConnectionManager$TableServers is stale: its cache hit still reports .META. on 10.32.56.157.
        Refreshing master.jsp doesn't give the correct table count.

        Here is snippet from master log:

        2010-07-28 15:19:10,974 DEBUG org.apache.hadoop.hbase.master.RegionManager: Assigning address: 10.32.56.156:60020, startcode: 1280346234926, load: (requests=0, regions=14, usedHeap=60, maxHeap=3991) 1 regions
        2010-07-28 15:19:10,974 INFO org.apache.hadoop.hbase.master.RegionManager: Assigning region 2__HB_NOINC_ORCL_SQLLDR_0728-DIMENSIONS-1280352660291-0,,1280352695476 to sjc9-flash-grid02.carrieriq.com,60020,1280346234926
        2010-07-28 15:19:10,976 INFO org.apache.hadoop.hbase.master.ServerManager: Processing MSG_REPORT_OPEN: 2__HB_NOINC_ORCL_SQLLDR_0728-DIMENSIONS-1280352660291-0,,1280352695476 from sjc9-flash-grid02.carrieriq.com,60020,1280346234926; 1 of 1
        2010-07-28 15:19:10,976 DEBUG org.apache.hadoop.hbase.master.HMaster: Processing todo: PendingOpenOperation from sjc9-flash-grid02.carrieriq.com,60020,1280346234926
        2010-07-28 15:19:10,976 INFO org.apache.hadoop.hbase.master.RegionServerOperation: 2__HB_NOINC_ORCL_SQLLDR_0728-DIMENSIONS-1280352660291-0,,1280352695476 open on 10.32.56.156:60020
        2010-07-28 15:19:10,977 INFO org.apache.hadoop.hbase.master.RegionServerOperation: Updated row 2__HB_NOINC_ORCL_SQLLDR_0728-DIMENSIONS-1280352660291-0,,1280352695476 in region .META.,,1 with startcode=1280346234926, server=10.32.56.156:60020
        2010-07-28 15:20:08,621 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Cache hit for row <> in tableName .META.: location server 10.32.56.157:60020, location region name .META.,,1
        2010-07-28 15:20:09,875 INFO org.apache.hadoop.hbase.master.ServerManager: 3 region servers, 0 dead, average load 14.333333333333334
        2010-07-28 15:20:10,001 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scanning meta region {server: 10.32.56.155:60020, regionname: .META.,,1, startKey: <>}

        2010-07-28 15:20:10,016 DEBUG org.apache.hadoop.hbase.master.BaseScanner: Current assignment of 2__HB_NOINC_ORCL_SQLLDR_0728-DIMENSIONS-1280352660291-0,,1280352695476 is not valid; serverAddress=10.32.56.156:60020, startCode=1280346234926 unknown.
        2010-07-28 15:20:10,021 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scan of 14 row(s) of meta region {server: 10.32.56.155:60020, regionname: .META.,,1, startKey: <>} complete
        2010-07-28 15:20:10,022 INFO org.apache.hadoop.hbase.master.BaseScanner: All 1 .META. region(s) scanned
        2010-07-28 15:20:10,162 DEBUG org.apache.hadoop.hbase.master.RegionManager: Assigning for address: 10.32.56.156:60020, startcode: 1280346234926, load: (requests=0, regions=14, usedHeap=45, maxHeap=3991): total nregions to assign=1, regions to give other servers than this=0, isMetaAssign=false
        2010-07-28 15:20:10,162 DEBUG org.apache.hadoop.hbase.master.RegionManager: Assigning address: 10.32.56.156:60020, startcode: 1280346234926, load: (requests=0, regions=14, usedHeap=45, maxHeap=3991) 1 regions
        2010-07-28 15:20:10,162 INFO org.apache.hadoop.hbase.master.RegionManager: Assigning region 2__HB_NOINC_ORCL_SQLLDR_0728-DIMENSIONS-1280352660291-0,,1280352695476 to sjc9-flash-grid02.carrieriq.com,60020,1280346234926
        2010-07-28 15:20:10,164 INFO org.apache.hadoop.hbase.master.ServerManager: Processing MSG_REPORT_OPEN: 2__HB_NOINC_ORCL_SQLLDR_0728-DIMENSIONS-1280352660291-0,,1280352695476 from sjc9-flash-grid02.carrieriq.com,60020,1280346234926; 1 of 1
        2010-07-28 15:20:10,164 DEBUG org.apache.hadoop.hbase.master.HMaster: Processing todo: PendingOpenOperation from sjc9-flash-grid02.carrieriq.com,60020,1280346234926
        2010-07-28 15:20:10,164 INFO org.apache.hadoop.hbase.master.RegionServerOperation: 2__HB_NOINC_ORCL_SQLLDR_0728-DIMENSIONS-1280352660291-0,,1280352695476 open on 10.32.56.156:60020
        2010-07-28 15:20:10,166 INFO org.apache.hadoop.hbase.master.RegionServerOperation: Updated row 2__HB_NOINC_ORCL_SQLLDR_0728-DIMENSIONS-1280352660291-0,,1280352695476 in region .META.,,1 with startcode=1280346234926, server=10.32.56.156:60020
        2010-07-28 15:20:10,810 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scanning meta region {server: 10.32.56.155:60020, regionname: -ROOT-,,0, startKey: <>}
        2010-07-28 15:20:10,813 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scan of 1 row(s) of meta region {server: 10.32.56.155:60020, regionname: -ROOT-,,0, startKey: <>} complete
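
        The mismatch between the cached .META. location and the one the metaScanner reports can also be checked mechanically. The following is only a minimal sketch, assuming the log wording shown in this snippet and that each log entry sits on a single line; the class and method names (MetaLocationCheck, cachedMetaServer, scannedMetaServer, cacheLooksStale) are hypothetical and not part of HBase.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HBase: compares the .META. location
// reported by the client-side cache with the one the metaScanner reports.
class MetaLocationCheck {
    // "Cache hit for row <> in tableName .META.: location server host:port, ..."
    private static final Pattern CACHE_HIT = Pattern.compile(
        "Cache hit for row <> in tableName \\.META\\.: location server ([^,\\s]+)");
    // "RegionManager.metaScanner scanning meta region {server: host:port, ..."
    private static final Pattern META_SCAN = Pattern.compile(
        "metaScanner scanning meta region \\s*\\{server: ([^,\\s]+)");

    // Returns group 1 of the first line that matches, or null if none does.
    private static String firstMatch(Pattern pattern, List<String> lines) {
        for (String line : lines) {
            Matcher m = pattern.matcher(line);
            if (m.find()) {
                return m.group(1);
            }
        }
        return null;
    }

    static String cachedMetaServer(List<String> lines) {
        return firstMatch(CACHE_HIT, lines);
    }

    static String scannedMetaServer(List<String> lines) {
        return firstMatch(META_SCAN, lines);
    }

    // True only when both locations were found and they disagree.
    static boolean cacheLooksStale(List<String> lines) {
        String cached = cachedMetaServer(lines);
        String scanned = scannedMetaServer(lines);
        return cached != null && scanned != null && !cached.equals(scanned);
    }

    public static void main(String[] args) {
        // Two entries taken from the master log above.
        List<String> lines = Arrays.asList(
            "2010-07-28 15:20:08,621 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Cache hit for row <> in tableName .META.: location server 10.32.56.157:60020, location region name .META.,,1",
            "2010-07-28 15:20:10,001 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scanning meta region {server: 10.32.56.155:60020, regionname: .META.,,1, startKey: <>}");
        System.out.println("cached=" + cachedMetaServer(lines)
            + " scanned=" + scannedMetaServer(lines)
            + " stale=" + cacheLooksStale(lines));
    }
}
```

        Run against the two entries above, the sketch reports 10.32.56.157:60020 for the cache and 10.32.56.155:60020 for the scanner, which is the disagreement described in this comment.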

        Ted Yu added a comment -

        clearRegionCache(), which is already in trunk, should be added to HConnectionManager.java.
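
        To illustrate why dropping cached region locations would address the stale lookup, here is a toy model of a location cache. It is a sketch under stated assumptions and not HBase's implementation: ToyLocationCache, locate, and the lookup function are hypothetical, and only the method name clearRegionCache is borrowed from the comment above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Toy model, not HBase code: caches region -> server locations and keeps
// returning the cached answer until the cache is explicitly cleared.
class ToyLocationCache {
    private final Map<String, String> cache = new HashMap<>();
    // Hypothetical stand-in for the authoritative lookup (e.g. a catalog scan).
    private final Function<String, String> lookup;

    ToyLocationCache(Function<String, String> lookup) {
        this.lookup = lookup;
    }

    // Returns the cached location, consulting the lookup only on a miss.
    String locate(String regionName) {
        return cache.computeIfAbsent(regionName, lookup);
    }

    // Drops every cached location so the next locate() asks the lookup again.
    void clearRegionCache() {
        cache.clear();
    }

    public static void main(String[] args) {
        Map<String, String> assignments = new HashMap<>();
        assignments.put(".META.,,1", "10.32.56.157:60020");
        ToyLocationCache locations = new ToyLocationCache(assignments::get);

        System.out.println(locations.locate(".META.,,1")); // 10.32.56.157:60020

        // The region is reassigned, but the cache still holds the old server.
        assignments.put(".META.,,1", "10.32.56.155:60020");
        System.out.println(locations.locate(".META.,,1")); // 10.32.56.157:60020 (stale)

        locations.clearRegionCache();
        System.out.println(locations.locate(".META.,,1")); // 10.32.56.155:60020
    }
}
```

        In this model the stale answer persists no matter how often locate() is called, which mirrors refreshing master.jsp without effect; only clearing the cache forces a fresh lookup.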

        Andrew Purtell added a comment -

        Reopen or file a new issue if this is still relevant with modern HBase versions.

        Andrew Purtell made changes -
        Status: Open [ 1 ] -> Resolved [ 5 ]
        Resolution: Cannot Reproduce [ 5 ]

        Transition: Open -> Resolved
        Time In Source Status: 1449d 20h 32m
        Execution Times: 1
        Last Executer: Andrew Purtell
        Last Execution Date: 16/Jul/14 23:21

          People

          • Assignee: Unassigned
          • Reporter: Ted Yu
          • Votes: 0
          • Watchers: 1
