  1. HBase
  2. HBASE-20632

Failure of RSes belonging to RSgroup for System tables makes the cluster unavailable

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Abandoned
    • Affects Version/s: 3.0.0-alpha-1
    • Fix Version/s: None
    • Component/s: master, regionserver

    Description

      This was reproduced on a local cluster (no HDFS). The steps are as follows:

      • Start a single-node cluster, then start an additional RS using local-regionservers.sh
      • Add a new rsgroup through the hbase shell
      • hbase(main):001:0> add_rsgroup 'test_rsgroup'
        Took 0.5503 seconds
        hbase(main):002:0> list_rsgroups
        NAME            SERVER / TABLE
        test_rsgroup
        default         server dob2-r3n13:16020
                        server dob2-r3n13:16022
                        table hbase:meta
                        table hbase:acl
                        table hbase:quota
                        table hbase:namespace
                        table hbase:rsgroup
        2 row(s)
        Took 0.0419 seconds
      • Move one of the region servers to the new rsgroup
      • hbase(main):004:0> move_servers_rsgroup 'test_rsgroup',['dob2-r3n13:16020']
        Took 6.4894 seconds
        hbase(main):005:0> exit
      • Stop the regionserver which is left in the default rsgroup
      • local-regionservers.sh stop 2

      The cluster becomes unusable and does not recover, even after the region server is restarted or after all services are stopped and started again.
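
The state reached by the steps above can be illustrated with a small model. This is only a sketch in Python, assuming that a table's regions may be placed only on live servers of the rsgroup that owns the table. It is not HBase's actual assignment code, and the function name candidate_servers is hypothetical. The server and table names come from the shell output in this report.

```python
# Simplified model of rsgroup-constrained region placement.
# NOT HBase's real assignment logic; it only illustrates the state in this report.

def candidate_servers(table, groups, live_servers):
    """Return the live servers allowed to host `table` under its rsgroup."""
    for group in groups.values():
        if table in group["tables"]:
            return sorted(group["servers"] & live_servers)
    raise KeyError(f"{table} is not in any rsgroup")

# Group membership after: move_servers_rsgroup 'test_rsgroup',['dob2-r3n13:16020']
groups = {
    "test_rsgroup": {"servers": {"dob2-r3n13:16020"}, "tables": set()},
    "default": {
        "servers": {"dob2-r3n13:16022"},
        "tables": {"hbase:meta", "hbase:acl", "hbase:quota",
                   "hbase:namespace", "hbase:rsgroup"},
    },
}

# Per the report, the stopped RS is the one left in the default rsgroup,
# so only the server that was moved to test_rsgroup is still live.
live_servers = {"dob2-r3n13:16020"}

# Tables in the default group that no live server is allowed to host.
stranded = [t for t in sorted(groups["default"]["tables"])
            if not candidate_servers(t, groups, live_servers)]
print(stranded)
```

Under this assumption every system table, including hbase:meta, has no eligible server even though one region server is still running, which matches the unavailability described here.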

      In 1.1.x, the cluster recovers fine. It looks like meta is assigned to a dummy region server, and when the real region server is restarted meta gets assigned to it. The following is what the master UI shows while the RS is down:

      1588230740	hbase:meta,,1.1588230740 state=PENDING_OPEN, ts=Wed May 23 18:24:01 EDT 2018 (1s ago), server=localhost,1,1

      Attachments

        1. 20632.v1.txt
          1 kB
          Ted Yu

          People

            Assignee: Unassigned
            Reporter: Biju Nair (gsbiju)
            Votes: 0
            Watchers: 6
