HBase / HBASE-7525

A canary monitoring program specifically for regionserver


Details

    • Type: New Feature
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.94.0
    • Fix Version/s: 0.98.0, 0.96.1
    • Component/s: monitoring
    • Labels: None
    • Release Note:
      Tool to check the cluster. See $ ./bin/hbase org.apache.hadoop.hbase.tool.Canary -help for usage.

      {code}
      Usage: bin/hbase org.apache.hadoop.hbase.tool.Canary [opts] [table/regionserver 1 [table/regionserver 2...]]
        where [opts] are:
          -help Show this help and exit.
          -regionserver Replace the table arguments with regionserver names, i.e. enable regionserver mode
          -daemon Continuous check at defined intervals.
          -interval <N> Interval between checks (sec)
          -e Treat the table/regionserver arguments as regular expression patterns
          -f <B> Stop the whole program if the first error occurs, default is true
          -t <N> Timeout for a check, default is 600000 (millisecs)
      {code}

    Description

      Motivation
      This ticket provides a canary monitoring tool specifically for HRegionServer. Details:
      1. The operations team requested this tool because the existing canary, which checks every region of an HBase cluster, produces too many checks for them to handle. I therefore implemented this coarser-grained version, based on the original o.a.h.h.tool.Canary.
      2. The tool is multi-threaded: each Get request is sent by its own thread. I chose this design because we have been suffering from hung region servers whose root cause is still unclear, so the tool can help the operations team detect a hung region server.
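The one-thread-per-request design can be sketched in plain Java as follows. This is a hypothetical illustration, not the code in the attached patches: the class RegionServerProbeSketch, the Outcome enum and the probe predicate (which stands in for the HBase Get sent to each regionserver) are all invented for this example. Each probe runs on its own thread and the caller waits on all of them against one shared deadline, so a hung server is reported as TIMED_OUT without holding up the results for the other servers.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Predicate;

// Hypothetical sketch: NOT the RegionServerCanary code from the attached patches.
final class RegionServerProbeSketch {

    enum Outcome { OK, FAILED, TIMED_OUT }

    // Probes every server on its own thread; "probe" stands in for an HBase Get.
    // All results must arrive before one shared deadline of timeoutMs.
    static Map<String, Outcome> probeAll(List<String> servers,
                                         Predicate<String> probe,
                                         long timeoutMs) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, servers.size()));
        Map<String, Future<Boolean>> futures = new LinkedHashMap<>();
        for (String server : servers) {
            Callable<Boolean> task = () -> probe.test(server);
            futures.put(server, pool.submit(task));
        }
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        Map<String, Outcome> results = new LinkedHashMap<>();
        try {
            for (Map.Entry<String, Future<Boolean>> e : futures.entrySet()) {
                Future<Boolean> f = e.getValue();
                try {
                    long remaining = Math.max(0L, deadline - System.nanoTime());
                    boolean ok = f.get(remaining, TimeUnit.NANOSECONDS);
                    results.put(e.getKey(), ok ? Outcome.OK : Outcome.FAILED);
                } catch (TimeoutException te) {
                    f.cancel(true); // interrupt the probe thread that is still stuck
                    results.put(e.getKey(), Outcome.TIMED_OUT);
                } catch (ExecutionException ee) {
                    results.put(e.getKey(), Outcome.FAILED); // the probe threw an exception
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // keep the caller's interrupt status
                    results.put(e.getKey(), Outcome.FAILED);
                }
            }
        } finally {
            pool.shutdownNow(); // interrupt anything still running
        }
        return results;
    }
}
```

The shared deadline keeps a whole round within the -t budget, instead of allowing up to one full timeout per server when several servers hang at once.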

      Examples
      1. Show the tool's help
      ./bin/hbase org.apache.hadoop.hbase.tool.RegionServerCanary -help
      Usage: [opts] [regionServerName 1 [regionServerName 2...]]
      regionServerName - FQDN server name; run the Linux command hostname -f on a host to check its server name
      where [opts] are:
      -help Show this help and exit.
      -e Treat regionServerName as a regular expression pattern
      -f <B> Stop the whole program if the first error occurs, default is true
      -t <N> Timeout for a check, default is 600000 (millisecs)
      -daemon Continuous check at defined intervals.
      -interval <N> Interval between checks (sec)
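As a rough illustration of how these options and their stated defaults (-f true, -t 600000 ms) fit together, here is a hypothetical parser. The class and field names are assumptions, the tool's real parsing code may differ (for example in whether options may follow server names), and since the help text gives no default for -interval, the sketch leaves it unset (-1).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of parsing the options in the help text; not the tool's real parser.
final class CanaryOptionsSketch {
    boolean help = false;        // -help
    boolean regex = false;       // -e
    boolean failOnError = true;  // -f <B>, default true per the help text
    long timeoutMs = 600_000L;   // -t <N>, default 600000 ms per the help text
    boolean daemon = false;      // -daemon
    long intervalSec = -1L;      // -interval <N>; -1 means "not given" (no default is documented)
    final List<String> servers = new ArrayList<>(); // regionServerName arguments

    static CanaryOptionsSketch parse(String[] args) {
        CanaryOptionsSketch o = new CanaryOptionsSketch();
        for (int i = 0; i < args.length; i++) {
            String a = args[i];
            switch (a) {
                case "-help": o.help = true; break;
                case "-e": o.regex = true; break;
                case "-daemon": o.daemon = true; break;
                case "-f": o.failOnError = Boolean.parseBoolean(valueAt(args, ++i, a)); break;
                case "-t": o.timeoutMs = Long.parseLong(valueAt(args, ++i, a)); break;
                case "-interval": o.intervalSec = Long.parseLong(valueAt(args, ++i, a)); break;
                default:
                    if (a.startsWith("-")) {
                        throw new IllegalArgumentException("Unknown option: " + a);
                    }
                    o.servers.add(a); // anything else is a regionserver name or pattern
            }
        }
        return o;
    }

    // Returns the value following an option, or fails if the option is the last argument.
    private static String valueAt(String[] args, int i, String option) {
        if (i >= args.length) {
            throw new IllegalArgumentException(option + " requires a value");
        }
        return args[i];
    }
}
```

Note that Boolean.parseBoolean treats any value other than "true" (case-insensitive) as false, so a mistyped -f value would silently disable fail-fast in this sketch.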

      2. Send a request to every regionserver in the HBase cluster
      ./bin/hbase org.apache.hadoop.hbase.tool.RegionServerCanary

      3. Send a request to a single regionserver, given its name
      ./bin/hbase org.apache.hadoop.hbase.tool.RegionServerCanary rs1.domainname

      4. Send a request to every regionserver whose name matches a given regular expression
      /opt/trend/circus-opstool/bin/hbase-canary-monitor-each-regionserver.sh -e rs1.domainname.pattern
      // another example; quote the pattern so the shell does not expand [0-9] or {1,2}
      ./bin/hbase org.apache.hadoop.hbase.tool.RegionServerCanary -e 'tw-poc-tm-puppet-hdn[0-9]{1,2}.client.tw.trendnet.org'
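Selecting regionservers by name or by pattern (examples 2 to 4) might look like the sketch below. It is an assumption that the tool requires a full match (Matcher.matches) rather than a substring match (Matcher.find); the class ServerSelectorSketch and the server lists in the usage are hypothetical. With no arguments, every regionserver is selected, as in example 2.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: picks which regionservers to probe from the command-line arguments.
final class ServerSelectorSketch {
    static List<String> select(List<String> allServers, List<String> args, boolean useRegex) {
        if (args.isEmpty()) {
            return new ArrayList<>(allServers); // no names given: probe every regionserver
        }
        List<Pattern> patterns = new ArrayList<>();
        if (useRegex) {
            for (String arg : args) {
                patterns.add(Pattern.compile(arg)); // -e: each argument is a regex pattern
            }
        }
        List<String> selected = new ArrayList<>();
        for (String server : allServers) {
            boolean hit = useRegex
                    ? patterns.stream().anyMatch(p -> p.matcher(server).matches()) // assumed full match
                    : args.contains(server); // exact FQDN comparison
            if (hit) {
                selected.add(server);
            }
        }
        return selected;
    }
}
```

With the pattern from example 4, hosts hdn1 and hdn12 match but hdn123 does not. Because the dots in that pattern are not escaped, each one matches any character; writing them as \. would make the pattern stricter.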

      5. Send a request to a regionserver with a timeout limit for the check
      // query regionserver rs1.domainname with a 10-second timeout (-t 10000)
      // -f false: do not exit the program even if the check fails
      ./bin/hbase org.apache.hadoop.hbase.tool.RegionServerCanary -f false -t 10000 rs1.domainname
      // prints 1 if the check timed out
      echo "$?"
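The interaction between -f and the exit status can be sketched as follows. This hypothetical ExitCodeSketch runs the checks sequentially for simplicity, whereas the tool itself probes regionservers on separate threads; it assumes, consistent with example 5, that the process exits with 1 when any check fails even under -f false, and with 0 otherwise.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of how -f could map check results to an exit status.
final class ExitCodeSketch {
    // Returns 0 if every check passes and 1 otherwise.
    // failFast = true mirrors "-f true": stop at the first failed check.
    static int runChecks(List<String> servers, Predicate<String> check, boolean failFast) {
        int exitCode = 0;
        for (String server : servers) {
            if (!check.test(server)) {
                exitCode = 1;
                if (failFast) {
                    break; // -f true: stop the whole program at the first error
                }
                // -f false: remember the failure but keep checking the remaining servers
            }
        }
        return exitCode;
    }
}
```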

      6. Run in daemon mode, sending a request to each regionserver periodically
      ./bin/hbase org.apache.hadoop.hbase.tool.RegionServerCanary -daemon
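Daemon mode repeats the same check with a pause of -interval seconds between rounds. The loop below is a minimal sketch under that assumption; DaemonLoopSketch and its maxRounds parameter are hypothetical (maxRounds exists only so the example terminates, where a real daemon would loop until stopped).

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a daemon-mode loop; not the tool's real implementation.
final class DaemonLoopSketch {
    // Runs "check" up to maxRounds times, sleeping intervalSec seconds between rounds.
    // Returns the number of rounds actually run.
    static int run(Runnable check, long intervalSec, int maxRounds) {
        int rounds = 0;
        while (rounds < maxRounds) {
            check.run();
            rounds++;
            if (rounds == maxRounds) {
                break; // no need to sleep after the last round
            }
            try {
                TimeUnit.SECONDS.sleep(intervalSec); // -interval <N> is given in seconds
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop cleanly when interrupted
                break;
            }
        }
        return rounds;
    }
}
```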

      Attachments

        1. HBASE-7525-v0.patch
          15 kB
          takeshi.miao
        2. RegionServerCanary.java
          13 kB
          takeshi.miao
        3. HBASE-7525-0.95-v0.patch
          22 kB
          takeshi.miao
        4. HBASE-7525-0.95-v1.patch
          34 kB
          takeshi.miao
        5. HBASE-7525-0.95-v3.patch
          23 kB
          takeshi.miao
        6. HBASE-7525-0.95-v4.patch
          23 kB
          takeshi.miao
        7. HBASE-7525-trunk-v2.patch
          24 kB
          takeshi.miao
        8. HBASE-7525-0.95-v6.patch
          24 kB
          takeshi.miao
        9. HBASE-7525-trunk-v3.patch
          25 kB
          takeshi.miao
        10. HBASE-7525-0.95-v7.patch
          25 kB
          takeshi.miao
        11. HBASE-7525-trunk-v4.patch
          25 kB
          takeshi.miao


            People

              Assignee: takeshi.miao
              Reporter: takeshi.miao
              Votes: 0
              Watchers: 11

              Dates

                Created:
                Updated:
                Resolved: