HBASE-18549: Unclaimed replication queues can go undetected


Details

    • Reviewed

    Description

      We have repeatedly seen ZooKeeper issues cause NodeFailoverWorker to silently fail to pick up the replication queue of a dead region server. One example is when the znode size for a particular queue exceeds the jute.maxBuffer value.

      Other situations may lead to the same failure and also go undetected. We need a metric for the number of unclaimed replication queues. Alerting on this metric will help mitigate the problem and identify the underlying issues.
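
The proposed metric can be illustrated with a minimal sketch. It assumes a queue counts as "unclaimed" when the region server that owns it is no longer alive, so nothing is draining it. The class name UnclaimedQueueCounter, the method countUnclaimedQueues, and the map-based inputs are hypothetical stand-ins for whatever the replication code reads from ZooKeeper; this is not the code in the attached patches.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper for illustration only; not part of HBase. */
public final class UnclaimedQueueCounter {

    private UnclaimedQueueCounter() {}

    /**
     * Counts replication queues whose owning region server is not alive.
     *
     * @param queuesByOwner owner region server name -> ids of the queues it holds
     *                      (assumed to mirror the per-server queue listing)
     * @param liveServers   names of the region servers currently alive
     * @return number of queues that no live region server owns
     */
    public static int countUnclaimedQueues(Map<String, List<String>> queuesByOwner,
                                           Set<String> liveServers) {
        int unclaimed = 0;
        for (Map.Entry<String, List<String>> entry : queuesByOwner.entrySet()) {
            // Queues under a dead owner were never picked up by a failover worker.
            if (!liveServers.contains(entry.getKey())) {
                unclaimed += entry.getValue().size();
            }
        }
        return unclaimed;
    }

    public static void main(String[] args) {
        // Made-up server names and queue ids, used only to exercise the method.
        Map<String, List<String>> queues = Map.of(
            "rs1,16020,1", List.of("peer1"),
            "rs2,16020,1", List.of("peer1", "peer2"));
        Set<String> live = Set.of("rs1,16020,1");
        System.out.println(countUnclaimedQueues(queues, live));
    }
}
```

A value that stays above zero after a region server has died would be the condition to alert on, since a healthy failover should bring it back to zero.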

      Attachments

        1. HBASE-18549.branch-1.001.patch
          10 kB
          Xu Cang
        2. HBASE-18549.branch-1.001.patch
          10 kB
          Xu Cang
        3. HBASE-18549-.master.001.patch
          10 kB
          Xu Cang
        4. HBASE-18549-.master.002.patch
          13 kB
          Xu Cang
        5. HBASE-18549-.master.003.patch
          12 kB
          Xu Cang
        6. HBASE-18549-.master.004.patch
          12 kB
          Xu Cang


          People

            Assignee: xucang Xu Cang
            Reporter: ashu210890 Ashu Pachauri
            Votes: 0
            Watchers: 9

            Dates

              Created:
              Updated:
              Resolved: