HBASE-18549 (HBase)

Unclaimed replication queues can go undetected

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0, 2.2.0, 1.4.8, 2.1.1
    • Component/s: Replication
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

    We have run into this situation multiple times: a ZooKeeper issue can cause NodeFailoverWorker to silently fail to claim the replication queue of a dead region server. One example is when the znode size for a particular queue exceeds the jute.maxbuffer value.

    Other situations may lead to the same outcome and likewise go undetected. We need a metric for the number of unclaimed replication queues, so that operators can alert on it and then investigate the underlying cause.
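A minimal sketch of the bookkeeping such a metric needs, assuming each failed or abandoned claim attempt can be reported to a hook. The class UnclaimedQueueTracker and its methods are hypothetical illustrations, not the code in the attached patches and not HBase's metrics API. The idea: remember every (dead server, queue) pair whose claim failed, forget it once a later attempt succeeds, and export the set size as a gauge so that a value that stays above zero can trigger an alert.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical tracker backing an "unclaimed replication queues" gauge.
 * Illustrative only; names do not correspond to actual HBase classes.
 */
public class UnclaimedQueueTracker {
    // Entries are "deadServer/queueId" for queues whose claim attempt failed.
    private final Set<String> unclaimed = ConcurrentHashMap.newKeySet();

    private static String key(String deadServer, String queueId) {
        return deadServer + "/" + queueId;
    }

    /** Hypothetical hook: call when a claim attempt throws or is abandoned. */
    public void recordClaimFailure(String deadServer, String queueId) {
        unclaimed.add(key(deadServer, queueId));
    }

    /** Hypothetical hook: call when the queue is finally claimed (e.g. on retry). */
    public void recordClaimSuccess(String deadServer, String queueId) {
        unclaimed.remove(key(deadServer, queueId));
    }

    /** Value to export as the gauge; repeated failures for one queue count once. */
    public int getUnclaimedQueueCount() {
        return unclaimed.size();
    }

    public static void main(String[] args) {
        UnclaimedQueueTracker tracker = new UnclaimedQueueTracker();
        // Simulate two failed claims (e.g. queue znodes too large to read),
        // one of which later succeeds on retry.
        tracker.recordClaimFailure("rs1.example.com,16020,1", "peer1");
        tracker.recordClaimFailure("rs1.example.com,16020,1", "peer2");
        tracker.recordClaimSuccess("rs1.example.com,16020,1", "peer2");
        System.out.println("unclaimed queues: " + tracker.getUnclaimedQueueCount());
    }
}
```

Keying on the (server, queue) pair rather than incrementing a counter keeps the gauge idempotent: a worker that retries the same failing queue many times still reports one unclaimed queue, and the value drops back as queues are eventually claimed.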

    Attachments

    1. HBASE-18549-.master.004.patch (12 kB, Xu Cang)
    2. HBASE-18549-.master.003.patch (12 kB, Xu Cang)
    3. HBASE-18549-.master.002.patch (13 kB, Xu Cang)
    4. HBASE-18549-.master.001.patch (10 kB, Xu Cang)
    5. HBASE-18549.branch-1.001.patch (10 kB, Xu Cang)


    People

    • Assignee: Xu Cang
    • Reporter: Ashu Pachauri
    • Votes: 0
    • Watchers: 8
