HBase / HBASE-18549

Unclaimed replication queues can go undetected


    Details

    • Hadoop Flags:
      Reviewed

      Description

      We have repeatedly seen ZooKeeper issues cause NodeFailoverWorker to silently fail to pick up the replication queue of a dead region server. One example is when the znode size for a particular queue exceeds the jute.maxBuffer value.
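To illustrate the jute.maxBuffer example, here is a minimal sketch of a size check. The class `ZnodeSizeCheck` and its methods are hypothetical and not part of HBase or ZooKeeper. The only detail taken from ZooKeeper is that `jute.maxBuffer` is read as a Java system property with a default of 0xfffff bytes. The exact comparison ZooKeeper performs, and the fact that it applies to whole request/response packets rather than a single payload, are simplified here.

```java
public class ZnodeSizeCheck {
    // ZooKeeper's documented default for jute.maxBuffer (just under 1 MB).
    static final int DEFAULT_JUTE_MAX_BUFFER = 0xfffff;

    // Reads the limit the same way a JVM system property would be read.
    public static int configuredLimit() {
        return Integer.getInteger("jute.maxBuffer", DEFAULT_JUTE_MAX_BUFFER);
    }

    // Hypothetical helper: true if the payload is larger than the given limit.
    // This approximates, but does not reproduce, ZooKeeper's own length check.
    public static boolean exceedsLimit(byte[] payload, int limitBytes) {
        return payload.length > limitBytes;
    }

    public static void main(String[] args) {
        byte[] payload = new byte[2 * 1024 * 1024]; // a 2 MB stand-in payload
        int limit = configuredLimit();
        if (exceedsLimit(payload, limit)) {
            // A real failover path should log or count this instead of failing silently.
            System.out.println("Payload of " + payload.length
                + " bytes exceeds jute.maxBuffer=" + limit);
        }
    }
}
```

A check like this would only surface one known cause. It does not replace a general metric for queues that were never claimed.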

      Other situations may lead to the same outcome and likewise go undetected. We need a metric for the number of unclaimed replication queues. Alerting on this metric would help mitigate the problem and identify the underlying issues.
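One way the proposed metric could be computed is sketched below. This is an illustration only, not the approach taken in the attached patches: `UnclaimedQueueCounter`, its inputs, and the server and queue names are hypothetical, and the sketch assumes the caller has already obtained the set of live region servers and the replication queues grouped by owning server.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

public class UnclaimedQueueCounter {
    /**
     * Counts replication queues whose owning region server is not live,
     * i.e. queues that should have been claimed by another server.
     *
     * @param queuesByOwner owner server name mapped to the queue ids under it (assumed input)
     * @param liveServers   names of region servers currently alive (assumed input)
     */
    public static int countUnclaimed(Map<String, List<String>> queuesByOwner,
                                     Set<String> liveServers) {
        int unclaimed = 0;
        for (Map.Entry<String, List<String>> entry : queuesByOwner.entrySet()) {
            if (!liveServers.contains(entry.getKey())) {
                unclaimed += entry.getValue().size();
            }
        }
        return unclaimed;
    }

    public static void main(String[] args) {
        // Hypothetical state: rs2 is dead and still owns two queues.
        Map<String, List<String>> queues = Map.of(
            "rs1", List.of("peer1"),
            "rs2", List.of("peer1", "peer2"));
        Set<String> live = Set.of("rs1");
        System.out.println("unclaimed queues: " + countUnclaimed(queues, live));
    }
}
```

Exposing this count as a gauge lets an alert fire when it stays above zero for longer than a normal failover would take.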

        Attachments

        1. HBASE-18549.branch-1.001.patch
          10 kB
          Xu Cang
        2. HBASE-18549.branch-1.001.patch
          10 kB
          Xu Cang
        3. HBASE-18549-.master.001.patch
          10 kB
          Xu Cang
        4. HBASE-18549-.master.002.patch
          13 kB
          Xu Cang
        5. HBASE-18549-.master.003.patch
          12 kB
          Xu Cang
        6. HBASE-18549-.master.004.patch
          12 kB
          Xu Cang

            People

            • Assignee: Xu Cang (xucang)
            • Reporter: Ashu Pachauri (ashu210890)
