HBASE-12971 (project: HBase)

Replication stuck due to large default value for replication.source.maxretriesmultiplier


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0, 0.98.10
    • Fix Version/s: 1.0.0, 1.1.0
    • Component/s: None
    • Labels: None

    Description

      In hbase-site.xml we set replication.source.maxretriesmultiplier (introduced in HBASE-11964) to its default value of 300.

      This value works well for recovering from transient errors with the remote ZooKeeper quorum of the peer HBase cluster, but it has side effects in the code introduced in HBASE-11367 (pluggable replication endpoint), where the default for the same property is much lower (10).
      See:
      1. https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java#L169
      2. https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/HBaseInterClusterReplicationEndpoint.java#L79
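      The conflict comes from two components reading the same key with different fallback values, so a single site-wide setting overrides both. The sketch below is hypothetical and is not HBase code: it replaces Hadoop's Configuration with a plain Map whose getInt(key, default) returns the site value when present and the caller's default otherwise. It follows the description in treating ReplicationSource as defaulting to 300 and HBaseInterClusterReplicationEndpoint as defaulting to 10. The class name SiteConfigSketch and its methods are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a site configuration (not Hadoop's Configuration):
// getInt returns the site value if one is set, otherwise the caller's default.
class SiteConfigSketch {
  static final String KEY = "replication.source.maxretriesmultiplier";

  private final Map<String, String> site = new HashMap<>();

  void set(String key, String value) {
    site.put(key, value);
  }

  int getInt(String key, int defaultValue) {
    String v = site.get(key);
    return v == null ? defaultValue : Integer.parseInt(v.trim());
  }

  // Fallback as described for ReplicationSource (HBASE-11964).
  int sourceMultiplier() {
    return getInt(KEY, 300);
  }

  // Fallback as described for HBaseInterClusterReplicationEndpoint (HBASE-11367).
  int endpointMultiplier() {
    return getInt(KEY, 10);
  }

  public static void main(String[] args) {
    SiteConfigSketch conf = new SiteConfigSketch();
    // Nothing in the site file: each component uses its own fallback.
    System.out.println(conf.sourceMultiplier() + " " + conf.endpointMultiplier()); // prints: 300 10
    // Setting the key site-wide (as in hbase-site.xml) applies it to both components.
    conf.set(KEY, "300");
    System.out.println(conf.sourceMultiplier() + " " + conf.endpointMultiplier()); // prints: 300 300
  }
}
```

      Once the key is set site-wide, the endpoint no longer uses its lower fallback of 10. It picks up 300, a value that was chosen for a different code path.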

      The two defaults clearly conflict: when replication.source.maxretriesmultiplier is set to 300 in hbase-site.xml, a SocketTimeoutException leads to a sleep of 300 * 300 times the base retry sleep (replication.source.sleepforretries, 1000 ms by default), i.e. 90,000 s or 25 hours!
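      The 25-hour figure can be checked with a short calculation. The sketch below is hypothetical and does not reproduce HBase code. It assumes a base retry sleep of 1000 ms (replication.source.sleepforretries) and, following the description's 300*300, assumes that the socket-timeout path multiplies that base sleep by maxRetriesMultiplier squared. The class name SocketTimeoutSleepSketch and its method are illustrative only.

```java
import java.util.Locale;

// Hypothetical sketch of the arithmetic in the description; not HBase code.
class SocketTimeoutSleepSketch {
  // Assumed base retry sleep, mirroring replication.source.sleepforretries (1000 ms).
  static final long SLEEP_FOR_RETRIES_MS = 1000L;

  // Assumption from the description: on a socket timeout the effective
  // multiplier is maxRetriesMultiplier * maxRetriesMultiplier.
  static long socketTimeoutSleepMs(int maxRetriesMultiplier) {
    long multiplier = (long) maxRetriesMultiplier * maxRetriesMultiplier;
    return SLEEP_FOR_RETRIES_MS * multiplier;
  }

  public static void main(String[] args) {
    for (int m : new int[] {10, 300}) {
      long ms = socketTimeoutSleepMs(m);
      // Locale.ROOT keeps the decimal separator stable across JVM locales.
      System.out.println(String.format(Locale.ROOT,
          "maxretriesmultiplier=%d -> %d s (%.2f h)", m, ms / 1000, ms / 3_600_000.0));
    }
  }
}
```

      Running main prints "maxretriesmultiplier=10 -> 100 s (0.03 h)" and "maxretriesmultiplier=300 -> 90000 s (25.00 h)". With the endpoint's own default of 10 the sleep is under two minutes. The site-wide 300 stretches it to 25 hours.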

      Attachments

        1. 12971-v2.txt (1 kB, Lars Hofhansl)
        2. 12971.txt (1 kB, Lars Hofhansl)


          People

            Assignee: Lars Hofhansl (larsh)
            Reporter: Adrian Muraru (amuraru)
            Votes: 1
            Watchers: 9
