  1. HBase
  2. HBASE-15867 Move HBase replication tracking from ZooKeeper to HBase
  3. HBASE-15937

Figure out retry limit and timing for replication queue table operations


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • None
    • None
    • Replication
    • None
    • Retrying tests

    Description

      ReplicationQueuesHBaseImpl will abort the server if any of its HBase Table writes/reads fails. We should figure out a reasonable retry limit and pause duration for these operations.
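A minimal sketch of what a bounded retry with a fixed pause could look like. It is not taken from the attached patch: the class name `BoundedRetry` and the parameters `maxAttempts` and `pauseMillis` are hypothetical, and it uses only the JDK so it does not depend on any HBase API. Once the limit is reached the last failure is rethrown, which leaves the abort decision to the caller.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, not part of HBase: retries an operation a bounded
// number of times with a fixed pause between failed attempts.
final class BoundedRetry {
    private BoundedRetry() {}

    public static <T> T call(Callable<T> op, int maxAttempts, long pauseMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (InterruptedException e) {
                // Do not retry on interruption; restore the flag and give up.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e;
                // Sleep only if another attempt will follow.
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis);
                }
            }
        }
        // All attempts failed: surface the last failure to the caller.
        throw last;
    }
}
```

A caller would wrap each table read or write, for example `BoundedRetry.call(() -> doWrite(), 240, 100L)`, where `doWrite` stands in for the actual operation.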

      As of now the timeouts look like:

      Table initialization:
      240 retries
      1 minute pause (the Master may not be initialized yet, in which case createTable retries are rejected immediately with PleaseHoldException, so we should sleep between RPC requests)
      1 minute RPC timeouts
      Total: At minimum 2 hours of retries

      Normal Replication Table operations:
      240 retries
      100 ms pause (the cluster should be in a more stable state by then and most exceptions are expected to be RPC timeouts, so I am using the standard RPC pause)
      1 minute RPC timeouts
      Total: If operations fail because of RPC timeouts, a minimum of 2 hours of retries. Counting pauses alone, we only have 24 seconds.
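The totals above can be checked with a little arithmetic. The sketch below assumes a fixed pause with no backoff and treats every retry as costing either one pause or one full RPC timeout plus a pause; the class and method names are hypothetical. Under those assumptions the plain product for 240 retries at one minute each is 240 minutes, so the 2-hour figure reads as a conservative lower bound.

```java
import java.time.Duration;

// Hypothetical calculator for worst-case retry budgets; not part of HBase.
final class RetryBudget {
    private RetryBudget() {}

    /** Time spent in pauses alone, assuming one fixed pause per retry. */
    public static Duration pauseOnly(int retries, Duration pause) {
        return pause.multipliedBy(retries);
    }

    /** Time spent if every retry waits out the full RPC timeout and then pauses. */
    public static Duration timeoutPlusPause(int retries, Duration rpcTimeout, Duration pause) {
        return rpcTimeout.plus(pause).multipliedBy(retries);
    }
}
```

With the numbers from this issue, `RetryBudget.pauseOnly(240, Duration.ofMillis(100))` comes to 24 seconds, and `RetryBudget.pauseOnly(240, Duration.ofMinutes(1))` comes to 240 minutes.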

      All of these values are configurable, though.
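As an illustration of how such values might be read with defaults, here is a sketch that uses `java.util.Properties` as a stand-in for an HBase `Configuration`. The key names are the standard HBase client settings; whether the patch reads these exact keys for the replication table is an assumption, and the class `ReplicationTableRetrySettings` is hypothetical. The defaults mirror the normal-operation values listed above.

```java
import java.util.Properties;

// Hypothetical settings holder; Properties stands in for an HBase Configuration.
final class ReplicationTableRetrySettings {
    // Standard HBase client keys. Assumption: the replication table code honors them.
    public static final String RETRIES_KEY = "hbase.client.retries.number";
    public static final String PAUSE_KEY = "hbase.client.pause";
    public static final String RPC_TIMEOUT_KEY = "hbase.rpc.timeout";

    public final int retries;
    public final long pauseMillis;
    public final long rpcTimeoutMillis;

    public ReplicationTableRetrySettings(Properties conf) {
        // Fall back to the normal-operation values from this issue when a key is unset.
        retries = Integer.parseInt(conf.getProperty(RETRIES_KEY, "240"));
        pauseMillis = Long.parseLong(conf.getProperty(PAUSE_KEY, "100"));
        rpcTimeoutMillis = Long.parseLong(conf.getProperty(RPC_TIMEOUT_KEY, "60000"));
    }
}
```

For table initialization, the same holder could be built from a configuration that sets the pause key to `60000`, matching the 1 minute pause described above.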

      Attachments

        1. HBASE-15937.patch
          25 kB
          Joseph

          People

            ashu210890 Ashu Pachauri
            Vegetable26 Joseph
            Votes: 0
            Watchers: 4
