[HBASE-15937] Figure out retry limit and timing for replication queue table operations - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Won't Fix
Affects Version/s: None
Fix Version/s: None
Component/s: Replication
Labels: None

Release Note:
Retrying tests

Description

ReplicationQueuesHBaseImpl will abort the server if any of its HBase Table writes/reads fails. We should figure out a reasonable retry limit and pause duration for these operations.

Currently the timeouts are as follows:

Table initialization:
240 retries
1 minute pause (the Master may not be initialized yet, in which case createTable requests are rejected immediately with a PleaseHoldException, so we should sleep between RPC requests)
1 minute RPC timeouts
Total: At minimum 2 hours of retries
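
The table-initialization policy is a bounded retry loop that sleeps a fixed pause between attempts. Below is a minimal Java sketch of that idea. It assumes the limit counts total attempts. `FixedPauseRetrier` and `HoldException` are hypothetical names. `HoldException` only stands in for HBase's `PleaseHoldException`, and this is not the code in ReplicationQueuesHBaseImpl.

```java
import java.util.concurrent.Callable;

/**
 * Sketch of a fixed-pause retry loop (hypothetical class, not HBase code).
 * Assumption: maxAttempts counts total attempts, including the first one.
 */
public class FixedPauseRetrier {

    /** Hypothetical stand-in for a fast "try again later" rejection such as PleaseHoldException. */
    public static class HoldException extends Exception {
        public HoldException(String message) {
            super(message);
        }
    }

    /**
     * Runs op up to maxAttempts times, sleeping pauseMillis between attempts.
     * Once the budget is exhausted the last failure is rethrown, which is the
     * point where the caller would decide to abort the server.
     */
    public static <T> T callWithRetries(Callable<T> op, int maxAttempts, long pauseMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    // Without this sleep, an uninitialized master would reject every
                    // attempt immediately and the whole budget would be spent at once.
                    Thread.sleep(pauseMillis);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulate a master that rejects the first two createTable attempts.
        String result = callWithRetries(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new HoldException("master not initialized");
            }
            return "table created";
        }, 240, 10L);
        System.out.println(result + " after " + calls[0] + " attempts"); // prints "table created after 3 attempts"
    }
}
```

Rethrowing after the last attempt leaves the abort decision with the caller. This matches the description, where an exhausted retry budget ends in a server abort.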

Normal Replication Table operations:
240 retries
100 ms pause (we assume the cluster is in a more stable state by this point, so most exceptions should be RPC timeouts; I am using the standard RPC pause)
1 minute RPC timeouts
Total: if operations fail because of RPC timeouts, at least 2 hours of retries. Counting only the pauses (240 × 100 ms), it is just 24 seconds.

All of these values are configurable, though.
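
The totals can be checked with a rough calculation. The sketch below uses a simplified model and does not show how the HBase client actually schedules retries. It assumes a fixed pause with no backoff multiplier. It also assumes each failed attempt costs either nothing (an immediate rejection) or one full RPC timeout. `RetryBudget` and its methods are hypothetical names.

```java
import java.time.Duration;

/**
 * Rough retry windows for the two configurations in the description
 * (hypothetical helper). Assumes a fixed pause and no backoff multiplier.
 */
public class RetryBudget {

    /** Total wall-clock time if every attempt is rejected instantly. */
    public static Duration pausesOnly(int retries, Duration pause) {
        return pause.multipliedBy(retries);
    }

    /** Total wall-clock time if every attempt also waits out the full RPC timeout. */
    public static Duration pausesPlusTimeouts(int retries, Duration pause, Duration rpcTimeout) {
        return pause.plus(rpcTimeout).multipliedBy(retries);
    }

    public static void main(String[] args) {
        Duration minute = Duration.ofMinutes(1);
        // Table initialization: 240 retries, 1 minute pause, 1 minute RPC timeout.
        System.out.println("init, pauses only:   " + pausesOnly(240, minute));                 // PT4H
        System.out.println("init, with timeouts: " + pausesPlusTimeouts(240, minute, minute)); // PT8H
        // Normal operations: 240 retries, 100 ms pause, 1 minute RPC timeout.
        Duration pause = Duration.ofMillis(100);
        System.out.println("ops, pauses only:    " + pausesOnly(240, pause));                  // PT24S
        System.out.println("ops, with timeouts:  " + pausesPlusTimeouts(240, pause, minute));  // PT4H24S
    }
}
```

Under this model, both configurations clear the 2-hour floor once RPC timeouts are involved. When failures are immediate, the normal-operation path gives up after about 24 seconds of pauses.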

Attachments

HBASE-15937.patch (25 kB), attached by Joseph on 03/Aug/16 17:53

People

Assignee: Ashu Pachauri

Reporter: Joseph

Votes: 0

Watchers: 4

Dates

Created: 02/Jun/16 01:14

Updated: 17/May/23 02:54

Resolved: 17/May/23 02:54