Details
- Type: Bug
- Status: Resolved
- Priority: Minor
- Resolution: Cannot Reproduce
Description
I was talking to Enis Soztutar today about replication problems common to HBase and Accumulo, and he described the following scenario:
A tablet is hosted by tserver1 using WAL1. That tablet moves to a different tserver for whatever reason (tserver1 failed, the balancer moved it, etc.) and starts getting used by tserver2 with WAL2.
In the simple case of replicating to another Accumulo instance with servers running NTP, this shouldn't be a big concern, because the timestamps assigned to the updates ensure a consistent final view. However, the intermediate view can be incorrect: updates from WAL2 may reach the peer before the older updates from WAL1. We can do a better job of ensuring that we replicate the data in the correct order.
We already know, from the metadata table, which WALs a tablet uses and the time at which the tablet began using each one (recorded by the TabletServer before any updates hit that tablet). We can use these records, together with the timestamp on the log column entries, to determine the correct ordering for this tablet with respect to all of its WALs. All the information is present for the Master to assign the replication work in the correct order.
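The ordering idea above can be sketched roughly as follows. This is only an illustration, not Accumulo code: the `WalUse` record, its field names, and both functions are hypothetical stand-ins for what the Master would read from the metadata table, and the sketch assumes the recorded start times are comparable across tservers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WalUse:
    """Hypothetical record: one tablet started using one WAL at a given time."""
    tablet: str      # tablet identifier (assumed name)
    wal: str         # WAL file name
    started_at: int  # time the tablet began using this WAL, e.g. epoch millis


def replication_order(records, tablet):
    """Return the WALs used by `tablet`, oldest first."""
    uses = sorted(
        (r for r in records if r.tablet == tablet),
        key=lambda r: r.started_at,
    )
    return [r.wal for r in uses]


def next_wal_to_replicate(records, tablet, replicated):
    """Return the oldest WAL for `tablet` that is not yet fully replicated.

    `replicated` is a set of WAL names already replicated to the peer.
    Returns None when nothing is left to replicate for this tablet.
    """
    for wal in replication_order(records, tablet):
        if wal not in replicated:
            return wal
    return None
```

With records for WAL1 (earlier start) and WAL2 (later start) on the same tablet, `next_wal_to_replicate` would hand out WAL1 first and WAL2 only once WAL1 is in the `replicated` set, which is the serial ordering the description asks for.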
Some extra bookkeeping would also be required: either keep that log column around beyond the minor compaction or recovery, or record an additional piece of replication metadata that the Master can read from the replication table.
Issue Links
- relates to: HBASE-9465 Push entries to peer clusters serially (Closed)