Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Invalid
Description
To ensure that WALs are not left in a dangling "open" state with respect to replication, the garbage collector scans the tablets and builds a view of the WALs that are currently in use. It consults that view to determine which WALs can move to a "closed" replication state.
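The inference described above can be sketched as a single pass: collect every WAL referenced by any tablet, then close each "open" replication entry whose WAL is not in that set. This is a minimal Java model under stated assumptions; `GcWalCloser`, `ReplState`, and the map-based representation of tablets and replication state are hypothetical stand-ins, not Accumulo's classes or storage layout.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the GC's inference; names are illustrative, not Accumulo's API.
class GcWalCloser {
    enum ReplState { OPEN, CLOSED }

    /**
     * Builds the set of WALs referenced by any tablet, then closes every
     * replication entry whose WAL is not in that set.
     *
     * @param tabletWals  tablet id -> WALs that tablet currently references
     * @param replication WAL -> replication state; updated in place
     * @return the WALs moved to CLOSED by this pass
     */
    static Set<String> closeUnreferencedWals(Map<String, Set<String>> tabletWals,
                                             Map<String, ReplState> replication) {
        // Step 1: the "view" of WALs currently in use.
        Set<String> inUse = new HashSet<>();
        for (Set<String> wals : tabletWals.values()) {
            inUse.addAll(wals);
        }
        // Step 2: absence from the view is treated as "no longer used".
        Set<String> closed = new HashSet<>();
        for (Map.Entry<String, ReplState> e : replication.entrySet()) {
            if (e.getValue() == ReplState.OPEN && !inUse.contains(e.getKey())) {
                e.setValue(ReplState.CLOSED);
                closed.add(e.getKey());
            }
        }
        return closed;
    }
}
```

The weak point is step 2: "not referenced right now" is assumed to mean "never referenced again".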
Inferring closure this way isn't entirely correct, because a WAL can "come back" after being removed from a tablet. Consider the following:
- Table has one tablet hosted on one tserver
- Tablet receives some mutations
- Tablet is minor compacted (MinC)
- Tablet removes its WAL entry as part of the MinC
- Garbage collector marks the WAL as "closed" for replication
- Tablet receives more mutations and starts writing to the same WAL again
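The sequence above can be replayed in a small single-tablet simulation that counts writes landing in a WAL after it was marked closed. This is a hypothetical Java model, not Accumulo code: `WalReuseScenario`, its methods, and the assumption that a later write does not reopen the replication entry are all illustrative choices.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical single-tablet simulation of the WAL reuse sequence.
class WalReuseScenario {
    enum ReplState { OPEN, CLOSED }

    final Set<String> tabletWals = new HashSet<>();              // WALs the tablet references
    final Map<String, ReplState> replication = new HashMap<>();  // WAL -> replication state
    final Map<String, Integer> writesAfterClose = new HashMap<>();
    final String currentWal;                                     // WAL the tserver writes to

    WalReuseScenario(String wal) { this.currentWal = wal; }

    void mutate() {
        tabletWals.add(currentWal);                       // tablet (re-)references the WAL
        replication.putIfAbsent(currentWal, ReplState.OPEN);
        // Assumption: nothing reopens the entry, so count writes into a "closed" WAL.
        if (replication.get(currentWal) == ReplState.CLOSED) {
            writesAfterClose.merge(currentWal, 1, Integer::sum);
        }
    }

    void minorCompact() { tabletWals.clear(); }            // MinC drops the WAL reference

    void gcPass() {                                        // GC infers "closed" from absence
        for (Map.Entry<String, ReplState> e : replication.entrySet()) {
            if (!tabletWals.contains(e.getKey())) {
                e.setValue(ReplState.CLOSED);
            }
        }
    }

    static int run() {
        WalReuseScenario s = new WalReuseScenario("wal-1");
        s.mutate();        // tablet receives some mutations
        s.minorCompact();  // MinC removes the WAL entry
        s.gcPass();        // WAL is marked "closed" for replication
        s.mutate();        // more mutations land in the same WAL
        return s.writesAfterClose.getOrDefault("wal-1", 0);
    }
}
```

`run()` reports one write into a WAL the GC had already closed, which is exactly the window the list describes.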
There are a couple of ways this could present itself, each of which would result in re-replicating data we may already have sent once. On an active system, I don't think this is a big concern, and we already don't guarantee "once and only once" replication, so it isn't critical. The combiner set on the replication table also mitigates most re-replication concerns, because those records persist until the entire file is replicated (which should outlast the file's use on the local system).
ecn suggested recording a "closed" marker for a WAL as part of TabletServerLogger.close(), which would remove the need to "guess" when a WAL will no longer be used.
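A rough sketch of the explicit-marker idea: the logger's close path records a marker, and the GC closes a replication entry only when that marker exists, instead of inferring closure from missing tablet references. Everything here is hypothetical (`ExplicitCloseSketch`, `onLoggerClose`, the in-memory marker set standing in for persisted state); it does not reflect the real `TabletServerLogger` API. Rejecting writes after close is an assumed invariant added for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the "explicit closed marker" proposal; not Accumulo code.
class ExplicitCloseSketch {
    enum ReplState { OPEN, CLOSED }

    // Stand-in for wherever the marker would actually be persisted.
    final Set<String> closedMarkers = new HashSet<>();
    final Map<String, ReplState> replication = new HashMap<>();

    // Called from the logger's close path: the one place that knows the
    // tserver will never write to this WAL again.
    void onLoggerClose(String wal) {
        closedMarkers.add(wal);
    }

    void onWrite(String wal) {
        // Assumed invariant: a WAL with a closed marker never receives writes.
        if (closedMarkers.contains(wal)) {
            throw new IllegalStateException("write to closed WAL " + wal);
        }
        replication.putIfAbsent(wal, ReplState.OPEN);
    }

    // The GC no longer guesses from tablet references; it acts only on markers.
    void gcPass() {
        for (Map.Entry<String, ReplState> e : replication.entrySet()) {
            if (closedMarkers.contains(e.getKey())) {
                e.setValue(ReplState.CLOSED);
            }
        }
    }
}
```

With this design, a GC pass that runs after a MinC but before the logger closes leaves the WAL open, so later writes to the same WAL are still covered by replication.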
If we want to move to explicit "end" tracking (see ACCUMULO-2835), we will need this implemented.
Issue Links
- blocks: ACCUMULO-2835 Track explicit lengths of WALs for more accurate reporting (Resolved)