[CASSANDRA-20172] Accord: Fix various bugs, improve burn test reliability - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Normal
Resolution: Fixed
Fix Version/s: None
Component/s: Accord
Labels: None
Epic Link: CEP-15: General Purpose Transactions
Platform: All
Impacts: None

Description

Fix notifying unmanaged after update redundant before/bootstrap
Do not infer invalid if we have a single round of replies with minKnown not decided and maxKnown erased - in this case store the knowledge for next request.
Fix SyncPoint topology selection
Fix CheckStatusOkFull.with(InvalidIf)
Fix NotifyWaitingOn
ExecuteTxn should only contact latest topology for follow-up requests
DurableBefore.min should not go backwards on new epoch topology
Journal replay was not correctly handling PreApplied
partialTxn can be null if not owned
Fix notify pre-bootstrap that arrives post-bootstrap
Avoid GC race condition on Propagate where we can incorrectly infer a shard is stale
Ensure redundantBefore on previously-owned range does not imply redundant before for overlapping queries on still-owned range
Ensure we don't mark stale unless all of the quorum we contacted had erased, else we may have raced with the agreement and erase
Fix Invalidate when no route found for FetchData does not report to all requested local epochs
Fix WAS_OWNED_RETIRED without durableBefore at Universal can lead to assertions with RX that we permit to execute but that have not yet
Fix initialiseWaitingOn can in some cases transitively notify the command we're updating via maybeCleanup of dependencies, but the command isn't yet updated so isn't ready
Fix encountering a command that is pre-bootstrap, and for which we have locally 'applied' a superseding RX, so that we do not know its outcome locally (so we do not clean up the command), but also it must have been decided - and we should not respond with future dependencies.
Epoch failures on CoordinatePreAccept should trigger the CoordinatePreAccept failure handler
Use the shard bound rather than GC bound for fallback dependency
LatestDeps should be sliced to actual route, so as not to use both PreAccepted AND Stable deps as though Stable
Fix various callback issues with node.withEpoch and Recover/Propose.isDone
RecoverWithRoute can encounter a partially truncated transaction where the Deps for one shard are not committed. Must fetch LatestDeps.
Tighten LatestDeps semantics for Recover
CommandsForKey: do not restore pruned as APPLIED
Ensure prune points execute in the epoch in which they are declared
must merge all fast path votes including those from earlier epochs that may have witnessed a later transaction
Recoveries that know the transaction is committed a priori should skip the Accept phase
Maintain GC behaviour for redundant commands that are pre-bootstrap
don't apply ERASE to CommandsForKey to avoid breaking pruning
Introduce clearBefore to ProgressLog to more consistently handle cleaning up redundant transactions (and avoid triggering burn test invariants)
don't replay journal of a bootstrapping node in burn test
Recover, Accept or Commit reply from epoch that has been retired should be treated as Success rather than Redundant
Distinguish completely REDUNDANT+PRE_BOOTSTRAP from partially GC_BEFORE and REDUNDANT+PRE_BOOTSTRAP - latter can make stronger inferences based on the GC_BEFORE intersection (could perhaps be treated as simply GC_BEFORE)
RX must register historical transactions with CFK
CommandStore.bootstrapper must wait for coordinate sync via same mechanism as sync()
Don't start topology change for shard where all replicas are already bootstrapping
Reify executes et al in StoreParticipants
LocalListeners txn listener reentry may erase the entry entirely
use registerAt in AbstractRequest for expirations, use correct time for expiresAt in ListAgent
use txnId.epoch() for pruning, as must be before both txnId and executeAt of prune point for coordinating dependencies
compute accurate KnownMap when affected by bootstrap or staleness
upgradeTruncated should calculate Definition and Deps separately
Invalidate should not sort before Erased when calculating max reply or max knowledge reply
avoid another infinite loop at end of burn test
avoid another epoch loading edge case
pass through low/high epochs to ensure we propagate information to all waiting command stores
RX must adopt a non-pruned dependency that has a higher TxnId (if is itself behind prune point)
rejects should also be calculated on COMMITTED started before
remove Apply Factory wrapper for RX, redundant now we have CoordinationAdapters (and has faulty epoch logic)
for RX ensure we return maximum writes for each epoch we intersect (same effectively as pruning logic)
rework updateUnmanaged to improve clarity
BeginRecovery constructor of LatestDeps should use touches() not owns() to compute localDeps
BeginRecovery superseding calculation was incorrectly treating startedBefore Committed and Accepted the same, when the point at which a dep should be known differs
Refactor Command visiting, porting C* integration to accord-core
RelationMultiMap Builder should resize keys and keyLimits independently
CommandsForKey Serialization moved to accord-core
losing ownership of range should trigger re-registration of unmanaged waiting on commit of a no-longer owned txn
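
As an illustration of the stale-marking rule listed above ("Ensure we don't mark stale unless all of the quorum we contacted had erased"), here is a minimal sketch. It is not Accord's code: the class `StaleInferenceSketch`, the enum `ReplyKnowledge` and the method `mayMarkStale` are hypothetical names introduced only for this example, and the real logic involves considerably more state.

```java
import java.util.List;

// Hypothetical sketch: these names are illustrative and are not Accord's actual types.
final class StaleInferenceSketch {
    // What a single contacted replica reported about the transaction.
    enum ReplyKnowledge { ERASED, UNDECIDED, DECIDED }

    // Permit the "stale" inference only if we heard from at least one replica
    // and every contacted replica reported ERASED. A mixed set of replies may
    // mean we raced with the agreement and the subsequent erase, so the
    // conservative answer in that case is false.
    static boolean mayMarkStale(List<ReplyKnowledge> quorumReplies) {
        if (quorumReplies.isEmpty()) return false;
        for (ReplyKnowledge reply : quorumReplies) {
            if (reply != ReplyKnowledge.ERASED) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Every contacted replica had erased: the inference is permitted.
        System.out.println(mayMarkStale(List.of(ReplyKnowledge.ERASED, ReplyKnowledge.ERASED)));
        // One replica had not erased: we may have raced, so do not mark stale.
        System.out.println(mayMarkStale(List.of(ReplyKnowledge.ERASED, ReplyKnowledge.UNDECIDED)));
    }
}
```

The sketch treats an empty reply set as "not stale" so that a lack of evidence never triggers the inference; that choice is an assumption of the example rather than something the ticket states.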
