  1. Apache Cassandra
  2. CASSANDRA-20172

Accord: Fix various bugs, improve burn test reliability

Details

    • Bug
    • Status: Resolved
    • Normal
    • Resolution: Fixed
    • None
    • Accord
    • None

    Description

      • Fix notifying unmanaged after update redundant before/bootstrap
      • Do not infer invalid if we have a single round of replies with minKnown not decided and maxKnown erased - in this case store the knowledge for next request.
      • Fix SyncPoint topology selection
      • Fix CheckStatusOkFull.with(InvalidIf)
      • Fix NotifyWaitingOn
      • ExecuteTxn should only contact latest topology for follow-up requests
      • DurableBefore.min should not go backwards on new epoch topology; journal replay was not correctly handling PreApplied; partialTxn can be null if not owned
      • Fix notify pre-bootstrap that arrives post-bootstrap
      • Avoid GC race condition on Propagate where we can incorrectly infer a shard is stale
      • Ensure redundantBefore on previously-owned range does not imply redundant before for overlapping queries on still-owned range
      • Ensure we don't mark stale unless all of the quorum we contacted had erased, else we may have raced with the agreement and erase
      • Fix Invalidate: when no route is found for FetchData, it does not report to all requested local epochs
      • Fix WAS_OWNED_RETIRED without durableBefore at Universal can lead to assertions with RX that we permit to execute but that have not yet
      • Fix initialiseWaitingOn can in some cases transitively notify the command we're updating via maybeCleanup of dependencies, but the command isn't yet updated so isn't ready
      • Fix encountering a command that is pre-bootstrap, and for which we have locally 'applied' a superseding RX, so that we do not know its outcome locally (so we do not clean up the command), but also it must have been decided - and we should not respond with future dependencies.
      • Epoch failures on CoordinatePreAccept should trigger the CoordinatePreAccept failure handler
      • Use the shard bound rather than GC bound for fallback dependency
      • LatestDeps should be sliced to actual route, so as not to use both PreAccepted AND Stable deps as though Stable
      • Fix various callback issues with node.withEpoch and Recover/Propose.isDone
      • RecoverWithRoute can encounter a partially truncated transaction where the Deps for one shard are not committed. Must fetch LatestDeps.
      • Tighten LatestDeps semantics for Recover
      • CommandsForKey: do not restore pruned as APPLIED
      • Ensure prune points execute in the epoch in which they are declared
      • must merge all fast path votes including those from earlier epochs that may have witnessed a later transaction
      • Recoveries that know the transaction is committed a priori should skip the Accept phase
      • Maintain GC behaviour for redundant commands that are pre-bootstrap
      • don't apply ERASE to CommandsForKey to avoid breaking pruning
      • Introduce clearBefore to ProgressLog to more consistently handle cleaning up redundant transactions (and avoid triggering burn test invariants)
      • don't replay journal of a bootstrapping node in burn test
      • Recover, Accept or Commit reply from epoch that has been retired should be treated as Success rather than Redundant
      • Distinguish completely REDUNDANT+PRE_BOOTSTRAP from partially GC_BEFORE and REDUNDANT+PRE_BOOTSTRAP - latter can make stronger inferences based on the GC_BEFORE intersection (could perhaps be treated as simply GC_BEFORE)
      • RX must register historical transactions with CFK
      • CommandStore.bootstrapper must wait for coordinate sync via same mechanism as sync()
      • Don't start topology change for shard where all replicas are already bootstrapping
      • Reify executes et al in StoreParticipants
      • LocalListeners txn listener reentry may erase the entry entirely
      • use registerAt in AbstractRequest for expirations, use correct time for expiresAt in ListAgent
      • use txnId.epoch() for pruning, as must be before both txnId and executeAt of prune point for coordinating dependencies
      • compute accurate KnownMap when affected by bootstrap or staleness
      • upgradeTruncated should calculate Definition and Deps separately
      • Invalidate should not sort before Erased when calculating max reply or max knowledge reply
      • avoid another infinite loop at end of burn test
      • avoid another epoch loading edge case
      • pass through low/high epochs to ensure we propagate information to all waiting command stores
      • RX must adopt a non-pruned dependency that has a higher TxnId (if is itself behind prune point)
      • rejects should also be calculated on COMMITTED started before
      • remove Apply Factory wrapper for RX, redundant now we have CoordinationAdapters (and has faulty epoch logic)
      • for RX ensure we return maximum writes for each epoch we intersect (same effectively as pruning logic)
      • rework updateUnmanaged to improve clarity
      • BeginRecovery constructor of LatestDeps should use touches() not owns() to compute localDeps
      • BeginRecovery superseding calculation was incorrectly treating startedBefore Committed and Accepted the same, when the point at which a dep should be known differs
      • Refactor Command visiting, porting C* integration to accord-core
      • RelationMultiMap Builder should resize keys and keyLimits independently
      • CommandsForKey Serialization moved to accord-core
      • losing ownership of range should trigger re-registration of unmanaged waiting on commit of a no-longer owned txn
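
      The RelationMultiMap Builder item above concerns two backing arrays that fill at different rates. As a rough illustration only, the sketch below shows the general technique of growing each array when it is itself full, rather than resizing both together. The class name, field meanings and methods here are hypothetical and are not taken from accord-core; the real builder's semantics for keys and keyLimits may differ.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch, NOT Accord's RelationMultiMap builder: two parallel
 * arrays with separate fill counts, each grown only when it is itself full.
 */
final class IndependentResizeBuilder {
    private String[] keys = new String[2];  // assumed: keys in insertion order
    private int[] keyLimits = new int[1];   // assumed: boundaries recorded by closeGroup()
    private int keyCount = 0;
    private int limitCount = 0;

    /** Appends a key, growing only the keys array if it is full. */
    void addKey(String key) {
        if (keyCount == keys.length)
            keys = Arrays.copyOf(keys, Math.max(1, keys.length * 2));
        keys[keyCount++] = key;
    }

    /** Records the current key count as a boundary, growing only keyLimits if it is full. */
    void closeGroup() {
        if (limitCount == keyLimits.length)
            keyLimits = Arrays.copyOf(keyLimits, Math.max(1, keyLimits.length * 2));
        keyLimits[limitCount++] = keyCount;
    }

    /** Returns a trimmed copy of the keys added so far. */
    String[] keys() { return Arrays.copyOf(keys, keyCount); }

    /** Returns a trimmed copy of the boundaries recorded so far. */
    int[] keyLimits() { return Arrays.copyOf(keyLimits, limitCount); }
}
```

      Each capacity check reads only its own array's length and count, so a full keys array never forces keyLimits to grow, and vice versa.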

      Attachments

        1. ci_summary.html
          965 kB
          Benedict Elliott Smith

          People

            Unassigned
            benedict Benedict Elliott Smith