IGNITE-10078

Node failure during concurrent partition updates may cause partition desync between primary and backup.

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8
    • Component/s: None
    • Labels: None
    • Docs Required

    Description

      This is possible if some updates are not written to the WAL before the node fails. In the following scenario, rebalancing will not apply them because the partition counters are the same:

      1. Start grid with 3 nodes, 2 backups.
      2. Preload some data to partition P.
      3. Start two concurrent transactions, each writing a single key to the same partition P; the keys are different.

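      // PESSIMISTIC and REPEATABLE_READ are static imports from
      // org.apache.ignite.transactions.TransactionConcurrency and TransactionIsolation.
      // Each of the two transactions runs this with its own key k; both keys map to partition P.
      try (Transaction tx = client.transactions().txStart(PESSIMISTIC, REPEATABLE_READ, 0, 1)) {
          client.cache(DEFAULT_CACHE_NAME).put(k, v);

          tx.commit();
      }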
      

      4. Order the updates on a backup so that the update with the higher partition counter is written to the WAL, while the update with the lower partition counter fails because the failure handler (FH) is triggered before the update is added to the WAL.

      5. Return the failed node to the grid and observe that no rebalancing occurs because the partition counters are the same.
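
The scenario above can be illustrated with a toy model. This is a minimal sketch, not Ignite code: `PartitionCopy` and `needsRebalance` are hypothetical names, and the model assumes that the rebalance decision compares only the highest update counter of each partition copy.

```java
import java.util.TreeMap;

public class CounterDesyncDemo {
    /** Toy partition copy (hypothetical): update counter -> key of the applied update. */
    static final class PartitionCopy {
        final TreeMap<Long, String> applied = new TreeMap<>();

        void apply(long counter, String key) {
            applied.put(counter, key);
        }

        /** Highest update counter seen so far, or 0 if nothing was applied. */
        long maxCounter() {
            return applied.isEmpty() ? 0L : applied.lastKey();
        }
    }

    /** Assumed naive check: rebalance only if the backup's max counter is behind the primary's. */
    static boolean needsRebalance(PartitionCopy primary, PartitionCopy backup) {
        return backup.maxCounter() < primary.maxCounter();
    }

    public static void main(String[] args) {
        PartitionCopy primary = new PartitionCopy();
        PartitionCopy backup = new PartitionCopy();

        // Step 2: preloaded data is present on both copies.
        primary.apply(1, "preloaded");
        backup.apply(1, "preloaded");

        // Step 3: two concurrent transactions get counters 2 and 3 on the primary.
        primary.apply(2, "k1");
        primary.apply(3, "k2");

        // Step 4: the backup logs only the update with counter 3; counter 2 is lost.
        backup.apply(3, "k2");

        // Step 5: max counters are equal, so the naive check skips rebalancing...
        System.out.println("needsRebalance = " + needsRebalance(primary, backup));
        // ...although the two copies hold different data.
        System.out.println("copiesEqual = " + primary.applied.equals(backup.applied));
    }
}
```

In this model both lines print `false`: the copies differ, yet the counter comparison reports nothing to rebalance.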

      Possible solution: detect gaps in the update counters on recovery and, if any are found, force a rebalance from a node without gaps.
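
The proposed gap detection could look roughly like the sketch below. It is an illustration under stated assumptions, not the fix that was committed: `findGaps` and `mustForceRebalance` are hypothetical helpers, `base` stands for the last counter known to be fully applied before recovery, and each update is assumed to advance the counter by exactly one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class CounterGapDetector {
    /**
     * Returns the counters missing from (base, max], where max is the highest
     * counter recovered from the log. Hypothetical helper, not an Ignite API.
     */
    static List<Long> findGaps(long base, Iterable<Long> recovered) {
        // Sort and de-duplicate the recovered counters above the base.
        TreeSet<Long> seen = new TreeSet<>();
        for (long c : recovered) {
            if (c > base)
                seen.add(c);
        }

        List<Long> gaps = new ArrayList<>();
        long expected = base + 1;

        for (long c : seen) {
            // Every counter skipped between 'expected' and 'c' is a gap.
            for (; expected < c; expected++)
                gaps.add(expected);

            expected = c + 1;
        }

        return gaps;
    }

    /** A partition with gaps should be rebalanced from a copy without gaps. */
    static boolean mustForceRebalance(long base, Iterable<Long> recovered) {
        return !findGaps(base, recovered).isEmpty();
    }

    public static void main(String[] args) {
        // Backup from the scenario: counter 1 preloaded, only counter 3 recovered from the WAL.
        System.out.println(findGaps(1, Arrays.asList(3L)));
        System.out.println(mustForceRebalance(1, Arrays.asList(3L)));

        // Healthy copy: counters 2 and 3 both recovered.
        System.out.println(mustForceRebalance(1, Arrays.asList(2L, 3L)));
    }
}
```

With this check the backup from the scenario reports counter 2 as missing even though its highest counter matches the primary's, which is the signal needed to force a rebalance.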

            People

              Assignee: ascherbakov Alexey Scherbakov
              Reporter: ascherbakov Alexey Scherbakov
              Votes: 1
              Watchers: 9

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 3h