Ignite / IGNITE-20441

ItRebalanceRecoveryTest is flaky


Details

    • Type: Task
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Duplicate

    Description

      org.apache.ignite.internal.rebalance.ItRebalanceRecoveryTest is flaky on TeamCity (TC); see the TC history.

      After some investigation, I found that the problem is the following:

      1. Imagine two threads: A and B.
      2. Thread A executes an update from the test. Primary replica generates a timestamp, creates a Raft command and tries to apply it, but the thread stalls for any reason.
      3. Thread B performs idle safe time sync. Primary replica generates a timestamp (larger than the timestamp from the previous step), creates a Raft command and successfully applies it.
      4. Thread A resumes its execution and applies its command. This means that the Raft command from thread B is applied before the command from thread A, even though thread B's timestamp is larger.
      5. According to the test protocol, a node gets restarted and needs to apply the missing part of the Raft log, which means re-applying the two commands above. However, the second command in the log (thread A's, which inserts data) will be ignored, because there's code in PartitionListener that ignores storage updates if their timestamp is smaller than the current timestamp (which was already advanced by the first command in the log, thread B's).
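
The replay in the steps above can be modelled with a small sketch. This is not Ignite code: `ReplaySketch`, `Command` and `replay` are hypothetical names, the timestamps 10 and 20 are made up, and the filter is only assumed to behave as described in step 5 (a storage update is dropped when its timestamp is smaller than the current one).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of the log replay described above; not Ignite code. */
public class ReplaySketch {
    /** A Raft command: a timestamp plus an optional row (null means idle safe time sync). */
    static final class Command {
        final long timestamp;
        final String row;

        Command(long timestamp, String row) {
            this.timestamp = timestamp;
            this.row = row;
        }
    }

    /** Replays the log in order and returns the rows that actually reach storage. */
    static List<String> replay(List<Command> log) {
        long currentTimestamp = 0;
        List<String> storage = new ArrayList<>();

        for (Command cmd : log) {
            if (cmd.row != null) {
                // Assumed filter: skip storage updates older than the current timestamp.
                if (cmd.timestamp < currentTimestamp) {
                    continue;
                }
                storage.add(cmd.row);
            }
            // Every applied command advances the current timestamp.
            currentTimestamp = Math.max(currentTimestamp, cmd.timestamp);
        }
        return storage;
    }

    public static void main(String[] args) {
        Command a = new Command(10, "row-from-test"); // thread A: update, smaller timestamp
        Command b = new Command(20, null);            // thread B: safe time sync, larger timestamp

        // Timestamp order matches log order: the row is stored.
        System.out.println(replay(Arrays.asList(a, b))); // prints [row-from-test]
        // Log order from step 4 (B before A): the row is dropped on replay.
        System.out.println(replay(Arrays.asList(b, a))); // prints []
    }
}
```

In this model the only difference between the two runs is the order of the commands in the log, which matches the observation that the insert is lost only when thread A stalls between generating its timestamp and applying its command.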

      Therefore, this bug is definitely caused by IGNITE-20116.

            People

              apolovtcev Aleksandr Polovtsev