Ignite / IGNITE-23105

Data race in aipersist partition destruction

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • None
    • 3.0
    • persistence

    Description

      CheckpointProgressImpl#onStartPartitionProcessing and CheckpointProgressImpl#onFinishPartitionProcessing don't work as intended for several reasons:

      • There's a race: onFinish could be called before onStart is called in a concurrent thread. This might happen if there's only a handful of dirty pages in each partition and there is more than one checkpoint thread. Basically, this protection doesn't work.
      • Even if that particular race didn't exist, this code still wouldn't work, because some pages could be added to the pageIdsToRetry map. That map is processed later, after writePages has finished, meaning that we mark unfinished partitions as finished.
      • Due to the aforementioned bugs, I didn't bother calling these methods in drainCheckpointBuffers. As a result, that method requires a fix too.
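
      The interleaving described in the first bullet can be replayed deterministically with a simplified counter. This is only a sketch: PartitionProcessingTracker, its fields and processedFuture are hypothetical names for illustration and are not the CheckpointProgressImpl code. It assumes the bookkeeping is a plain in-flight counter whose drop to zero signals "partition processed".

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for per-partition bookkeeping; not the Ignite implementation. */
class PartitionProcessingTracker {
    /** Number of checkpoint threads that announced a start and have not finished yet. */
    private final AtomicInteger inFlight = new AtomicInteger();

    /** Completed once the tracker believes nobody is writing the partition anymore. */
    private final CompletableFuture<Void> processed = new CompletableFuture<>();

    /** A checkpoint thread announces that it starts writing pages of the partition. */
    void onStart() {
        inFlight.incrementAndGet();
    }

    /** A checkpoint thread announces that it is done; a drop to zero fires the signal. */
    void onFinish() {
        if (inFlight.decrementAndGet() == 0) {
            processed.complete(null);
        }
    }

    CompletableFuture<Void> processedFuture() {
        return processed;
    }

    public static void main(String[] args) {
        PartitionProcessingTracker tracker = new PartitionProcessingTracker();

        // Thread A writes its handful of pages and finishes quickly.
        tracker.onStart();
        tracker.onFinish();

        // Thread B still has pages of the same partition queued,
        // but it has not called onStart yet.
        boolean signalledEarly = tracker.processedFuture().isDone();

        // B begins writing only after the "processed" signal has already fired.
        tracker.onStart();

        System.out.println("signalled before B started: " + signalledEarly); // prints true
    }
}
```

      The two threads are replayed sequentially here so that the outcome does not depend on scheduling. Whether the real code can actually reach this interleaving is a separate question, which the update below revisits.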

      Upd:
      The first and second problems can probably be solved as part of IGNITE-23115, once the pages of a single partition are written by only one thread; this still needs to be checked.
      After a careful analysis, I found that there is no race, so I renamed some methods and added documentation to them. I also fixed drainCheckpointBuffers.

            People

              ktkalenko@gridgain.com Kirill Tkalenko
              ibessonov Ivan Bessonov
              Ivan Bessonov Ivan Bessonov
                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h