IGNITE-22759

Do not do partition SafeTime sync if previous attempt is not finished


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • None
    • 3.0
    • None

    Description

      A scheduled task periodically performs a 'partition SafeTime sync' on each primary replica living on the node. For each such replica, we do the following:

      1. Take the current time from the node clock ('now')
      2. Wait till the Metastorage SafeTime reaches 'now'
      3. Make sure the replica is still primary
      4. Execute the partition SafeTime sync logic
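
The four steps above can be sketched roughly as follows. This is only an illustration, not the Ignite implementation: the class `SafeTimeSyncSketch`, the method `syncOnce`, and all four parameters are hypothetical names, and the Metastorage SafeTime wait is modeled as a plain function from a timestamp to a future.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.BooleanSupplier;
import java.util.function.LongFunction;
import java.util.function.LongSupplier;

/** Illustrative sketch only; none of these names come from the Ignite code base. */
public class SafeTimeSyncSketch {
    /**
     * Runs one sync attempt for a single replica.
     * The returned future completes with true if the sync logic ran,
     * or false if the replica was no longer primary when the wait finished.
     */
    public static CompletableFuture<Boolean> syncOnce(
            LongSupplier nodeClock,
            LongFunction<CompletableFuture<Void>> waitForMetastorageSafeTime,
            BooleanSupplier isStillPrimary,
            Runnable syncLogic
    ) {
        // Step 1: take 'now' from the node clock.
        long now = nodeClock.getAsLong();

        // Step 2: wait until the Metastorage SafeTime reaches 'now'.
        return waitForMetastorageSafeTime.apply(now)
                .thenApply(unused -> {
                    // Step 3: make sure the replica is still primary.
                    if (!isStillPrimary.getAsBoolean()) {
                        return false;
                    }

                    // Step 4: execute the partition SafeTime sync logic.
                    syncLogic.run();
                    return true;
                });
    }
}
```

Each call creates a new future in step 2, so calling `syncOnce` on a timer while the wait does not complete accumulates one pending future per tick and per partition.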

      Step 2 is implemented by installing a future on a PendingComparableValuesTracker representing the Metastorage SafeTime. If, for some reason, the Metastorage SafeTime lags behind the node clock, several (or many) futures might be installed at the same time for the same partition. When there are many partitions, this leads to a huge number of futures, most of which are useless (only the most recent one matters for each partition). This increases the amount of garbage. If the node is already struggling to handle the load, the sharply increased GC pressure can finish it off: the node will drive itself into an OutOfMemory situation.

      It is suggested to execute steps 1-4 only if the previous future has already finished. We might lose one partition SafeTime update, but when the node is already struggling (as the Metastorage SafeTime lags), this will probably not be noticed.
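
A minimal sketch of the suggested guard, assuming each partition is identified by some key and each attempt is represented by a future. The class `SkipIfInFlight` and its method `tryStart` are hypothetical and are not taken from the Ignite code base. The guard allows at most one pending attempt per partition and skips a scheduler tick if the previous attempt has not finished.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Illustrative sketch only; allows at most one in-flight attempt per key. */
public class SkipIfInFlight<K> {
    /** Keys (for example, partition IDs) whose previous attempt has not finished yet. */
    private final Set<K> inFlight = ConcurrentHashMap.newKeySet();

    /**
     * Starts a new attempt unless the previous one for the same key is still pending.
     *
     * @return true if the attempt was started, false if it was skipped.
     */
    public boolean tryStart(K partitionId, Supplier<CompletableFuture<?>> attempt) {
        // add() returns false if the key is already present, i.e. an attempt is pending.
        if (!inFlight.add(partitionId)) {
            return false;
        }

        CompletableFuture<?> future;
        try {
            future = attempt.get();
        } catch (RuntimeException | Error e) {
            // Do not leave the key stuck if the attempt could not even be started.
            inFlight.remove(partitionId);
            throw e;
        }

        // Release the key on both normal and exceptional completion.
        future.whenComplete((result, error) -> inFlight.remove(partitionId));
        return true;
    }
}
```

With this guard, the number of pending futures is bounded by the number of partitions, however far the Metastorage SafeTime lags. The cost is the one mentioned above: a tick that arrives while an attempt is pending is dropped rather than queued.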

      Update: this approach was criticized; another one is tried instead, see https://issues.apache.org/jira/browse/IGNITE-22759?focusedCommentId=17895965&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17895965


          People

            rpuch Roman Puchkovskiy
            rpuch Roman Puchkovskiy
            Alexander Lapin Alexander Lapin

              Time Tracking

                Original Estimate: Not Specified
                Remaining Estimate: 0h
                Time Spent: 4h
