  1. Apache Ozone
  2. HDDS-11506 Improvements for large scale deletion
  3. HDDS-11714

resetDeletedBlockRetryCount with --all may fail and can cause a long DB lock in large clusters

Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      When resetDeletedBlockRetryCount is run with the --all option, SCM takes a lock, fetches every transaction that has reached the max retry count, and then updates the DB to reset each count to 0. In a large-scale environment the number of such transactions can be huge, which leads to multiple problems:

      i) The lock is held for a long time and blocks all other normal operations.

      ii) The update is sent through Ratis, and the request fails because the message is too large.

      Instead, this operation should be done in batches to avoid both the long lock and the Ratis message size failure.
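
The batching idea can be illustrated with a minimal sketch. This is not the Ozone implementation (SCM is written in Java) and every name here is hypothetical: `reset_all_in_batches`, `fetch_batch`, `reset_batch`, and the `InMemoryTxStore` stand-in for the deleted-block transaction table are assumptions made only for illustration. The sketch takes the lock once per batch instead of once for the whole scan, so other operations can run between batches, and each update carries at most `batch_size` transaction IDs, which bounds the size of any single replicated message.

```python
import threading
from typing import Callable, Dict, List, Optional


def reset_all_in_batches(
    fetch_batch: Callable[[Optional[int], int], List[int]],
    reset_batch: Callable[[List[int]], None],
    lock: threading.Lock,
    batch_size: int = 1000,
) -> int:
    """Reset retry counts batch by batch; returns how many were reset.

    fetch_batch(after_id, limit) is a hypothetical callback that returns up
    to `limit` transaction IDs greater than `after_id` that need a reset,
    in ascending order. reset_batch(ids) is a hypothetical callback that
    sets the retry count of those IDs to 0.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    total = 0
    cursor: Optional[int] = None
    while True:
        # The lock is held only for one bounded batch, not the whole scan.
        with lock:
            batch = fetch_batch(cursor, batch_size)
            if not batch:
                return total
            reset_batch(batch)
        total += len(batch)
        cursor = batch[-1]  # resume after the last ID already handled


class InMemoryTxStore:
    """Toy stand-in for the transaction table (assumption, for demo only)."""

    def __init__(self, retry_counts: Dict[int, int], max_retry: int) -> None:
        self.retry_counts = dict(retry_counts)
        self.max_retry = max_retry
        self.batch_sizes: List[int] = []  # records the size of each update

    def fetch_batch(self, after_id: Optional[int], limit: int) -> List[int]:
        ids = sorted(
            tx for tx, count in self.retry_counts.items()
            if count >= self.max_retry and (after_id is None or tx > after_id)
        )
        return ids[:limit]

    def reset_batch(self, tx_ids: List[int]) -> None:
        self.batch_sizes.append(len(tx_ids))
        for tx in tx_ids:
            self.retry_counts[tx] = 0
```

A caller would pass the store's two methods and a shared lock, for example `reset_all_in_batches(store.fetch_batch, store.reset_batch, threading.Lock(), batch_size=1000)`. The cursor keeps progress across batches, so a transaction that reaches the max retry count again while the command is running is not picked up a second time in the same run.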

          People

            aryangupta1998 Aryan Gupta
            ashishkr Ashish Kumar