Apache Ozone / HDDS-6658

BackgroundPipelineCreator does not always stop quickly


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.0
    • Component/s: SCM

    Description

      On my laptop, TestSCMSafeModeManager.testSafeModePipelineExitRule() always takes just over 200 seconds. Checking a few PR runs, it does not seem to be slow on PRs, just locally.

      Debugging the code, it seems to hang in the BackgroundPipelineCreator.stop() method, where it is waiting for the thread to join:

        public void stop() {
          if (!running.compareAndSet(true, false)) {
            LOG.warn("{} is not running, just ignore.", THREAD_NAME);
            return;
          }
      
          LOG.info("Stopping {}.", THREAD_NAME);
      
          // in case RatisPipelineUtilsThread is sleeping
          synchronized (monitor) {
            monitor.notifyAll();
          }
      
          try {
            thread.join();  //  ----> Hangs here
          } catch (InterruptedException e) {
            LOG.warn("Interrupted during join {}.", THREAD_NAME);
            Thread.currentThread().interrupt();
          }
        }
      

      It is clearly hanging because the background thread did not exit. I believe this is because `notifyAll()` is being used to wake the thread so it can exit, when the thread should really be interrupted. There is a chance that `notifyAll()` is called while the thread is not waiting on the monitor. In that case the notification is lost: the thread just falls back into the wait state and does not exit until it wakes up again on its own.
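
      As an illustration of the interrupt-based approach suggested above, here is a minimal sketch. It is not the actual change made for this issue: the class StoppableWorker, its fields, and the wait interval are hypothetical stand-ins for the real BackgroundPipelineCreator code. The point it shows is that a thread's interrupt status is sticky, so an interrupt sent while the thread is not yet waiting still makes the next wait() throw InterruptedException right away, whereas a notification sent at that moment would be lost.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical worker, used only to illustrate stopping via interrupt.
final class StoppableWorker {
  private final AtomicBoolean running = new AtomicBoolean(false);
  private final Object monitor = new Object();
  private final long intervalMillis;
  private final Thread thread;

  StoppableWorker(long intervalMillis) {
    this.intervalMillis = intervalMillis;
    this.thread = new Thread(this::run, "StoppableWorker");
    this.thread.setDaemon(true);
  }

  void start() {
    if (running.compareAndSet(false, true)) {
      thread.start();
    }
  }

  private void run() {
    while (running.get()) {
      try {
        // Periodic work would go here (assumption: it is short-lived).
        synchronized (monitor) {
          // Throws InterruptedException immediately if the interrupt
          // flag is already set, so a stop request cannot be missed.
          monitor.wait(intervalMillis);
        }
      } catch (InterruptedException e) {
        // Restore the flag and exit the loop: a stop was requested.
        Thread.currentThread().interrupt();
        return;
      }
    }
  }

  void stop() {
    if (!running.compareAndSet(true, false)) {
      return; // not running, nothing to do
    }
    // Interrupt instead of notifyAll(): the request is remembered even
    // if the worker is not inside wait() at this instant.
    thread.interrupt();
    try {
      thread.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  boolean isAlive() {
    return thread.isAlive();
  }
}
```

      With a worker built this way, stop() should return shortly after it is called, even when the wait interval is several minutes, because join() no longer depends on the worker reaching the end of its wait.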

            People

              Assignee: sodonnell Stephen O'Donnell
              Reporter: sodonnell Stephen O'Donnell