  1. HBase
  2. HBASE-13146

Race Condition in ScheduledChore and ChoreService

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1.0, 2.0.0
    • Fix Version/s: 1.1.0, 2.0.0
    • Component/s: regionserver
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      Here are my findings from addressing HBASE-13145.

      ChoreService.java
        public synchronized boolean scheduleChore(ScheduledChore chore) {
            ...
            ScheduledFuture<?> future =
                scheduler.scheduleAtFixedRate(chore, chore.getInitialDelay(), chore.getPeriod(),
                  chore.getTimeUnit());
            chore.setChoreServicer(this);
            ...
        }
      

      So we schedule the chore first and only then set its chore servicer. For CompactionChecker the initialDelay is 0, so the chore may run before its chore servicer has been set. Now look at this:

      ScheduledChore.java
        public void run() {
          ...
          else if (stopper.isStopped() || !isScheduled()) {
            cancel(false);
            cleanup();
            if (LOG.isInfoEnabled()) LOG.info("Chore: " + getName() + " was stopped");
          }
          ...
        }
          ...
        public synchronized boolean isScheduled() {
          return choreServicer != null && choreServicer.isChoreScheduled(this);
        }
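
A toy model may make the window easier to see. This is only a sketch: `ChoreRaceDemo`, `ToyChore` and the `servicer` field are hypothetical stand-ins, not HBase classes, and a latch replaces the sleep so the bad ordering is forced deterministically. It schedules a task with zero initial delay and sets the servicer only after the first run has happened.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Toy model of the race window; none of these names exist in HBase. */
final class ChoreRaceDemo {

  /** Minimal stand-in for ScheduledChore: records what its first run observed. */
  static final class ToyChore implements Runnable {
    private volatile Object servicer;                // stands in for choreServicer
    final CountDownLatch firstRunDone = new CountDownLatch(1);
    volatile boolean firstRunSawServicer;

    void setServicer(Object s) {
      servicer = s;
    }

    @Override
    public void run() {
      if (firstRunDone.getCount() > 0) {
        // Same null check that isScheduled() starts with.
        firstRunSawServicer = servicer != null;
        firstRunDone.countDown();
      }
    }
  }

  /** Schedules with zero initial delay and sets the servicer only after the first run. */
  static boolean firstRunSawServicer() {
    ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
    try {
      ToyChore chore = new ToyChore();
      scheduler.scheduleAtFixedRate(chore, 0, 10, TimeUnit.MILLISECONDS);
      // Waiting for the first run plays the role of the inserted sleep.
      chore.firstRunDone.await();
      chore.setServicer(new Object());
      return chore.firstRunSawServicer;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for the first run", e);
    } finally {
      scheduler.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println("first run saw servicer: " + firstRunSawServicer());
  }
}
```

In this model the first run always observes a null servicer, which is the condition under which the real `isScheduled()` returns false.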
      

      So isScheduled() may return false, and the chore then starts to cancel itself. If you insert a sleep between scheduling the chore and setting its chore servicer, you always get the log 'Chore: CompactionChecker was stopped'. But the chore is not always actually cancelled, because of how the cancel method is implemented:

      ScheduledChore.java
        public synchronized void cancel(boolean mayInterruptIfRunning) {
          if (isScheduled()) choreServicer.cancelChore(this, mayInterruptIfRunning);
      
          choreServicer = null;
        }
      

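      So if you also insert a sleep before cancel (remember to use a larger sleep time here), you can make the test always fail: by the time cancel runs, the chore servicer has been set, so the isScheduled() check inside cancel passes and the chore really is cancelled.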
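
One way to close the window, sketched here under assumptions and not necessarily what the attached patch does, is to set the servicer before handing the chore to the scheduler. `OrderedScheduleSketch`, `SketchChore` and `SketchService` are hypothetical toy classes. The sketch also assumes that `isChoreScheduled` is synchronized on the service, so a first run that starts early waits until `scheduleChore` has finished its bookkeeping.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Toy model of "set the servicer first, then schedule"; not HBase code. */
final class OrderedScheduleSketch {

  static final class SketchChore implements Runnable {
    private volatile SketchService servicer;         // stands in for choreServicer
    final CountDownLatch firstRunDone = new CountDownLatch(1);
    volatile boolean firstRunWasScheduled;

    void setServicer(SketchService s) {
      servicer = s;
    }

    boolean isScheduled() {
      SketchService s = servicer;
      return s != null && s.isChoreScheduled(this);
    }

    @Override
    public void run() {
      if (firstRunDone.getCount() > 0) {
        firstRunWasScheduled = isScheduled();
        firstRunDone.countDown();
      }
    }
  }

  static final class SketchService {
    private final ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
    private final Map<SketchChore, ScheduledFuture<?>> scheduledChores = new HashMap<>();

    synchronized void scheduleChore(SketchChore chore) {
      // Publish the servicer before the chore can possibly run.
      chore.setServicer(this);
      ScheduledFuture<?> future =
          scheduler.scheduleAtFixedRate(chore, 0, 10, TimeUnit.MILLISECONDS);
      scheduledChores.put(chore, future);
    }

    // Assumption of this sketch: synchronized, so it blocks while scheduleChore is running.
    synchronized boolean isChoreScheduled(SketchChore chore) {
      ScheduledFuture<?> future = scheduledChores.get(chore);
      return future != null && !future.isDone();
    }

    void shutdown() {
      scheduler.shutdownNow();
    }
  }

  /** Returns what isScheduled() reported during the chore's first run. */
  static boolean firstRunSeesScheduled() {
    SketchService service = new SketchService();
    try {
      SketchChore chore = new SketchChore();
      service.scheduleChore(chore);
      chore.firstRunDone.await();
      return chore.firstRunWasScheduled;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for the first run", e);
    } finally {
      service.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println("first run saw isScheduled(): " + firstRunSeesScheduled());
  }
}
```

With this ordering the first run in the toy model sees a non-null servicer and, because of the shared monitor, a populated map, so it does not take the cancel path. The toy chore deliberately avoids holding its own lock while calling into the service, to sidestep lock-ordering issues that a real implementation would have to consider.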

      Attachments

        1. HBASE-13146.patch
          1 kB
          Duo Zhang

          People

            Assignee: Duo Zhang (zhangduo)
            Reporter: Duo Zhang (zhangduo)
            Votes: 0
            Watchers: 8
