Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-2642

Potential deadlock in statestore error path

    XMLWordPrintableJSON

Details

    Description

      I just noticed this while reading the statestore code: OfferUpdate() takes subscribers_lock_ if the update queue is full, but in one place that lock has already been taken by the caller:

       {
          lock_guard<mutex> l(subscribers_lock_);
          // ... snip ...
          if (state == FailureDetector::FAILED) {
            if (is_heartbeat) {
              // TODO: Consider if a metric to track the number of failures would be useful.
              LOG(INFO) << "Subscriber '" << subscriber->id() << "' has failed, disconnected "
                        << "or re-registered (last known registration ID: " << update.second
                        << ")";
              UnregisterSubscriber(subscriber.get());
            }
          } else {
            // Schedule the next message.
            VLOG(3) << "Next " << (is_heartbeat ? "heartbeat" : "update") << " deadline for: "
                    << subscriber->id() << " is in " << deadline_ms << "ms";
            // vvvvvvvvvvvvvvvvvv oops vvvvvvvvvvvvvvvvvv
            OfferUpdate(make_pair(deadline_ms, subscriber->id()), is_heartbeat ?
                &subscriber_heartbeat_threadpool_ : &subscriber_topic_update_threadpool_);
          }
        }
      

      It's not as scary as it sounds, because if the update queue has > 10000 entries there's something wrong anyway, but we should fix this.

      Attachments

        Activity

          People

            zoram Zoram Thanga
            henryr Henry Robinson
            Votes:
            1 Vote for this issue
            Watchers:
            8 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: