IMPALA-809

Statestore seems to send concurrent heartbeats to the same subscriber, leading to repeated "Subscriber '<hostname>' is registering with statestore, ignoring update" messages


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 1.2.3, Impala 1.2.4
    • Fix Version/s: Impala 1.2.4, Impala 1.3
    • Component/s: None
    • Labels: None

    Description

      The Statestore seems to send concurrent heartbeats to the same subscriber.

      On the subscriber side, we see:

      1) The subscriber fails and enters recovery mode
      2) At some later point, the subscriber successfully exits recovery mode and re-registers with the statestore.

      Things should be good at this point, but in the subscriber log we see the error message "Subscriber 'c2108.hal.cloudera.com:22000' is registering with statestore, ignoring update" repeatedly (indefinitely).

      In the statestore log (/var/log/statestore/statestored.c2102.hal.cloudera.com.impala.log.INFO.20140214-154014.15224) the following messages are also repeated indefinitely:

      I0214 16:01:37.447760 15327 statestore.cc:539] Unable to update subscriber at c2108.hal.cloudera.com:23000, received error Subscriber 'c2108.hal.cloudera.com:22000' is registering with statestore, ignoring update.
      I0214 16:01:37.451879 15327 statestore.cc:539] Unable to update subscriber at c2136.hal.cloudera.com:23000, received error Subscriber 'c2136.hal.cloudera.com:22000' is registering with statestore, ignoring update.
      I0214 16:01:37.558199 15333 statestore.cc:539] Unable to update subscriber at c2104.hal.cloudera.com:23000, received error Subscriber 'c2104.hal.cloudera.com:22000' is registering with statestore, ignoring update.
      I0214 16:01:37.563241 15333 statestore.cc:539] Unable to update subscriber at c2116.hal.cloudera.com:23000, received error Subscriber 'c2116.hal.cloudera.com:22000' is registering with statestore, ignoring update.
      I0214 16:01:37.572443 15333 statestore.cc:539] Unable to update subscriber at c2128.hal.cloudera.com:23000, received error Subscriber 'c2128.hal.cloudera.com:22000' is registering with statestore, ignoring update.
      I0214 16:01:37.596616 15328 statestore.cc:539] Unable to update subscriber at c2114.hal.cloudera.com:23000, received error Subscriber 'c2114.hal.cloudera.com:22000' is registering with statestore, ignoring update.
      I0214 16:01:37.619920 15333 statestore.cc:539] Unable to update subscriber at c2132.hal.cloudera.com:23000, received error Subscriber 'c2132.hal.cloudera.com:22000' is registering with statestore, ignoring update.
      I0214 16:01:37.624423 15336 statestore.cc:539] Unable to update subscriber at c2110.hal.cloudera.com:23000, received error Subscriber 'c2110.hal.cloudera.com:22000' is registering with statestore, ignoring update.
      I0214 16:01:37.680075 15332 statestore.cc:539] Unable to update subscriber at c2126.hal.cloudera.com:23000, received error Subscriber 'c2126.hal.cloudera.com:22000' is registering with statestore, ignoring update.
      

      This message comes from state-store-subscriber.cc, line 310. It can only be returned if we are currently in recovery mode (which we are not; verified by attaching with gdb) or if there is a concurrent call to UpdateState(), because in both cases the try-lock on lock_ fails.

      Status StatestoreSubscriber::UpdateState(const TopicDeltaMap& incoming_topic_deltas,
          ...) {
        // We don't want to block here because this is an RPC, and delaying the return causes
        // the statestore to delay sending the next batch of heartbeats. The only time that
        // lock_ will be taken once UpdateState() might be called is in RecoveryModeChecker();
        // if we're in recovery mode we don't want to process the update.
        try_mutex::scoped_try_lock l(lock_);
        if (l) {
          // Process heartbeat...
        } else {
          // The lock is held elsewhere: either recovery mode or a concurrent UpdateState().
          stringstream ss;
          ss << "Subscriber '" << subscriber_id_
             << "' is registering with statestore, ignoring update.";
          return Status(ss.str());
        }
      }
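
The concurrent-call case can be reproduced in isolation. The following is a minimal sketch, not Impala's code: FakeSubscriber and SecondOfTwoConcurrentUpdates are hypothetical names, std::mutex with std::try_to_lock stands in for the try_mutex above, and a plain string replaces Status. It only illustrates that a second UpdateState() arriving while the first is still in flight gets the "is registering" error even though no recovery is happening.

```cpp
#include <cassert>
#include <future>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical stand-in for the subscriber; not Impala's actual class.
class FakeSubscriber {
 public:
  explicit FakeSubscriber(std::string id) : subscriber_id_(std::move(id)) {}

  // Returns an empty string on success, or the error text if the lock is busy.
  // `entered` (optional) is fulfilled once this call holds the lock;
  // `may_finish` (optional) keeps the call in flight until it becomes ready.
  std::string UpdateState(std::promise<void>* entered = nullptr,
                          std::shared_future<void> may_finish = {}) {
    std::unique_lock<std::mutex> l(lock_, std::try_to_lock);
    if (!l.owns_lock()) {
      // Same branch as in the excerpt: the lock is held by someone else.
      return "Subscriber '" + subscriber_id_ +
             "' is registering with statestore, ignoring update.";
    }
    if (entered != nullptr) entered->set_value();
    if (may_finish.valid()) may_finish.wait();  // Simulate slow heartbeat processing.
    return "";
  }

 private:
  std::string subscriber_id_;
  std::mutex lock_;
};

// Issues two overlapping UpdateState() calls and returns the second call's result.
std::string SecondOfTwoConcurrentUpdates(FakeSubscriber& sub) {
  std::promise<void> entered;
  std::promise<void> finish;
  std::shared_future<void> may_finish = finish.get_future().share();
  std::string first_result;

  // First heartbeat: takes the lock and stays in flight.
  std::thread first([&] { first_result = sub.UpdateState(&entered, may_finish); });
  entered.get_future().wait();  // The first call now holds the lock.

  // Second heartbeat arrives while the first is still being processed.
  std::string second_result = sub.UpdateState();

  finish.set_value();  // Let the first call complete.
  first.join();
  assert(first_result.empty());  // The first call itself was processed normally.
  return second_result;
}
```

Calling SecondOfTwoConcurrentUpdates() on a FakeSubscriber returns the error string for the overlapping call, while a later non-overlapping UpdateState() is processed normally. This matches the symptom described above, under the assumption that the statestore really does issue overlapping heartbeats to one subscriber.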
      

          People

            Assignee: Henry Robinson (henryr)
            Reporter: Lenni Kuff (lskuff)