Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Impala 1.2.3, Impala 1.2.4
Description
The Statestore seems to send concurrent heartbeats to the same subscriber.
On the subscriber side, we see:
1) The subscriber fails and enters recovery mode
2) At some later point, the subscriber successfully exits recovery mode and re-registers with the statestore.
Things should be good at this point, but in the subscriber log we see the error message "Subscriber 'c2108.hal.cloudera.com:22000' is registering with statestore, ignoring update" repeatedly (indefinitely).
In the statestore log (/var/log/statestore/statestored.c2102.hal.cloudera.com.impala.log.INFO.20140214-154014.15224) the following messages are also repeated indefinitely:
I0214 16:01:37.447760 15327 statestore.cc:539] Unable to update subscriber at c2108.hal.cloudera.com:23000, received error Subscriber 'c2108.hal.cloudera.com:22000' is registering with statestore, ignoring update.
I0214 16:01:37.451879 15327 statestore.cc:539] Unable to update subscriber at c2136.hal.cloudera.com:23000, received error Subscriber 'c2136.hal.cloudera.com:22000' is registering with statestore, ignoring update.
I0214 16:01:37.558199 15333 statestore.cc:539] Unable to update subscriber at c2104.hal.cloudera.com:23000, received error Subscriber 'c2104.hal.cloudera.com:22000' is registering with statestore, ignoring update.
I0214 16:01:37.563241 15333 statestore.cc:539] Unable to update subscriber at c2116.hal.cloudera.com:23000, received error Subscriber 'c2116.hal.cloudera.com:22000' is registering with statestore, ignoring update.
I0214 16:01:37.572443 15333 statestore.cc:539] Unable to update subscriber at c2128.hal.cloudera.com:23000, received error Subscriber 'c2128.hal.cloudera.com:22000' is registering with statestore, ignoring update.
I0214 16:01:37.596616 15328 statestore.cc:539] Unable to update subscriber at c2114.hal.cloudera.com:23000, received error Subscriber 'c2114.hal.cloudera.com:22000' is registering with statestore, ignoring update.
I0214 16:01:37.619920 15333 statestore.cc:539] Unable to update subscriber at c2132.hal.cloudera.com:23000, received error Subscriber 'c2132.hal.cloudera.com:22000' is registering with statestore, ignoring update.
I0214 16:01:37.624423 15336 statestore.cc:539] Unable to update subscriber at c2110.hal.cloudera.com:23000, received error Subscriber 'c2110.hal.cloudera.com:22000' is registering with statestore, ignoring update.
I0214 16:01:37.680075 15332 statestore.cc:539] Unable to update subscriber at c2126.hal.cloudera.com:23000, received error Subscriber 'c2126.hal.cloudera.com:22000' is registering with statestore, ignoring update.
This is coming from state-store-subscriber.cc/310. The only time that this can happen is if we are currently in RecoveryMode (which we are not - verified by attaching with gdb) or if there is a concurrent call to UpdateState.
Status StatestoreSubscriber::UpdateState(const TopicDeltaMap& incoming_topic_deltas,
    ...
  // We don't want to block here because this is an RPC, and delaying the return causes
  // the statestore to delay sending the next batch of heartbeats. The only time that
  // lock_ will be taken once UpdateState() might be called is in RecoveryModeChecker();
  // if we're in recovery mode we don't want to process the update.
  try_mutex::scoped_try_lock l(lock_);
  if (l) {
    // Process heartbeat...
  } else {
    stringstream ss;
    ss << "Subscriber '" << subscriber_id_
       << "' is registering with statestore, ignoring update.";
    return Status(ss.str());
  }
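
To illustrate the second possibility, here is a minimal, self-contained sketch of how a try-lock guard of this shape rejects an update that overlaps with another one still in progress, even when no recovery is happening. It is not the Impala code: it uses std::mutex with std::try_to_lock in place of the try_mutex in the snippet, and ToySubscriber, OverlappingUpdateResult and the "host:22000" id are hypothetical names made up for the example.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical stand-in for the subscriber; not the Impala class.
class ToySubscriber {
 public:
  explicit ToySubscriber(std::string id) : id_(std::move(id)) {}

  // Returns "" if the update was processed, or the error text if lock_
  // was already held. `process` (optional) runs while the lock is held.
  std::string UpdateState(const std::function<void()>& process = nullptr) {
    std::unique_lock<std::mutex> l(lock_, std::try_to_lock);
    if (!l.owns_lock()) {
      return "Subscriber '" + id_ +
             "' is registering with statestore, ignoring update.";
    }
    if (process) process();
    return "";
  }

 private:
  std::string id_;
  std::mutex lock_;
};

// Issues a second update while a first one is still being processed.
// Returns {result of first call, result of second call}.
std::pair<std::string, std::string> OverlappingUpdateResult() {
  ToySubscriber sub("host:22000");  // made-up subscriber id
  std::promise<void> entered;  // set once the first call holds the lock
  std::promise<void> release;  // set to let the first call finish
  std::future<void> release_f = release.get_future();
  std::string first;

  std::thread t([&] {
    first = sub.UpdateState([&] {
      entered.set_value();
      release_f.wait();  // simulate a slow heartbeat still being processed
    });
  });

  entered.get_future().wait();             // first call now holds the lock
  std::string second = sub.UpdateState();  // overlaps with the first call
  release.set_value();
  t.join();

  assert(first.empty());  // the first call held the lock, so it was processed
  return {first, second};
}
```

In this sketch the second call comes back with the "is registering with statestore, ignoring update" text although nothing is registering, which matches the symptom above if the statestore really does have two heartbeats in flight to one subscriber.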