Status: Resolved
Resolution: Fixed
This issue is caused by an incorrect routing spec generated by the coordinator.
Here is the procedure to reproduce it.
1. Set policiesPerBolt = 2, streamsPerBolt = 3 and reuseBoltInStreams = true in the server config.
2. Create four policies that have the same partition and consume the same stream:
  from HADOOP_JMX_METRIC_STREAM_SANDBOX[metric == "hadoop.namenode.rpc.callqueuelength"]#window.length(2) select site, host, component, metric, min(convert(value, "long")) as minValue group by site, host, component, metric having minValue >= 10000 insert into HADOOP_JMX_METRIC_STREAM_SANDBOX_CALL_QUEUE_EXCEEDS_OUT;
  from HADOOP_JMX_METRIC_STREAM_SANDBOX[metric == "hadoop.namenode.rpc.callqueuelength"]#window.length(30) select site, host, component, metric, min(convert(value, "long")) as minValue group by site, host, component, metric having minValue >= 10000 insert into HADOOP_JMX_METRIC_STREAM_SANDBOX_CALL_QUEUE_EXCEEDS_OUT;
  from HADOOP_JMX_METRIC_STREAM_SANDBOX[metric == "hadoop.namenode.hastate.failed.count"]#window.length(2) select site, host, component, metric, timestamp, min(value) as minValue group by site, host, component, metric insert into HADOOP_JMX_METRIC_STREAM_SANDBOX_NN_NO_RESPONSE_OUT;
  from HADOOP_JMX_METRIC_STREAM_SANDBOX[metric == "hadoop.namenode.hastate.failed.count.test"]#window.length(3) select site, host, component, metric, count(value) as cnt group by site, host, component, metric insert into HADOOP_JMX_METRIC_STREAM_SANDBOX_NN_NO_RESPONSE_OUT;
After the four policies are created, the routing spec contains three identical targetQueue entries:
  routerSpecs: [
    {
      streamId: "HADOOP_JMX_METRIC_STREAM_SANDBOX",
      partition: { streamId: "HADOOP_JMX_METRIC_STREAM_SANDBOX", type: "GROUPBY", columns: [ "site", "host", "component", "metric" ], sortSpec: null },
      targetQueue: [
        {
          partition: { streamId: "HADOOP_JMX_METRIC_STREAM_SANDBOX", type: "GROUPBY", columns: [ "site", "host", "component", "metric" ], sortSpec: null },
          workers: [
            { topologyName: "ALERT_UNIT_TOPOLOGY_APP_SANDBOX", boltId: "alertBolt9" },
            { topologyName: "ALERT_UNIT_TOPOLOGY_APP_SANDBOX", boltId: "alertBolt0" },
            { topologyName: "ALERT_UNIT_TOPOLOGY_APP_SANDBOX", boltId: "alertBolt1" },
            { topologyName: "ALERT_UNIT_TOPOLOGY_APP_SANDBOX", boltId: "alertBolt2" },
            { topologyName: "ALERT_UNIT_TOPOLOGY_APP_SANDBOX", boltId: "alertBolt3" }
          ]
        },
        {
          partition: { streamId: "HADOOP_JMX_METRIC_STREAM_SANDBOX", type: "GROUPBY", columns: [ "site", "host", "component", "metric" ], sortSpec: null },
          workers: [
            { topologyName: "ALERT_UNIT_TOPOLOGY_APP_SANDBOX", boltId: "alertBolt9" },
            { topologyName: "ALERT_UNIT_TOPOLOGY_APP_SANDBOX", boltId: "alertBolt0" },
            { topologyName: "ALERT_UNIT_TOPOLOGY_APP_SANDBOX", boltId: "alertBolt1" },
            { topologyName: "ALERT_UNIT_TOPOLOGY_APP_SANDBOX", boltId: "alertBolt2" },
            { topologyName: "ALERT_UNIT_TOPOLOGY_APP_SANDBOX", boltId: "alertBolt3" }
          ]
        },
        {
          partition: { streamId: "HADOOP_JMX_METRIC_STREAM_SANDBOX", type: "GROUPBY", columns: [ "site", "host", "component", "metric" ], sortSpec: null },
          workers: [
            { topologyName: "ALERT_UNIT_TOPOLOGY_APP_SANDBOX", boltId: "alertBolt9" },
            { topologyName: "ALERT_UNIT_TOPOLOGY_APP_SANDBOX", boltId: "alertBolt0" },
            { topologyName: "ALERT_UNIT_TOPOLOGY_APP_SANDBOX", boltId: "alertBolt1" },
            { topologyName: "ALERT_UNIT_TOPOLOGY_APP_SANDBOX", boltId: "alertBolt2" },
            { topologyName: "ALERT_UNIT_TOPOLOGY_APP_SANDBOX", boltId: "alertBolt3" }
          ]
        }
      ]
    }
  ]
and the alert spec is:
  boltPolicyIdsMap: {
    alertBolt9: [ "NameNodeWithOneNoResponse", "NameNodeHAHasNoResponse", "CallQueueLengthExceeds30Times", "CallQueueLengthExceeds2Times" ],
    alertBolt0: [ "NameNodeWithOneNoResponse", "NameNodeHAHasNoResponse", "CallQueueLengthExceeds30Times", "CallQueueLengthExceeds2Times" ],
    alertBolt1: [ "NameNodeWithOneNoResponse", "NameNodeHAHasNoResponse", "CallQueueLengthExceeds30Times", "CallQueueLengthExceeds2Times" ],
    alertBolt2: [ "NameNodeWithOneNoResponse", "NameNodeHAHasNoResponse", "CallQueueLengthExceeds30Times", "CallQueueLengthExceeds2Times" ],
    alertBolt3: [ "NameNodeWithOneNoResponse", "NameNodeHAHasNoResponse", "CallQueueLengthExceeds30Times", "CallQueueLengthExceeds2Times" ]
  }
3. Produce a message into the Kafka topic 'hadoop_jmx_metrics_sandbox' that triggers NameNodeWithOneNoResponse:
{"timestamp": 1490250963445, "metric": "hadoop.namenode.hastate.failed.count", "component": "namenode", "site": "artemislvs", "value": 0.0, "host": "localhost"}
Result: the single message is sent three times, once for each of the three identical targetQueue entries in the routing spec.
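The Python sketch below models how duplicated targetQueue entries turn one message into three deliveries. It assumes, rather than takes from the Eagle source, that every targetQueue entry receives its own copy of the event and that a GROUPBY partition picks one worker by hashing the partition columns. The helpers pick_worker, route and dedupe_queues are hypothetical, and collapsing duplicate queues is only an illustrative guard, not necessarily the change that resolved this issue.

```python
import zlib
from collections import Counter

STREAM = "HADOOP_JMX_METRIC_STREAM_SANDBOX"
TOPOLOGY = "ALERT_UNIT_TOPOLOGY_APP_SANDBOX"
COLUMNS = ["site", "host", "component", "metric"]
BOLTS = ["alertBolt9", "alertBolt0", "alertBolt1", "alertBolt2", "alertBolt3"]


def make_queue():
    """One targetQueue entry, shaped like the entries in the reported routing spec."""
    return {
        "partition": {"streamId": STREAM, "type": "GROUPBY", "columns": COLUMNS, "sortSpec": None},
        "workers": [{"topologyName": TOPOLOGY, "boltId": b} for b in BOLTS],
    }


# The reported spec: three identical targetQueue entries for a single stream.
ROUTER_SPEC = {
    "streamId": STREAM,
    "partition": make_queue()["partition"],
    "targetQueue": [make_queue() for _ in range(3)],
}


def pick_worker(queue, event):
    """Assumed GROUPBY routing: hash the partition column values, pick one worker."""
    key = "|".join(str(event[c]) for c in queue["partition"]["columns"])
    workers = queue["workers"]
    # crc32 is deterministic across runs, unlike Python's built-in str hash.
    return workers[zlib.crc32(key.encode("utf-8")) % len(workers)]["boltId"]


def route(spec, event):
    """Deliver the event once per targetQueue entry and count deliveries per bolt."""
    return Counter(pick_worker(q, event) for q in spec["targetQueue"])


def dedupe_queues(spec):
    """Hypothetical guard: drop targetQueue entries equal to an earlier entry."""
    unique = []
    for q in spec["targetQueue"]:
        if q not in unique:
            unique.append(q)
    return {**spec, "targetQueue": unique}


event = {"timestamp": 1490250963445, "metric": "hadoop.namenode.hastate.failed.count",
         "component": "namenode", "site": "artemislvs", "value": 0.0, "host": "localhost"}

print(route(ROUTER_SPEC, event))                 # one bolt receives the event 3 times
print(route(dedupe_queues(ROUTER_SPEC), event))  # the same bolt receives it once
```

Because the three entries share the same partition columns and worker list, every copy hashes to the same bolt, which is consistent with the alert on that bolt seeing the message three times.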