Eagle (Retired) / EAGLE-971

Duplicated queues are generated under a monitored stream


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: v0.5.0
    • Fix Version/s: v0.5.0
    • Component/s: None
    • Labels: None

    Description

      This issue is caused by an incorrect routing spec generated by the coordinator. Here is the procedure to reproduce it:

      1. Set policiesPerBolt = 2, streamsPerBolt = 3, and reuseBoltInStreams = true in the server config (a sketch follows).
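      A hedged sketch of how these settings might look is below; the enclosing "coordinator" section name and the assumption that they live in the server's application.conf (HOCON) are mine, only the three key names and values come from this report.

       # Assumed file (application.conf) and section name; only the three
       # key names and values below are taken from this report.
       coordinator {
         policiesPerBolt = 2
         streamsPerBolt = 3
         reuseBoltInStreams = true
       }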
      2. Create four policies that share the same partition and consume the same stream:

       from HADOOP_JMX_METRIC_STREAM_SANDBOX[metric == "hadoop.namenode.rpc.callqueuelength"]#window.length(2)
       select site, host, component, metric, min(convert(value, "long")) as minValue
       group by site, host, component, metric
       having minValue >= 10000
       insert into HADOOP_JMX_METRIC_STREAM_SANDBOX_CALL_QUEUE_EXCEEDS_OUT;

       from HADOOP_JMX_METRIC_STREAM_SANDBOX[metric == "hadoop.namenode.rpc.callqueuelength"]#window.length(30)
       select site, host, component, metric, min(convert(value, "long")) as minValue
       group by site, host, component, metric
       having minValue >= 10000
       insert into HADOOP_JMX_METRIC_STREAM_SANDBOX_CALL_QUEUE_EXCEEDS_OUT;

       from HADOOP_JMX_METRIC_STREAM_SANDBOX[metric == "hadoop.namenode.hastate.failed.count"]#window.length(2)
       select site, host, component, metric, timestamp, min(value) as minValue
       group by site, host, component, metric
       insert into HADOOP_JMX_METRIC_STREAM_SANDBOX_NN_NO_RESPONSE_OUT

       from HADOOP_JMX_METRIC_STREAM_SANDBOX[metric == "hadoop.namenode.hastate.failed.count.test"]#window.length(3)
       select site, host, component, metric, count(value) as cnt
       group by site, host, component, metric
       insert into HADOOP_JMX_METRIC_STREAM_SANDBOX_NN_NO_RESPONSE_OUT;
      

      After creating the four policies, the generated routing spec is:

      routerSpecs: [
        {
          streamId: "HADOOP_JMX_METRIC_STREAM_SANDBOX",
          partition: {
            streamId: "HADOOP_JMX_METRIC_STREAM_SANDBOX",
            type: "GROUPBY",
            columns: ["site", "host", "component", "metric"],
            sortSpec: null
          },
          targetQueue: [
            {
              partition: {
                streamId: "HADOOP_JMX_METRIC_STREAM_SANDBOX",
                type: "GROUPBY",
                columns: ["site", "host", "component", "metric"],
                sortSpec: null
              },
              workers: [
                { topologyName: "ALERT_UNIT_TOPOLOGY_APP_SANDBOX", boltId: "alertBolt9" },
                { topologyName: "ALERT_UNIT_TOPOLOGY_APP_SANDBOX", boltId: "alertBolt0" },
                { topologyName: "ALERT_UNIT_TOPOLOGY_APP_SANDBOX", boltId: "alertBolt1" },
                { topologyName: "ALERT_UNIT_TOPOLOGY_APP_SANDBOX", boltId: "alertBolt2" },
                { topologyName: "ALERT_UNIT_TOPOLOGY_APP_SANDBOX", boltId: "alertBolt3" }
              ]
            },
            {
              partition: {
                streamId: "HADOOP_JMX_METRIC_STREAM_SANDBOX",
                type: "GROUPBY",
                columns: ["site", "host", "component", "metric"],
                sortSpec: null
              },
              workers: [
                { topologyName: "ALERT_UNIT_TOPOLOGY_APP_SANDBOX", boltId: "alertBolt9" },
                { topologyName: "ALERT_UNIT_TOPOLOGY_APP_SANDBOX", boltId: "alertBolt0" },
                { topologyName: "ALERT_UNIT_TOPOLOGY_APP_SANDBOX", boltId: "alertBolt1" },
                { topologyName: "ALERT_UNIT_TOPOLOGY_APP_SANDBOX", boltId: "alertBolt2" },
                { topologyName: "ALERT_UNIT_TOPOLOGY_APP_SANDBOX", boltId: "alertBolt3" }
              ]
            },
            {
              partition: {
                streamId: "HADOOP_JMX_METRIC_STREAM_SANDBOX",
                type: "GROUPBY",
                columns: ["site", "host", "component", "metric"],
                sortSpec: null
              },
              workers: [
                { topologyName: "ALERT_UNIT_TOPOLOGY_APP_SANDBOX", boltId: "alertBolt9" },
                { topologyName: "ALERT_UNIT_TOPOLOGY_APP_SANDBOX", boltId: "alertBolt0" },
                { topologyName: "ALERT_UNIT_TOPOLOGY_APP_SANDBOX", boltId: "alertBolt1" },
                { topologyName: "ALERT_UNIT_TOPOLOGY_APP_SANDBOX", boltId: "alertBolt2" },
                { topologyName: "ALERT_UNIT_TOPOLOGY_APP_SANDBOX", boltId: "alertBolt3" }
              ]
            }
          ]
        }
      ]
      

      Note that the three entries under targetQueue are identical: the same partition and the same five worker bolts, so the same queue is generated three times. The corresponding alert spec is:

      boltPolicyIdsMap: {
        alertBolt9: ["NameNodeWithOneNoResponse", "NameNodeHAHasNoResponse", "CallQueueLengthExceeds30Times", "CallQueueLengthExceeds2Times"],
        alertBolt0: ["NameNodeWithOneNoResponse", "NameNodeHAHasNoResponse", "CallQueueLengthExceeds30Times", "CallQueueLengthExceeds2Times"],
        alertBolt1: ["NameNodeWithOneNoResponse", "NameNodeHAHasNoResponse", "CallQueueLengthExceeds30Times", "CallQueueLengthExceeds2Times"],
        alertBolt2: ["NameNodeWithOneNoResponse", "NameNodeHAHasNoResponse", "CallQueueLengthExceeds30Times", "CallQueueLengthExceeds2Times"],
        alertBolt3: ["NameNodeWithOneNoResponse", "NameNodeHAHasNoResponse", "CallQueueLengthExceeds30Times", "CallQueueLengthExceeds2Times"]
      }
      

      3. Produce messages into the Kafka topic 'hadoop_jmx_metrics_sandbox' to trigger NameNodeWithOneNoResponse, for example:

      {"timestamp": 1490250963445, "metric": "hadoop.namenode.hastate.failed.count", "component": "namenode", "site": "artemislvs", "value": 0.0, "host": "localhost"}
      
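      For reference, a hedged sketch of publishing this sample event with the plain Kafka Java producer; the broker address and class name are illustrative assumptions, not taken from this report.

       import java.util.Properties;

       import org.apache.kafka.clients.producer.KafkaProducer;
       import org.apache.kafka.clients.producer.ProducerRecord;

       public class ProduceSampleEvent {
           public static void main(String[] args) {
               Properties props = new Properties();
               props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
               props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
               props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

               String event = "{\"timestamp\": 1490250963445, \"metric\": \"hadoop.namenode.hastate.failed.count\", "
                       + "\"component\": \"namenode\", \"site\": \"artemislvs\", \"value\": 0.0, \"host\": \"localhost\"}";

               // A single send is enough to observe the duplicated delivery downstream.
               try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                   producer.send(new ProducerRecord<>("hadoop_jmx_metrics_sandbox", event));
               }
           }
       }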

      Then each message is sent to the alert bolts three times, once per duplicated queue.
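      For context, a minimal hypothetical sketch of the kind of de-duplication the coordinator appears to be missing: target queues that share the same partition and worker list should be collapsed before the router spec is emitted. The classes below are simplified stand-ins, not Eagle's actual coordinator types.

       import java.util.ArrayList;
       import java.util.LinkedHashSet;
       import java.util.List;

       // Simplified stand-in for one entry under targetQueue in the router spec.
       final class TargetQueue {
           final String partitionKey;      // e.g. "GROUPBY:site,host,component,metric"
           final List<String> workerBolts; // e.g. [alertBolt9, alertBolt0, ...]

           TargetQueue(String partitionKey, List<String> workerBolts) {
               this.partitionKey = partitionKey;
               this.workerBolts = workerBolts;
           }

           // Two queues are duplicates when they route by the same partition
           // to the same ordered list of worker bolts.
           String dedupKey() {
               return partitionKey + "|" + String.join(",", workerBolts);
           }
       }

       final class RouterSpecBuilder {
           // Keep only the first queue for each (partition, workers) pair so the
           // generated spec contains a single queue per monitored stream partition.
           static List<TargetQueue> dedup(List<TargetQueue> queues) {
               LinkedHashSet<String> seen = new LinkedHashSet<>();
               List<TargetQueue> unique = new ArrayList<>();
               for (TargetQueue q : queues) {
                   if (seen.add(q.dedupKey())) {
                       unique.add(q);
                   }
               }
               return unique;
           }
       }

      Applied to the spec above, the three identical targetQueue entries would collapse to one, and each event would be delivered once.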


            People

              Assignee: Qingwen Zhao (qingwzhao)
              Reporter: Qingwen Zhao (qingwzhao)
              Votes: 0
              Watchers: 2

              Dates

                Created:
                Updated:
                Resolved: