Apache NiFi / NIFI-4713

Datadog Metrics Alignment


    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.4.0
    • Fix Version/s: None
    • Component/s: Extensions
    • Labels:

      Description

      Metrics that NiFi sends to Datadog do not seem to align with the NiFi model. Therefore, I am proposing the following:

      1. Change the metric names to work better with Datadog
      2. Become more reliant on tagging
      3. Allow custom tagging

      Currently, metrics are being sent to Datadog in the following format:

      <metricsPrefix>.<processorName/flow>.<metricName>

      However, Datadog is built around reusing a metric name and filtering via tags. So in Datadog, a single metric named <metricsPrefix>.<metricName> with a tag for <processorName> works better than one unique metric per processor. (When there is no processorName, the tag should simply be omitted instead of substituting 'flow'.)
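      A minimal sketch of the proposed naming change, treating it as plain string handling. The class DatadogMetricNaming, its methods, and the tag key "processor" are hypothetical and are not part of DataDogReportingTask; the 'flow' fallback in currentName reflects this issue's description of the current format, not the actual source.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical illustration of the naming proposal. Names and the "processor"
 * tag key are assumptions, not the DataDogReportingTask API.
 */
public final class DatadogMetricNaming {

    private DatadogMetricNaming() {}

    /** Current scheme as described: <metricsPrefix>.<processorName/flow>.<metricName> */
    public static String currentName(String prefix, String processorName, String metricName) {
        String middle = (processorName == null || processorName.isEmpty()) ? "flow" : processorName;
        return prefix + "." + middle + "." + metricName;
    }

    /** Proposed scheme: one shared name, <metricsPrefix>.<metricName>. */
    public static String proposedName(String prefix, String metricName) {
        return prefix + "." + metricName;
    }

    /** Proposed tags: the processor becomes a tag, omitted when there is no processor. */
    public static List<String> proposedTags(String processorName) {
        List<String> tags = new ArrayList<>();
        if (processorName != null && !processorName.isEmpty()) {
            tags.add("processor:" + processorName); // hypothetical tag key
        }
        return Collections.unmodifiableList(tags);
    }

    public static void main(String[] args) {
        // Current: a distinct metric name per processor.
        System.out.println(currentName("nifi", "PutS3Object", "FlowFilesReceived"));
        // Proposed: shared metric name, processor carried as a tag.
        System.out.println(proposedName("nifi", "FlowFilesReceived") + " " + proposedTags("PutS3Object"));
        // Flow-level metric: no processor tag rather than a 'flow' placeholder.
        System.out.println(proposedName("nifi", "FlowFilesReceived") + " " + proposedTags(null));
    }
}
```

      With this shape, every processor reports under the same metric name, so a Datadog query can sum or average across processors and narrow to one by filtering on its tag.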

      Consider how Datadog handles Kafka. The metric kafka.consumer_lag represents the current lag of a topic (tag) for a given consumer_group (tag), reported separately for each partition (tag).

      For the same moment in time:
      kafka.consumer_lag = 5 <topic:a, consumer_group:nifi, partition:0>
      kafka.consumer_lag = 7 <topic:a, consumer_group:nifi, partition:1>
      kafka.consumer_lag = 22 <topic:a, consumer_group:python, partition:0>
      kafka.consumer_lag = 19 <topic:a, consumer_group:python, partition:1>
      kafka.consumer_lag = 2 <topic:b, consumer_group:nifi, partition:0>

      If I wanted to know the current lag for a given consumer_group on each topic, I would group by those tags and then sum over the remaining records (that is, across the partitions).

      For the same moment in time:
      kafka.consumer_lag = 12 for topic:a and consumer_group:nifi
      kafka.consumer_lag = 2 for topic:b and consumer_group:nifi
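      The roll-up above amounts to grouping by the retained tags and summing over the tag that is left out. The Java sketch below reproduces it on the sample points; TagAggregation, Point, and sumBy are hypothetical names, and in practice Datadog performs this aggregation at query time, so this only illustrates the arithmetic.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration of "group by some tags, sum over the rest". */
public final class TagAggregation {

    /** One data point: a value plus its tags (e.g. topic, consumer_group, partition). */
    public static final class Point {
        final long value;
        final Map<String, String> tags = new HashMap<>();

        /** keyValues alternates tag key and tag value: "topic", "a", "partition", "0", ... */
        public Point(long value, String... keyValues) {
            this.value = value;
            for (int i = 0; i + 1 < keyValues.length; i += 2) {
                tags.put(keyValues[i], keyValues[i + 1]);
            }
        }
    }

    /**
     * Sums values grouped by the given tag keys; any tag not listed (here, partition)
     * is collapsed by the sum. Group keys look like "topic:a,consumer_group:nifi".
     */
    public static Map<String, Long> sumBy(List<Point> points, String... groupKeys) {
        Map<String, Long> result = new TreeMap<>();
        for (Point p : points) {
            StringBuilder key = new StringBuilder();
            for (String k : groupKeys) {
                if (key.length() > 0) {
                    key.append(',');
                }
                key.append(k).append(':').append(p.tags.get(k));
            }
            result.merge(key.toString(), p.value, Long::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Point> lag = Arrays.asList(
                new Point(5, "topic", "a", "consumer_group", "nifi", "partition", "0"),
                new Point(7, "topic", "a", "consumer_group", "nifi", "partition", "1"),
                new Point(22, "topic", "a", "consumer_group", "python", "partition", "0"),
                new Point(19, "topic", "a", "consumer_group", "python", "partition", "1"),
                new Point(2, "topic", "b", "consumer_group", "nifi", "partition", "0"));
        System.out.println(sumBy(lag, "topic", "consumer_group"));
    }
}
```

      For consumer_group:nifi the grouped sums are 12 on topic:a and 2 on topic:b, matching the figures above; the python group sums to 41 on topic:a.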

      In NiFi terms, this could allow you, for example, to have a tag noting that a metric came from an aws-sqs pull, and then aggregate the average number of records pulled across the entire system instead of for a single processor.

      Additionally, there is room for custom tagging. For example, I want to be able to aggregate across all NiFi clusters I control. Setting a unique prefix for each cluster breaks this aggregation, yet without a per-cluster prefix I may not be able to filter by cluster later. If custom tagging were allowed, I could set a tag such as cluster_name:nifi-1; all metrics could then be aggregated together while still being filterable down to that specific cluster for other operations.

      In my opinion, the easiest way to implement this would be to take all non-required attributes from the Datadog controller and use them as the custom tags (these attributes should be considered final/static once loaded). The attributes are already in Key=Value format, so once the required attributes are removed it should be easy to convert them to Key:Value format for tagging.
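      The Key=Value to Key:Value conversion could look roughly like the following sketch. It assumes the controller's attributes are available as a Map<String, String> and that the caller knows which keys are required; CustomTags, fromProperties, and the property names used in main are hypothetical. The result is returned as an unmodifiable list to reflect the suggestion that these tags be treated as static once loaded.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: turn non-required Key=Value attributes into Key:Value tags. */
public final class CustomTags {

    private CustomTags() {}

    public static List<String> fromProperties(Map<String, String> properties, Set<String> requiredKeys) {
        List<String> tags = new ArrayList<>();
        for (Map.Entry<String, String> e : properties.entrySet()) {
            // Required settings (prefix, API key, ...) configure the task; they are not tags.
            if (requiredKeys.contains(e.getKey())) {
                continue;
            }
            tags.add(e.getKey() + ":" + e.getValue());
        }
        // Computed once when the task is configured, then treated as static.
        return Collections.unmodifiableList(tags);
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("metrics.prefix", "nifi");  // hypothetical required property
        props.put("api.key", "xxxx");         // hypothetical required property
        props.put("cluster_name", "nifi-1");  // custom attribute, becomes a tag
        props.put("env", "prod");             // custom attribute, becomes a tag
        Set<String> required = new HashSet<>(Arrays.asList("metrics.prefix", "api.key"));
        System.out.println(fromProperties(props, required));
    }
}
```

      Every metric sent by that cluster would then carry cluster_name:nifi-1, so cross-cluster dashboards can aggregate on the shared metric names and filter on the tag when needed.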

      (Most, if not all, of the work for this is centered on org.apache.nifi.reporting.datadog.DataDogReportingTask.)


            People

            • Assignee: Unassigned
            • Reporter: Robert Batts (robatts)
            • Votes: 0
            • Watchers: 1
