Solr / SOLR-8185

Add operations support to streaming metrics

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: SolrJ
    • Labels: None

    Description

      Adds support for operations on stream metrics.

      With this feature, tuple values can be modified before they are applied to the computed metric. There are many use cases for this; I'll describe one here.

      Imagine you have a RollupStream that computes the average over some field, but you cannot be sure that every document has a value for that field, i.e. the value may be null. When the value is null, you want to treat it as 0. With this feature you can accomplish that like this:

      rollup(
        search(collection1, q=*:*, fl="a_s,a_i,a_f", sort="a_s asc"),
        over="a_s",
        avg(a_i, replace(null, withValue=0)),
        count(*)
      )
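
The effect on the average can be illustrated with a small Python sketch. This is not Solr code: `replace_null` and `rollup_avg` are hypothetical stand-ins for the proposed `replace(null, withValue=0)` operation and for a rollup over `a_s`. The behavior without an operation (nulls skipped) is an assumption made only so the two results can be compared.

```python
from collections import defaultdict


def replace_null(value, with_value=0):
    """Hypothetical stand-in for replace(null, withValue=0)."""
    return with_value if value is None else value


def rollup_avg(tuples, over, field, operation=None):
    """Average `field` per distinct `over` value, applying `operation` first."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for t in tuples:
        value = t.get(field)
        if operation is not None:
            value = operation(value)
        if value is None:
            # Assumption for this sketch: with no operation, nulls are skipped.
            continue
        sums[t[over]] += value
        counts[t[over]] += 1
    return {key: sums[key] / counts[key] for key in counts}


tuples = [
    {"a_s": "x", "a_i": 4},
    {"a_s": "x", "a_i": None},
    {"a_s": "y", "a_i": 6},
]

with_replace = rollup_avg(tuples, "a_s", "a_i", replace_null)
without_replace = rollup_avg(tuples, "a_s", "a_i")
```

For group `x`, the null counts as 0 in `with_replace`, so the average is taken over two values instead of one.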
      

      The operations are applied to the tuple separately for each metric in the stream, which means you can perform different operations on different metrics without one metric's operations affecting another.

      Adding to the previous example, imagine you also want the min of the same field, but without considering null values.

      rollup(
        search(collection1, q=*:*, fl="a_s,a_i,a_f", sort="a_s asc"),
        over="a_s",
        avg(a_i, replace(null, withValue=0)),
        min(a_i),
        count(*)
      )
      

      Also, the tuple is not modified for streams that might wrap this one. That is, the only thing that sees the applied operation is that particular metric. If you want operations to be visible to wrapping streams, you can still achieve that with the SelectStream (SOLR-7669).
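
One way to picture this isolation is a metric that applies its operations to a private copy of each tuple. The Python sketch below is illustrative only: the `Metric` class and `replace_null_with_zero` are hypothetical names, and the patch itself may achieve the isolation differently.

```python
class Metric:
    """Hypothetical metric that collects values for one field."""

    def __init__(self, field, operations=()):
        self.field = field
        self.operations = list(operations)
        self.values = []

    def update(self, tuple_):
        # Work on a private copy so the shared tuple is never mutated.
        working = dict(tuple_)
        for operation in self.operations:
            operation(working, self.field)
        value = working.get(self.field)
        if value is not None:
            self.values.append(value)


def replace_null_with_zero(tuple_, field):
    """Hypothetical stand-in for replace(null, withValue=0)."""
    if tuple_.get(field) is None:
        tuple_[field] = 0


avg_metric = Metric("a_i", [replace_null_with_zero])
min_metric = Metric("a_i")  # no operations: nulls are ignored in this sketch

tuples = [{"a_s": "x", "a_i": 4}, {"a_s": "x", "a_i": None}]
for t in tuples:
    avg_metric.update(t)
    min_metric.update(t)

avg_value = sum(avg_metric.values) / len(avg_metric.values)
min_value = min(min_metric.values)
```

After the loop, the second tuple still holds a null `a_i`, so `min_metric` and any wrapping stream see the original value while only `avg_metric` saw the 0.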

      One feature I'm investigating, but which this patch DOES NOT add, is the ability to assign a name to the resulting metric value. That would allow something like this:

      rollup(
        search(collection1, q=*:*, fl="a_s,a_i,a_f", sort="a_s asc"),
        over="a_s",
        avg(a_i, replace(null, withValue=0), as="avg_a_i_null_as_0"),
        avg(a_i),
        count(*, as="totalCount")
      )
      

      Right now that isn't possible because both avg metrics would have the same identifier, "avg_a_i", so only one of them could be returned. Naming is relatively easy to add, but I have to investigate its impact on the SQL and FacetStream areas.
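
The collision can be sketched in Python as follows. This is a simplified model, not Solr code: `default_identifier` and `collect` are hypothetical helpers, the identifier format follows the "avg_a_i" form quoted above, and the `alias` argument models the proposed `as` parameter, which this patch does not implement.

```python
def default_identifier(function, column):
    """Hypothetical default name built only from function and column."""
    return f"{function}_{column}"


def collect(metrics):
    """Build the output tuple from (function, column, alias, value) entries."""
    output = {}
    for function, column, alias, value in metrics:
        key = alias or default_identifier(function, column)
        output[key] = value  # a repeated key overwrites the earlier value
    return output


# Two avg(a_i) metrics that differ only in their operations collide.
colliding = collect([
    ("avg", "a_i", None, 2.0),
    ("avg", "a_i", None, 4.0),
])

# With an explicit name on one of them, both values survive.
aliased = collect([
    ("avg", "a_i", "avg_a_i_null_as_0", 2.0),
    ("avg", "a_i", None, 4.0),
])
```

Because the operations are not part of the default identifier, the second entry silently replaces the first in `colliding`.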

      Depends on SOLR-7669 (SelectStream)

      Attachments

        1. SOLR-8185.patch
          60 kB
          Dennis Gove

            People

              Assignee: Dennis Gove (dpgove)
              Reporter: Dennis Gove (dpgove)
              Votes: 0
              Watchers: 4
