SOLR-8185: Add operations support to streaming metrics

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: SolrJ
    • Labels: None

      Description

      Adds support for operations on stream metrics.

      With this feature, one can modify tuple values before they are applied to the computed metric. I can see many use cases for this; I'll describe one here.

      Imagine you have a RollupStream that computes the average over some field, but you cannot be sure that every document has a value for that field, i.e., the value may be null. When the value is null, you want to treat it as 0. With this feature you can accomplish that like this:

      rollup(
        search(collection1, q=*:*, fl="a_s,a_i,a_f", sort="a_s asc"),
        over="a_s",
        avg(a_i, replace(null, withValue=0)),
        count(*)
      )
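
To make the intended semantics concrete, here is a minimal plain-Java sketch. It is not the patch's code and uses no Solr classes; `replaceNull` and `average` are hypothetical names, and the choice to skip values that are still null after the operation is an assumption of this sketch, not a statement about how Solr's avg metric behaves.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class NullAsZeroAverage {
    // Hypothetical stand-in for replace(null, withValue=...): maps null to a default value.
    static Function<Integer, Integer> replaceNull(int withValue) {
        return v -> v == null ? Integer.valueOf(withValue) : v;
    }

    // Passes each value through the operation, then averages.
    // Assumption of this sketch: values that are still null are skipped.
    static double average(List<Integer> values, Function<Integer, Integer> operation) {
        long sum = 0;
        int count = 0;
        for (Integer raw : values) {
            Integer v = operation.apply(raw);
            if (v == null) {
                continue;
            }
            sum += v;
            count++;
        }
        return count == 0 ? 0.0 : (double) sum / count;
    }

    public static void main(String[] args) {
        List<Integer> aI = Arrays.asList(4, null, 8);
        System.out.println(average(aI, replaceNull(0)));       // prints 4.0
        System.out.println(average(aI, Function.identity()));  // prints 6.0
    }
}
```

With the null treated as 0, the average is taken over three values; without the operation, only the two non-null values contribute in this sketch.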
      

      The operations are applied to the tuple for each metric in the stream, which means you can perform different operations on different metrics without being affected by the operations on other metrics.

      Adding to the previous example, imagine you also want the min of a field but do not want null values to be considered:

      rollup(
        search(collection1, q=*:*, fl="a_s,a_i,a_f", sort="a_s asc"),
        over="a_s",
        avg(a_i, replace(null, withValue=0)),
        min(a_i),
        count(*)
      )
      

      Also, the tuple is not modified for streams that might wrap this one; i.e., the only thing that sees the applied operation is that particular metric. If you want to apply operations for wrapping streams, you can still achieve that with the SelectStream (SOLR-7669).
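
One way to picture this isolation is a metric that applies its operations to a private copy of each tuple. The sketch below is plain Java under that assumption; `MinMetric`, `replaceNull`, and the use of a `Map` as the tuple are hypothetical stand-ins, not the patch's classes.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class MetricLocalOperations {
    // Hypothetical operation: replaces a null field value with a default.
    static Consumer<Map<String, Object>> replaceNull(String field, Object withValue) {
        return tuple -> {
            if (tuple.get(field) == null) {
                tuple.put(field, withValue);
            }
        };
    }

    // Hypothetical metric that runs its operations on a copy of the tuple.
    static class MinMetric {
        private final String field;
        private final List<Consumer<Map<String, Object>>> operations;
        private Long min; // stays null until a numeric value is seen

        MinMetric(String field, List<Consumer<Map<String, Object>>> operations) {
            this.field = field;
            this.operations = operations;
        }

        void update(Map<String, Object> tuple) {
            // Copy first, so other metrics and wrapping streams never see the change.
            Map<String, Object> view = new HashMap<>(tuple);
            for (Consumer<Map<String, Object>> operation : operations) {
                operation.accept(view);
            }
            Object value = view.get(field);
            if (value instanceof Number) {
                long candidate = ((Number) value).longValue();
                if (min == null || candidate < min) {
                    min = candidate;
                }
            }
        }

        Long getValue() {
            return min;
        }
    }

    public static void main(String[] args) {
        Map<String, Object> first = new HashMap<>();
        first.put("a_i", 5L);
        Map<String, Object> second = new HashMap<>();
        second.put("a_i", null);

        MinMetric withReplace = new MinMetric("a_i", Collections.singletonList(replaceNull("a_i", 0L)));
        MinMetric plain = new MinMetric("a_i", Collections.<Consumer<Map<String, Object>>>emptyList());
        for (Map<String, Object> tuple : new Map[] {first, second}) {
            withReplace.update(tuple);
            plain.update(tuple);
        }

        System.out.println(withReplace.getValue()); // prints 0
        System.out.println(plain.getValue());       // prints 5
        System.out.println(second.get("a_i"));      // prints null: the tuple itself is unchanged
    }
}
```

Copying per metric is the simplest way to get the isolation described above; a real implementation might avoid the copy for metrics that have no operations.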

      One feature I'm investigating, but which this patch DOES NOT add, is the ability to assign names to the resulting metric value, for example to allow something like this:

      rollup(
        search(collection1, q=*:*, fl="a_s,a_i,a_f", sort="a_s asc"),
        over="a_s",
        avg(a_i, replace(null, withValue=0), as="avg_a_i_null_as_0"),
        avg(a_i),
        count(*, as="totalCount")
      )
      

      Right now that isn't possible because the identifier for each metric would be the same, "avg_a_i", so both could not be returned. It's relatively easy to add, but I have to investigate its impact on the SQL and FacetStream areas.
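
The collision can be illustrated with a small plain-Java sketch. It assumes metric results are stored in the output keyed by identifier, and that the identifier is built only from the function and field names; `identifier` and `collect` are hypothetical helpers, and the `as` alias is the proposed feature, not something this patch provides.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MetricIdentifierCollision {
    // Hypothetical identifier scheme: the alias if given, else function_field (e.g. "avg_a_i").
    static String identifier(String function, String field, String alias) {
        return alias != null ? alias : function + "_" + field;
    }

    // Assumption: results are keyed by identifier, so equal identifiers overwrite each other.
    static Map<String, Double> collect(String[] identifiers, double[] values) {
        Map<String, Double> out = new LinkedHashMap<>();
        for (int i = 0; i < identifiers.length; i++) {
            out.put(identifiers[i], values[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] values = {4.0, 6.0};

        // Without aliases both metrics resolve to "avg_a_i"; only the last value survives.
        String[] unnamed = {identifier("avg", "a_i", null), identifier("avg", "a_i", null)};
        System.out.println(collect(unnamed, values)); // prints {avg_a_i=6.0}

        // With an alias on the first metric, both values can be returned.
        String[] named = {identifier("avg", "a_i", "avg_a_i_null_as_0"), identifier("avg", "a_i", null)};
        System.out.println(collect(named, values)); // prints {avg_a_i_null_as_0=4.0, avg_a_i=6.0}
    }
}
```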

      Depends on SOLR-7669 (SelectStream)

              People

              • Assignee: Dennis Gove (dpgove)
              • Reporter: Dennis Gove (dpgove)
              • Votes: 0
              • Watchers: 5