Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-7707

Add StreamExpression Support to RollupStream

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.0
    • Component/s: SolrJ
    • Labels:
      None

      Description

      This ticket is to add Stream Expression support to the RollupStream as discussed in SOLR-7560.

      Proposed expression syntax for the RollupStream (copied from that ticket)

      rollup(
        someStream(....),
        over="fieldA, fieldB, fieldC",
        min(fieldA),
        max(fieldA),
        min(fieldB),
        mean(fieldD),
        sum(fieldC)
      )
      

      This requires making the *Metric types Expressible but I think that ends up as a good thing. Would make it real easy to support other options on metrics like excluding outliers, for example find the sum of values within 3 standard deviations from the mean could be

      sum(fieldC, limit=standardDev(3))
      

      (note, how that particular calculation could be implemented is left as an exercise for the reader, I'm just using it as an example of adding additional options on a relatively simple metric).
      Another option example is what to do with null values. For example, in some cases a null should not impact a mean but in others it should. You could express those as

      mean(fieldA, replace(null, 0))  // replace null values with 0 thus leading to an impact on the mean
      mean(fieldA, includeNull="true") // nulls are counted in the denominator but nothing added to numerator
      mean(fieldA, includeNull="false") // nulls neither counted in denominator nor added to numerator
      mean(fieldA, replace(null, fieldB), includeNull="true") // if fieldA is null replace it with fieldB, include null fieldB in mean
      

      so on and so forth.

      1. SOLR-7707.patch
        65 kB
        Dennis Gove
      2. SOLR-7707.patch
        75 kB
        Dennis Gove
      3. SOLR-7707.patch
        75 kB
        Dennis Gove
      4. SOLR-7707.patch
        74 kB
        Joel Bernstein

        Activity

        Hide
        dpgove Dennis Gove added a comment -

        Adds expression support to RollupStream.

        Note: I have added a ParallelRollupStream test but I cannot get it to pass. I feel as though I've forgotten a required change to make it work with ParallelStream.

        Show
        dpgove Dennis Gove added a comment - Adds expression support to RollupStream. Note: I have added a ParallelRollupStream test but I cannot get it to pass. I feel as though I've forgotten a required change to make it work with ParallelStream.
        Hide
        joel.bernstein Joel Bernstein added a comment -

        Ok, thanks Dennis. I'll take a look at the ParallelStream test.

        Show
        joel.bernstein Joel Bernstein added a comment - Ok, thanks Dennis. I'll take a look at the ParallelStream test.
        Hide
        dpgove Dennis Gove added a comment -

        I found the problem.

        There is a test class called CountStream. In some of the test files (particularly solr/solrj/src/test-files/solrj/solr/collection1/conf/solrconfig-streaming.xml) the function name "count" was mapped to that Stream. However, now with a count metric I was also mapping the count function name to CountMetric.

        For the moment I have corrected this by renaming CountStream to RecordCountStream and commented out the mapping in the solrconfig-streaming.xml file. I chose to change this one because it is a class in the test suite and not, apparently, used outside of testing.

        However, this brings up an interesting question. Should we allow conflicting names across streams and metrics. Right now both the mapping for function name to Stream or Metric is stored in the same Map and as such we we are not allowing the conflict of names - ie, both a stream and metric cannot share the same function name. However, should we allow that?

        I believe the answer, for clarity, is no. If you assign the string "count" to CountMetric then you cannot also assign it to CountStream. This will allow users to know what "count(....)" means without having to know the context. For example, allowing "count" to map to both could result in confusion in the following

        rollup(
          count(search(....)),
          min(fieldA),
          count(fieldB)
        )
        
        Show
        dpgove Dennis Gove added a comment - I found the problem. There is a test class called CountStream. In some of the test files (particularly solr/solrj/src/test-files/solrj/solr/collection1/conf/solrconfig-streaming.xml) the function name "count" was mapped to that Stream. However, now with a count metric I was also mapping the count function name to CountMetric. For the moment I have corrected this by renaming CountStream to RecordCountStream and commented out the mapping in the solrconfig-streaming.xml file. I chose to change this one because it is a class in the test suite and not, apparently, used outside of testing. However, this brings up an interesting question. Should we allow conflicting names across streams and metrics. Right now both the mapping for function name to Stream or Metric is stored in the same Map and as such we we are not allowing the conflict of names - ie, both a stream and metric cannot share the same function name. However, should we allow that? I believe the answer, for clarity, is no. If you assign the string "count" to CountMetric then you cannot also assign it to CountStream. This will allow users to know what "count(....)" means without having to know the context. For example, allowing "count" to map to both could result in confusion in the following rollup( count(search(....)), min(fieldA), count(fieldB) )
        Hide
        joel.bernstein Joel Bernstein added a comment -

        Looks like your patch is a commit or two behind svn trunk. Take a look at:

        https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/java/org/apache/solr/handler/SQLHandler.java

        You'll see it already has the MultipleFieldComparator, StreamComparator incorporated. Wondering if the git repo is falling to far behind.

        Show
        joel.bernstein Joel Bernstein added a comment - Looks like your patch is a commit or two behind svn trunk. Take a look at: https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/java/org/apache/solr/handler/SQLHandler.java You'll see it already has the MultipleFieldComparator, StreamComparator incorporated. Wondering if the git repo is falling to far behind.
        Hide
        dpgove Dennis Gove added a comment - - edited

        Looks like I cut my branch from trunk before those changes were committed. I'll go through some rebasing tomorrow and post up a new patch. Sorry about that.

        Show
        dpgove Dennis Gove added a comment - - edited Looks like I cut my branch from trunk before those changes were committed. I'll go through some rebasing tomorrow and post up a new patch. Sorry about that.
        Hide
        joel.bernstein Joel Bernstein added a comment - - edited

        No problem, I'm still wrapping up SOLR-7441.

        Show
        joel.bernstein Joel Bernstein added a comment - - edited No problem, I'm still wrapping up SOLR-7441 .
        Hide
        dpgove Dennis Gove added a comment -

        New correctly based patch attached.

        Show
        dpgove Dennis Gove added a comment - New correctly based patch attached.
        Hide
        joel.bernstein Joel Bernstein added a comment -

        Patch looks great!

        I switched over the SQLHandler to use Streaming Expressions as the parallel transport format and it's extremely compact compared to string encoded object serialization.

        Show
        joel.bernstein Joel Bernstein added a comment - Patch looks great! I switched over the SQLHandler to use Streaming Expressions as the parallel transport format and it's extremely compact compared to string encoded object serialization.
        Hide
        joel.bernstein Joel Bernstein added a comment - - edited

        Patch with all tests passing and pre-commit passing

        Show
        joel.bernstein Joel Bernstein added a comment - - edited Patch with all tests passing and pre-commit passing
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 1689168 from Joel Bernstein in branch 'dev/trunk'
        [ https://svn.apache.org/r1689168 ]

        SOLR-7707: Add StreamExpression Support to RollupStream

        Show
        jira-bot ASF subversion and git services added a comment - Commit 1689168 from Joel Bernstein in branch 'dev/trunk' [ https://svn.apache.org/r1689168 ] SOLR-7707 : Add StreamExpression Support to RollupStream
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 1694910 from Joel Bernstein in branch 'dev/trunk'
        [ https://svn.apache.org/r1694910 ]

        SOLR-7707: Updated CHANGES.txt

        Show
        jira-bot ASF subversion and git services added a comment - Commit 1694910 from Joel Bernstein in branch 'dev/trunk' [ https://svn.apache.org/r1694910 ] SOLR-7707 : Updated CHANGES.txt
        Hide
        joel.bernstein Joel Bernstein added a comment -

        Release with Solr 6

        Show
        joel.bernstein Joel Bernstein added a comment - Release with Solr 6

          People

          • Assignee:
            Unassigned
            Reporter:
            dpgove Dennis Gove
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development