Solr / SOLR-8556

Add ConcatOperation to be used with the SelectStream

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Implemented
    • Affects Version/s: None
    • Fix Version/s: 6.0
    • Component/s: None
    • Labels: None

      Description

      Now that we have the UpdateStream, it would be nice to support the use case of sending rolled-up aggregates for storage in another SolrCloud collection. To support this, we'll need to create ids for the aggregate records.

      The ConcatOperation would allow us to concatenate the bucket values into a unique id. For example:

      update(
                  select(
                               rollup(search(q="*:*", fl="a,b,c", ...)),
                               concat(fields="a,b,c", delim="_", as="id")))
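      A minimal sketch of what concat(fields="a,b,c", delim="_", as="id") is meant to do to each rolled-up tuple, assuming a tuple can be modeled as a java.util.Map from field name to value. ConcatIdSketch and its concat method are hypothetical names; this is not the Solr ConcatOperation source.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for ConcatOperation (not the Solr source); a tuple is modeled as a Map.
class ConcatIdSketch {

    // Joins the values of the named fields with delim and stores the result under asField.
    static void concat(Map<String, Object> tuple, String[] fields, String delim, String asField) {
        List<String> parts = new ArrayList<>();
        for (String field : fields) {
            // String.valueOf(Object) turns a null value into the string "null".
            parts.add(String.valueOf(tuple.get(field)));
        }
        tuple.put(asField, String.join(delim, parts));
    }

    public static void main(String[] args) {
        // One bucket as rollup(...) might emit it: bucket values plus an illustrative metric.
        Map<String, Object> bucket = new LinkedHashMap<>();
        bucket.put("a", "x");
        bucket.put("b", "y");
        bucket.put("c", "z");
        bucket.put("count(*)", 42L);

        concat(bucket, "a,b,c".split(","), "_", "id");
        System.out.println(bucket.get("id")); // x_y_z
    }
}
```

      Because the id is derived only from the bucket values, the same bucket always yields the same id, which is what lets UpdateStream overwrite rather than duplicate aggregate records.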
      
      Attachments

      1. SOLR-8556.patch (4 kB, Joel Bernstein)
      2. SOLR-8556.patch (4 kB, Dennis Gove)
      3. SOLR-8556.patch (17 kB, Dennis Gove)
      4. SOLR-8556.patch (18 kB, Dennis Gove)


          Activity

          joel.bernstein Joel Bernstein added a comment -

          First pass at the implementation; tests are needed.

          joel.bernstein Joel Bernstein added a comment -

          This ticket is important for supporting background aggregations with the DaemonStream. Example:

          daemon(parallel(update(select(concat(fields="month,day,year", delim="-", as="id"), rollup(search())))))
          

          The select above would concatenate the bucket values into a unique id, so that each time the parallel rollup runs, the new aggregate values overwrite the old values for the same buckets in the Solr index being updated.
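          A rough model of why a deterministic id makes repeated runs safe, assuming the target collection behaves like a java.util.Map keyed by id (a document with an existing id replaces the old one). DaemonUpsertSketch, makeBucket, idFor and update are hypothetical names, and the counts are made-up illustration values.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model (not Solr code): the target collection is a Map from id to document.
class DaemonUpsertSketch {

    // Builds one rolled-up bucket (illustrative values only).
    static Map<String, Object> makeBucket(int month, int day, int year, long count) {
        Map<String, Object> b = new HashMap<>();
        b.put("month", month);
        b.put("day", day);
        b.put("year", year);
        b.put("count", count);
        return b;
    }

    // Builds the id the way concat(fields="month,day,year", delim="-", as="id") would.
    static String idFor(Map<String, Object> b) {
        return b.get("month") + "-" + b.get("day") + "-" + b.get("year");
    }

    // Writes each bucket under its id; an existing id is overwritten, not duplicated.
    static void update(Map<String, Map<String, Object>> collection, List<Map<String, Object>> buckets) {
        for (Map<String, Object> b : buckets) {
            Map<String, Object> doc = new HashMap<>(b);
            String id = idFor(b);
            doc.put("id", id);
            collection.put(id, doc);
        }
    }

    public static void main(String[] args) {
        Map<String, Map<String, Object>> collection = new HashMap<>();
        // A first run of the rollup, then a later run over more data for the same day.
        update(collection, List.of(makeBucket(1, 15, 2016, 10L)));
        update(collection, List.of(makeBucket(1, 15, 2016, 25L)));
        System.out.println(collection.size());                         // 1
        System.out.println(collection.get("1-15-2016").get("count")); // 25
    }
}
```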

          dpgove Dennis Gove added a comment (edited) -
          expression.addParameter(new StreamExpressionNamedParameter("fields",fieldsStr));
          

          If the ConcatOperation was created using the non-expression constructor, fieldsStr will be unset, so this won't produce the expected result. Instead, I'd iterate over the fields array and build a comma-separated list. This would also allow removing the fieldsStr member variable.
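          A Solr-free sketch of the suggested change, assuming the operation keeps its configuration as a String[] of fields, a delimiter, and an output field name: the "fields" value is rebuilt from the array with String.join, so nothing depends on the original expression string. ConcatExpressionSketch and its methods are hypothetical; in the real operation the joined string would be what gets passed to new StreamExpressionNamedParameter("fields", ...).

```java
// Hypothetical, Solr-free model of the operation's state and its expression output.
class ConcatExpressionSketch {
    private final String[] fields;
    private final String delim;
    private final String as;

    // Mirrors a non-expression constructor: no original "a,b,c" string is kept around.
    ConcatExpressionSketch(String[] fields, String delim, String as) {
        this.fields = fields;
        this.delim = delim;
        this.as = as;
    }

    // Rebuilds the comma-separated "fields" value from the array.
    String fieldsParameter() {
        return String.join(",", fields);
    }

    // Renders the whole operation in expression form.
    String toExpressionString() {
        return "concat(fields=\"" + fieldsParameter() + "\", delim=\"" + delim + "\", as=\"" + as + "\")";
    }

    public static void main(String[] args) {
        ConcatExpressionSketch op = new ConcatExpressionSketch(new String[] {"a", "b", "c"}, "_", "id");
        System.out.println(op.toExpressionString()); // concat(fields="a,b,c", delim="_", as="id")
    }
}
```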

          dpgove Dennis Gove added a comment -
          buf.append(field);
          

          This concatenates the field names together instead of the field values.
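          A small sketch contrasting the two behaviors, assuming a tuple modeled as a java.util.Map: concatNames reproduces the reported bug by appending field names, while concatValues appends each field's value. Both method names are hypothetical. StringBuilder.append(Object) writes the string "null" for a null value, which happens to match the null handling described later in this thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not Solr code) contrasting the reported bug with the intended behavior.
class ConcatValuesSketch {

    // Buggy version: appends the field names, so the tuple's contents are ignored.
    static String concatNames(Map<String, Object> tuple, String[] fields, String delim) {
        StringBuilder buf = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                buf.append(delim);
            }
            buf.append(fields[i]);
        }
        return buf.toString();
    }

    // Fixed version: appends each field's value; append(Object) writes "null" for a null value.
    static String concatValues(Map<String, Object> tuple, String[] fields, String delim) {
        StringBuilder buf = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                buf.append(delim);
            }
            buf.append(tuple.get(fields[i]));
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> tuple = new HashMap<>();
        tuple.put("a", "x");
        tuple.put("b", 7);
        String[] fields = {"a", "b"};
        System.out.println(concatNames(tuple, fields, "-"));  // a-b
        System.out.println(concatValues(tuple, fields, "-")); // x-7
    }
}
```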

          dpgove Dennis Gove added a comment -

          I'm going through and creating tests so I'll correct these issues as I go.

          joel.bernstein Joel Bernstein added a comment -

          Great, thanks!

          dpgove Dennis Gove added a comment -

          Adds ConcatOperation-specific tests and corrects the issues mentioned above. I would still like to add a test showing the usage of this inside a SelectStream. For example, there is a difference between these two clauses:

          select(a,b,c, search(....), replace(a,null,withValue=0f), concat(fields="a,b", as="ab", delim="-"))
          
          select(a,b,c, search(....), concat(fields="a,b", as="ab", delim="-"), replace(a,null,withValue=0f))
          

          In the first, a null value in field a is first replaced with 0 and then concatenated with b, whereas in the second, a and b are concatenated first and only then is a null value in a replaced with 0. That is, the order of operations matters.

          Also note that I added a feature which, for null values, concatenates the string "null". To substitute a different value for null, use the replace operation ahead of the concat operation, as in the first clause above.
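          A toy model of the two select clauses above, assuming each operation mutates the working tuple in the order it is listed and the tuple is a java.util.Map. OperationOrderSketch, replaceNull, concat and apply are hypothetical names, not Solr classes; the 0.0 in the output comes from Java's rendering of the boxed float 0f and may differ from what Solr emits.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical model (not Solr code): each select operation mutates the working tuple in turn.
class OperationOrderSketch {

    // Like replace(field, null, withValue=...): only fires when the field is null.
    static Consumer<Map<String, Object>> replaceNull(String field, Object withValue) {
        return tuple -> {
            if (tuple.get(field) == null) {
                tuple.put(field, withValue);
            }
        };
    }

    // Like concat(fields=..., delim=..., as=...): a null value becomes the string "null".
    static Consumer<Map<String, Object>> concat(String[] fields, String delim, String as) {
        return tuple -> {
            StringBuilder buf = new StringBuilder();
            for (int i = 0; i < fields.length; i++) {
                if (i > 0) {
                    buf.append(delim);
                }
                buf.append(tuple.get(fields[i]));
            }
            tuple.put(as, buf.toString());
        };
    }

    // Applies the operations strictly in the order given, on a copy of the input tuple.
    static Map<String, Object> apply(Map<String, Object> input, List<Consumer<Map<String, Object>>> ops) {
        Map<String, Object> tuple = new HashMap<>(input);
        for (Consumer<Map<String, Object>> op : ops) {
            op.accept(tuple);
        }
        return tuple;
    }

    public static void main(String[] args) {
        Map<String, Object> input = new HashMap<>();
        input.put("a", null); // HashMap allows a null value
        input.put("b", "y");
        String[] ab = {"a", "b"};

        Map<String, Object> first = apply(input, List.of(replaceNull("a", 0f), concat(ab, "-", "ab")));
        Map<String, Object> second = apply(input, List.of(concat(ab, "-", "ab"), replaceNull("a", 0f)));
        System.out.println(first.get("ab"));  // 0.0-y
        System.out.println(second.get("ab")); // null-y
    }
}
```

          In both runs field a ends up as 0.0; only the concatenated value differs, because concat reads a before or after the replacement.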

          joel.bernstein Joel Bernstein added a comment -

          The ConcatOperation will be pretty easy to wire into the SQLHandler once it's committed. I'll create a ticket for that.

          select a,b, concat('a,b', '-') as c from tablex
          
          dpgove Dennis Gove added a comment -

          Adds additional tests. I think this is good to go.

          joel.bernstein Joel Bernstein added a comment -

          Looks great.

          +1 to commit.

          dpgove Dennis Gove added a comment -

          Added "concat" to StreamHandler so it is a default operation.

          jira-bot ASF subversion and git services added a comment -

          Commit 1725769 from dpgove@apache.org in branch 'dev/trunk'
          [ https://svn.apache.org/r1725769 ]

          SOLR-8556: Add ConcatOperation to be used with the SelectStream


            People

            • Assignee:
              dpgove Dennis Gove
              Reporter:
              joel.bernstein Joel Bernstein
            • Votes:
              0
              Watchers:
              4
