Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-8458

Add Streaming Expressions tests for parameter substitution

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Implemented
    • Affects Version/s: None
    • Fix Version/s: 6.1, 7.0
    • Component/s: None
    • Labels:
      None

      Description

      This ticket is to add Streaming Expression tests that exercise the existing macro expansion feature described here: http://yonik.com/solr-query-parameter-substitution/

      Sample syntax below:

      http://localhost:8983/col/stream?expr=merge(${left}, ${right}, ...)&left=search(...)&right=search(...)
      
      1. SOLR-8458.patch
        3 kB
        Kevin Risden
      2. SOLR-8458.patch
        4 kB
        Cao Manh Dat
      3. SOLR-8458.patch
        7 kB
        Cao Manh Dat
      4. SOLR-8458.patch
        34 kB
        Cao Manh Dat
      5. SOLR-8458.patch
        4 kB
        Cao Manh Dat

        Issue Links

          Activity

          Hide
          caomanhdat Cao Manh Dat added a comment -

          Here are patch for this issue. I avoid using nested vars (it may lead to recursive call).

          Show
          caomanhdat Cao Manh Dat added a comment - Here are patch for this issue. I avoid using nested vars (it may lead to recursive call).
          Hide
          dpgove Dennis Gove added a comment -

          What if we were to make substitution parameters first class citizens similar to named parameters? During the parsing in ExpressionParser we could create instances of StreamExpressionSubstitutionParameters which exist as first class citizens of an StreamExpression object. This would allow us to send (in the example in the description) "expr", "left", and "right" through the ExpressionParser. Then, a simple method can be added to the StreamFactory which accepts a main expression and a map of names => expressions. It could then iterate over parameters of the main expression doing replacements until there are no more instances of StreamExpressionSubstitutionParameter in the main expression. Some checks for infinite loops would have to be added but those are relatively simple.

          This approach would allow the logic to exist outside of the StreamHandler which I think would be beneficial for the SQL Handler.

          It might also allow for some type of prepared statements with "pre-compiled" pieces (similar to what one might see in a DBMS). For example, this might be beneficial in a situation where some very expensive part of the expression is static which you want to perform different rollups or joins or whatever with. An optimizer could hang onto the static results in a RepeatableStream (doesn't exist yet) and substitute that into some other expression.

          Show
          dpgove Dennis Gove added a comment - What if we were to make substitution parameters first class citizens similar to named parameters? During the parsing in ExpressionParser we could create instances of StreamExpressionSubstitutionParameters which exist as first class citizens of an StreamExpression object. This would allow us to send (in the example in the description) "expr", "left", and "right" through the ExpressionParser. Then, a simple method can be added to the StreamFactory which accepts a main expression and a map of names => expressions. It could then iterate over parameters of the main expression doing replacements until there are no more instances of StreamExpressionSubstitutionParameter in the main expression. Some checks for infinite loops would have to be added but those are relatively simple. This approach would allow the logic to exist outside of the StreamHandler which I think would be beneficial for the SQL Handler. It might also allow for some type of prepared statements with "pre-compiled" pieces (similar to what one might see in a DBMS). For example, this might be beneficial in a situation where some very expensive part of the expression is static which you want to perform different rollups or joins or whatever with. An optimizer could hang onto the static results in a RepeatableStream (doesn't exist yet) and substitute that into some other expression.
          Hide
          caomanhdat Cao Manh Dat added a comment -

          It quite a good idea. But I also confuse about the syntax of substitution in Streaming Expressions

          Show
          caomanhdat Cao Manh Dat added a comment - It quite a good idea. But I also confuse about the syntax of substitution in Streaming Expressions
          Hide
          joel.bernstein Joel Bernstein added a comment -

          I'm fine either as a pre-processing step or including the substitution as part of the parsing.

          If we stick with the pre-processing step I think we'll need to process the nested vars. One approach to this would be to continue looping over the expression string and call the substitution method on each loop until there are no more $vars.

          Show
          joel.bernstein Joel Bernstein added a comment - I'm fine either as a pre-processing step or including the substitution as part of the parsing. If we stick with the pre-processing step I think we'll need to process the nested vars. One approach to this would be to continue looping over the expression string and call the substitution method on each loop until there are no more $vars.
          Hide
          joel.bernstein Joel Bernstein added a comment -

          Dennis Gove, you have good ideas here though. +1 if you want to take this on.

          Show
          joel.bernstein Joel Bernstein added a comment - Dennis Gove , you have good ideas here though. +1 if you want to take this on.
          Hide
          caomanhdat Cao Manh Dat added a comment - - edited

          That a good idea.
          About substitution as part of the parsing. Should the syntax look like?
          We cant use & key words here
          merge($left, $right, ...)&left=search(...)&right=search(...)

          Show
          caomanhdat Cao Manh Dat added a comment - - edited That a good idea. About substitution as part of the parsing. Should the syntax look like? We cant use & key words here merge($left, $right, ...)&left=search(...)&right=search(...)
          Hide
          dpgove Dennis Gove added a comment - - edited

          As I see it there are 2 pieces here, both related but separate.

          First, adding support for parameter substitution in an expression. This would be handled with the changes I discussed above to StreamFactory, StreamParser, and the addition of a new type StreamExpressionSubstitutionParameter. Note that this doesn't necessarily care how the expressions come in.

          And second, adding support for parameter substitution in StreamHandler and in an http request. I like the syntax Joel uses in the description. What this would mean is that StreamHandler would see http params like "expr", "left" and "right", would know that these are expressions (can call into StreamFactory to check if something is a valid expression), and would pass them off independently to be parsed and then together to be pieced together.

          This approach modularizes the implementation such that how an expression with substitution comes in via http is independent to how it is handled within the Streaming API.

          For example, the following comes into StreamHandler

          http://localhost:8983/col/stream?expr=merge($left, $right, ...)&baz=jaz&left=search(...)&right=search(...)&foo=bar
          

          The StreamHandler will see five parameters, expr, baz, left, right, and foo. It would then determine that expr, left, and right are valid expressions and pass them off to be parsed into three expression objects. It would then pass all three into the factory to be combined into a single Stream object. The factory would then iterate (recursively?) until there aren't any more instances of a StreamExpressionSubstitutionParameter at any level (considering the possibility of infinite loops, of course). At this point it'd then just be passed off to create a Stream object as any other expression would be.

          Another possibility would be to parse out the substitution expressions and then register them in the factory for use during Stream object creation. This would negate the need to do that pre-processing of the N substitution expression and would give a place to register "pre-compiled" expressions. I'm not a huge fan of this approach as it would add more state to the factory and I'm not a huge fan of the state it already contains.

          I'm happy to take this on unless, Cao Manh Dat, you want to continue your work on it.

          Show
          dpgove Dennis Gove added a comment - - edited As I see it there are 2 pieces here, both related but separate. First, adding support for parameter substitution in an expression. This would be handled with the changes I discussed above to StreamFactory, StreamParser, and the addition of a new type StreamExpressionSubstitutionParameter. Note that this doesn't necessarily care how the expressions come in. And second, adding support for parameter substitution in StreamHandler and in an http request. I like the syntax Joel uses in the description. What this would mean is that StreamHandler would see http params like "expr", "left" and "right", would know that these are expressions (can call into StreamFactory to check if something is a valid expression), and would pass them off independently to be parsed and then together to be pieced together. This approach modularizes the implementation such that how an expression with substitution comes in via http is independent to how it is handled within the Streaming API. For example, the following comes into StreamHandler http: //localhost:8983/col/stream?expr=merge($left, $right, ...)&baz=jaz&left=search(...)&right=search(...)&foo=bar The StreamHandler will see five parameters, expr, baz, left, right, and foo. It would then determine that expr, left, and right are valid expressions and pass them off to be parsed into three expression objects. It would then pass all three into the factory to be combined into a single Stream object. The factory would then iterate (recursively?) until there aren't any more instances of a StreamExpressionSubstitutionParameter at any level (considering the possibility of infinite loops, of course). At this point it'd then just be passed off to create a Stream object as any other expression would be. Another possibility would be to parse out the substitution expressions and then register them in the factory for use during Stream object creation. This would negate the need to do that pre-processing of the N substitution expression and would give a place to register "pre-compiled" expressions. I'm not a huge fan of this approach as it would add more state to the factory and I'm not a huge fan of the state it already contains. I'm happy to take this on unless, Cao Manh Dat , you want to continue your work on it.
          Hide
          yseeley@gmail.com Yonik Seeley added a comment -

          Regarding pre-processing (i.e. ignorant of syntax), Solr already has parameter substitution across entire query requests:
          http://yonik.com/solr-query-parameter-substitution/

          Show
          yseeley@gmail.com Yonik Seeley added a comment - Regarding pre-processing (i.e. ignorant of syntax), Solr already has parameter substitution across entire query requests: http://yonik.com/solr-query-parameter-substitution/
          Hide
          joel.bernstein Joel Bernstein added a comment -

          Thanks for bringing this up.

          In reading the blog post it sounds like we already have param substitution for streaming expression and just didn't know it. I didn't realize this functionality was so broad.

          So, any parameter regardless of whether it's part of a local param will automatically have parameter substitutions applied?

          Show
          joel.bernstein Joel Bernstein added a comment - Thanks for bringing this up. In reading the blog post it sounds like we already have param substitution for streaming expression and just didn't know it. I didn't realize this functionality was so broad. So, any parameter regardless of whether it's part of a local param will automatically have parameter substitutions applied?
          Hide
          joel.bernstein Joel Bernstein added a comment -

          I like the syntax as well:

          http://localhost:8983/col/stream?expr=merge(${left}, ${right}, ...)&baz=jaz&left=search(...)&right=search(...)&foo=bar
          

          If this is already working then we can change the ticket to Add Streaming Expressions tests for param substitution.

          Dennis Gove, we can always decide to work on the pre-compiled expressions at a later time. But I think that will take quite a bit of thought.

          Show
          joel.bernstein Joel Bernstein added a comment - I like the syntax as well: http: //localhost:8983/col/stream?expr=merge(${left}, ${right}, ...)&baz=jaz&left=search(...)&right=search(...)&foo=bar If this is already working then we can change the ticket to Add Streaming Expressions tests for param substitution . Dennis Gove , we can always decide to work on the pre-compiled expressions at a later time. But I think that will take quite a bit of thought.
          Hide
          joel.bernstein Joel Bernstein added a comment - - edited

          Cao Manh Dat, would you like to work on a few test cases based on Yonik's blog? http://yonik.com/solr-query-parameter-substitution/

          Show
          joel.bernstein Joel Bernstein added a comment - - edited Cao Manh Dat , would you like to work on a few test cases based on Yonik's blog? http://yonik.com/solr-query-parameter-substitution/
          Hide
          dpgove Dennis Gove added a comment -

          I agree. There's no reason to reinvent and I'm always a fan of keeping things consistent. If preprocessing substitution is already implemented for all incoming requests then we should absolutely make use of it.

          Show
          dpgove Dennis Gove added a comment - I agree. There's no reason to reinvent and I'm always a fan of keeping things consistent. If preprocessing substitution is already implemented for all incoming requests then we should absolutely make use of it.
          Hide
          dpgove Dennis Gove added a comment -

          This is great news. I'm all for continuing to make use of this feature. Thanks!

          Show
          dpgove Dennis Gove added a comment - This is great news. I'm all for continuing to make use of this feature. Thanks!
          Hide
          yseeley@gmail.com Yonik Seeley added a comment -

          Yeah, looks like it is already working:

          Putting this into my browser:

          http://localhost:8983/solr/techproducts/stream?expr=${foo}&foo=hello
          

          I get:

          {"EXCEPTION":"'hello' is not a proper expression clause","EOF":true}]}}
          

          If you try it with a GET in curl, just turn off curl's globbing:

          curl  -g 'http://localhost:8983/solr/techproducts/stream?expr=${foo}&foo=hello'
          
          Show
          yseeley@gmail.com Yonik Seeley added a comment - Yeah, looks like it is already working: Putting this into my browser: http: //localhost:8983/solr/techproducts/stream?expr=${foo}&foo=hello I get: { "EXCEPTION" : "'hello' is not a proper expression clause" , "EOF" : true }]}} If you try it with a GET in curl, just turn off curl's globbing: curl -g 'http: //localhost:8983/solr/techproducts/stream?expr=${foo}&foo=hello'
          Hide
          joel.bernstein Joel Bernstein added a comment -

          Ok, great.

          I'll change the ticket to Add Streaming Expression tests with param substitution.

          Show
          joel.bernstein Joel Bernstein added a comment - Ok, great. I'll change the ticket to Add Streaming Expression tests with param substitution.
          Hide
          caomanhdat Cao Manh Dat added a comment - - edited

          Yes, I would like to help.

          Currently, Streaming Expressions have quite compact syntax
          funcName(param1|func1,param2|func2,..)
          So what is the best way to add substitution parameter in Streaming Expression?
          Should it be?
          merge($left, $right,...) with left=search(...), right=search(...)

          Show
          caomanhdat Cao Manh Dat added a comment - - edited Yes, I would like to help. Currently, Streaming Expressions have quite compact syntax funcName(param1|func1,param2|func2,..) So what is the best way to add substitution parameter in Streaming Expression? Should it be? merge($left, $right,...) with left=search(...), right=search(...)
          Hide
          caomanhdat Cao Manh Dat added a comment -

          Sure, I will work on this.

          Show
          caomanhdat Cao Manh Dat added a comment - Sure, I will work on this.
          Hide
          dpgove Dennis Gove added a comment -

          It appears from the thread below that substitution is already supported (see Yonik's comment below). At this point the action item would be to add streaming expression tests for parameter substitution.

          Show
          dpgove Dennis Gove added a comment - It appears from the thread below that substitution is already supported (see Yonik's comment below). At this point the action item would be to add streaming expression tests for parameter substitution.
          Hide
          caomanhdat Cao Manh Dat added a comment -

          Added random substitution for constructing tuple stream in StreamExpressionTest.

          Show
          caomanhdat Cao Manh Dat added a comment - Added random substitution for constructing tuple stream in StreamExpressionTest.
          Hide
          dpgove Dennis Gove added a comment -

          Cao,

          What's the purpose of ClientTupleStream? It appears it's only used in the tests and doesn't add any value as a Stream object.

          I'd rather not replace all existing stream creations with a randomized choice between doing substitution and not. I think it'd be better to have explicit tests which exercise substitution. I don't think it'd be necessary to test that substitution on each and every stream class because the implementation is outside of the stream classes. Also, it appears that the randomization of the choice is non-repeatable. Ie, if I rerun the tests with a -Dtests.seed value would the random choices be the same?

          It appears that the substitution is just picking some substring in the expression and marking it as being a parameter. I think this should test substituting entire expression clauses, like

          http://localhost:8983/col/stream?expr=merge($left, $right, ...)&left=search(...)&right=search(...)
          

          where left and right are entire clauses. The tests you've provided appear to do something like this

          http://localhost:8983/col/stream?expr=merge(sear$left, se$right..), ...)&left=ch(...)&right=arch(.
          

          which I don't think makes much sense. Technically the substitution should handle that but I think the codification should be that one would want to substitute entire expressions.

          Show
          dpgove Dennis Gove added a comment - Cao, What's the purpose of ClientTupleStream? It appears it's only used in the tests and doesn't add any value as a Stream object. I'd rather not replace all existing stream creations with a randomized choice between doing substitution and not. I think it'd be better to have explicit tests which exercise substitution. I don't think it'd be necessary to test that substitution on each and every stream class because the implementation is outside of the stream classes. Also, it appears that the randomization of the choice is non-repeatable. Ie, if I rerun the tests with a -Dtests.seed value would the random choices be the same? It appears that the substitution is just picking some substring in the expression and marking it as being a parameter. I think this should test substituting entire expression clauses, like http: //localhost:8983/col/stream?expr=merge($left, $right, ...)&left=search(...)&right=search(...) where left and right are entire clauses. The tests you've provided appear to do something like this http: //localhost:8983/col/stream?expr=merge(sear$left, se$right..), ...)&left=ch(...)&right=arch(. which I don't think makes much sense. Technically the substitution should handle that but I think the codification should be that one would want to substitute entire expressions.
          Hide
          caomanhdat Cao Manh Dat added a comment - - edited

          What's the purpose of ClientTupleStream? It appears it's only used in the tests and doesn't add any value as a Stream object.

          I create the class to simplify the code in test class. Currently, we dont have any TupleStream which support passing SolrClient and SolrParams. in SolrStream, we pass in baseUrl and it always create HttpSolrClient (not CloudClient). In CloudSolrStream, we pass in ZKAdress and it always look up for fl & sort params...

          I don't think it'd be necessary to test that substitution on each and every stream class because the implementation is outside of the stream classes.

          I good point. I forgot that query parameter substitution already been tested in other class. We just wanna to show the guide here. I will write a testSubstituteStream method which code derive from testMergeStream()

          Show
          caomanhdat Cao Manh Dat added a comment - - edited What's the purpose of ClientTupleStream? It appears it's only used in the tests and doesn't add any value as a Stream object. I create the class to simplify the code in test class. Currently, we dont have any TupleStream which support passing SolrClient and SolrParams. in SolrStream, we pass in baseUrl and it always create HttpSolrClient (not CloudClient). In CloudSolrStream, we pass in ZKAdress and it always look up for fl & sort params... I don't think it'd be necessary to test that substitution on each and every stream class because the implementation is outside of the stream classes. I good point. I forgot that query parameter substitution already been tested in other class. We just wanna to show the guide here. I will write a testSubstituteStream method which code derive from testMergeStream()
          Hide
          caomanhdat Cao Manh Dat added a comment -

          New patch based on the suggestion from Dennis Gove

          Show
          caomanhdat Cao Manh Dat added a comment - New patch based on the suggestion from Dennis Gove
          Hide
          caomanhdat Cao Manh Dat added a comment -

          Dennis Gove Sorry for misunderstanding some aspects of streaming. It's quite compact and clean now.

          Show
          caomanhdat Cao Manh Dat added a comment - Dennis Gove Sorry for misunderstanding some aspects of streaming. It's quite compact and clean now.
          Hide
          risdenk Kevin Risden added a comment -

          Updated patch to master based on SOLR-9065. Running tests and precommit now.

          Show
          risdenk Kevin Risden added a comment - Updated patch to master based on SOLR-9065 . Running tests and precommit now.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 26027259a5fff5f8e7df1a927708f048ba92002f in lucene-solr's branch refs/heads/master from Kevin Risden
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=2602725 ]

          SOLR-8458: Add Streaming Expressions tests for parameter substitution

          Show
          jira-bot ASF subversion and git services added a comment - Commit 26027259a5fff5f8e7df1a927708f048ba92002f in lucene-solr's branch refs/heads/master from Kevin Risden [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=2602725 ] SOLR-8458 : Add Streaming Expressions tests for parameter substitution
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 4d15b9fa080cb95465338ec773a972559de7ec3d in lucene-solr's branch refs/heads/branch_6x from Kevin Risden
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=4d15b9f ]

          SOLR-8458: Add Streaming Expressions tests for parameter substitution

          Show
          jira-bot ASF subversion and git services added a comment - Commit 4d15b9fa080cb95465338ec773a972559de7ec3d in lucene-solr's branch refs/heads/branch_6x from Kevin Risden [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=4d15b9f ] SOLR-8458 : Add Streaming Expressions tests for parameter substitution
          Hide
          hossman Hoss Man added a comment -

          Manually correcting fixVersion per Step #S5 of LUCENE-7271

          Show
          hossman Hoss Man added a comment - Manually correcting fixVersion per Step #S5 of LUCENE-7271

            People

            • Assignee:
              risdenk Kevin Risden
              Reporter:
              joel.bernstein Joel Bernstein
            • Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development