Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.2, 6.0
    • Component/s: clients - java
    • Labels: None

      Description

      It would be beneficial to add an expression-based interface to the Streaming API described in SOLR-7082. Right now that API requires streaming requests to come in from clients as serialized bytecode of the streaming classes. The suggestion here is to support string expressions which describe the streaming operations the client wishes to perform.

      search(collection1, q=*:*, fl="id,fieldA,fieldB", sort="fieldA asc")
      

      With this syntax in mind, one can now express arbitrarily complex stream queries with a single string.

      // merge two distinct searches together on common fields
      merge(
        search(collection1, q="id:(0 3 4)", fl="id,a_s,a_i,a_f", sort="a_f asc, a_s asc"),
        search(collection2, q="id:(1 2)", fl="id,a_s,a_i,a_f", sort="a_f asc, a_s asc"),
        on="a_f asc, a_s asc")
      
      // find top 20 unique records of a search
      top(
        n=20,
        unique(
          search(collection1, q=*:*, fl="id,a_s,a_i,a_f", sort="a_f desc"),
          over="a_f desc"),
        sort="a_f desc")
      

      The syntax would support:
      1. Configurable expression names (e.g., via solrconfig.xml one can map "unique" to a class implementing a unique stream). This allows users to build their own streams and use them as they wish.
      2. Named parameters (of both simple and expression types)
      3. Unnamed, type-matched parameters (to support requiring N streams as arguments to another stream)
      4. Positional parameters

      The main goal here is to make streaming as accessible as possible and define a syntax for running complex queries across large distributed systems.
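      For a sense of how such an expression string would be handled once it reaches a SolrJ client or the server, here is a minimal sketch using the parser these patches introduce. The class names StreamExpressionParser and StreamExpression are mentioned elsewhere in this ticket; the package path, accessor names (getFunctionName, getParameters), and the StreamExpressionParameter type shown here are assumptions and may differ slightly from the patch.

      import java.util.List;

      import org.apache.solr.client.solrj.io.stream.expr.StreamExpression;
      import org.apache.solr.client.solrj.io.stream.expr.StreamExpressionParameter;
      import org.apache.solr.client.solrj.io.stream.expr.StreamExpressionParser;

      public class ExpressionParseExample {
        public static void main(String[] args) {
          // Parse the raw expression string into a tree of functions and parameters.
          StreamExpression expr = StreamExpressionParser.parse(
              "top(n=20, unique(search(collection1, q=*:*, fl=\"id,a_f\", sort=\"a_f desc\"), over=\"a_f desc\"), sort=\"a_f desc\")");

          // The outermost function ("top") carries named parameters (n, sort) and an
          // unnamed sub-expression parameter (the unique(...) stream).
          String functionName = expr.getFunctionName();
          List<StreamExpressionParameter> parameters = expr.getParameters();

          System.out.println(functionName + " has " + parameters.size() + " parameters");
          System.out.println(expr); // toString() renders the tree back to an expression string
        }
      }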

      1. SOLR-7377.patch
        297 kB
        Joel Bernstein
      2. SOLR-7377.patch
        286 kB
        Joel Bernstein
      3. SOLR-7377.patch
        287 kB
        Joel Bernstein
      4. SOLR-7377.patch
        286 kB
        Joel Bernstein
      5. SOLR-7377.patch
        290 kB
        Joel Bernstein
      6. SOLR-7377.patch
        286 kB
        Joel Bernstein
      7. SOLR-7377.patch
        275 kB
        Joel Bernstein
      8. SOLR-7377.patch
        274 kB
        Dennis Gove
      9. SOLR-7377.patch
        274 kB
        Dennis Gove
      10. SOLR-7377.patch
        263 kB
        Dennis Gove

          Activity

          Dennis Gove added a comment -

          First-pass patch. Looking for initial feedback.

          Joel Bernstein added a comment -

          All I can say is awesome!

          Just updated the description to reference SOLR-7082.

          Dennis Gove added a comment - edited

          Updated the patch based on some additional items I wanted to include in this. Note that this patch adds a dependency on guava in the solr/solrj/ivy.xml file. We may want to revisit this additional dependency. Guava is being used for some basic string checks (to ensure operations only include supported characters) and this logic could be coded up ourselves if we want to avoid adding a dependency.

          <dependency org="com.google.guava" name="guava" rev="${/com.google.guava/guava}" conf="compile"/>
          
          Dennis Gove added a comment -

          Now allows a search expression to include a zkHost (though does not require it). Improved performance of EqualToComparator by moving some branching logic into the constructor and creating a lambda for the actual comparison.

          Dennis Gove added a comment -

          Added ability to turn an ExpressibleStream into a StreamExpression. Combined with the already existing ability to turn a StreamExpression into a string, we can now go back and forth from string <--> stream.

          This will allow us to modify ParallelStream to pass along the string expression of the stream it wants to parallelize.
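          As a rough sketch of the stream --> expression --> string direction described above: an ExpressibleStream could assemble something like the following and rely on the expression's own string rendering. The parameter class names (StreamExpressionValue, StreamExpressionNamedParameter) and the addParameter calls are assumptions based on the expression package this patch introduces and may differ from the actual code.

          import org.apache.solr.client.solrj.io.stream.expr.StreamExpression;
          import org.apache.solr.client.solrj.io.stream.expr.StreamExpressionNamedParameter;
          import org.apache.solr.client.solrj.io.stream.expr.StreamExpressionValue;

          public class ExpressionBuildExample {
            public static void main(String[] args) {
              // Roughly the kind of StreamExpression a stream could emit for itself.
              StreamExpression search = new StreamExpression("search");
              search.addParameter(new StreamExpressionValue("collection1"));
              search.addParameter(new StreamExpressionNamedParameter("q", "*:*"));
              search.addParameter(new StreamExpressionNamedParameter("fl", "id,a_f"));
              search.addParameter(new StreamExpressionNamedParameter("sort", "a_f asc"));

              StreamExpression unique = new StreamExpression("unique");
              unique.addParameter(search); // expressions nest, mirroring the string form
              unique.addParameter(new StreamExpressionNamedParameter("over", "a_f asc"));

              // The expression renders itself back to its string form, which is what would let
              // ParallelStream ship the expression of the stream it wants to parallelize.
              System.out.println(unique);
            }
          }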

          Erick Erickson added a comment -

          Dennis:

          That patch seems to remove and then re-add a bunch of files. It's 263K in length, did something go weird with generating the diff? Or did you move a bunch of files around or something? Or am I mis-reading it entirely?

          The streaming stuff is cool, you gotta admit - thanks for your work here!

          I'm a little confused. There is only one patch from just a few minutes ago. But from the comments, maybe you uploaded your original patch on 10-Apr? If that's the case, I'm guessing you deleted your old patch when you put this one up. There's no need to do that; just keep re-uploading the patch with the same name. That'll keep all the copies, with the older ones grayed out, and it's preferred to do it that way so history is preserved. My question about the size of the patch is a case in point.

          Dennis Gove added a comment -

          Well that makes it much clearer for me. I'm sorry for deleting all the older patches. In my reading of the "How to Contribute" I was under the impression that uploading a new patch (with same name) would just replace the old one. Each time I uploaded a new version I would see the previous one still there and figured I'd done something wrong so went ahead and deleted the old version. It didn't occur to me that the old versions would still stay there but just be greyed out. I won't be deleting the old versions going forward. Thanks for clearing that up for me!

          The size of the patch is a function of a bit of package refactoring in the org.apache.solr.client.solrj.io package. This seems to be resulting in the diff showing a bunch of deleted/added files.

          Erick Erickson added a comment -

          No apology necessary, just figured you might be confused by something about "how we do things around here" ... I'll see what I can do to clarify the "How to Contribute" page.

          And thanks for letting me know about the refactoring, just wanted to be sure that things were as you expected.

          Joel Bernstein added a comment - edited

          Was reviewing the new Comparator package. Curious about the name EqualToComparator. This makes it sound like it's only used to determine equality. But it's also being used in certain situations to determine sort order. Since an equality comparator makes sense in certain situations, like with the ReducerStream, does it make sense to have two Comparator implementations?

          Dennis Gove added a comment -

          I was thinking that all comparators, no matter their implemented comparison logic, return one of three basic values when comparing A and B.

          1. A and B are logically equal to each other
          2. A is logically before B
          3. A is logically after B

          The implemented comparison logic is then wholly dependent on what one might be intending to use the comparator for. For example, EqualToComparator's implemented comparison logic will return that A and B are logically equal if they are in fact equal to each other. Its logically before/after response depends on the sort order (ascending or descending) but is basically deciding if A is less than B or if A is greater than B.

          One could, if they wanted to, create a comparator returning that two dates are logically equal to each other if they occur within the same week. Or a comparator returning that two numbers are logically equal if their values are within the same logarithmic order of magnitude. So on and so forth.

          My thinking is that comparators determine the logical comparison and make no assumption on what that implemented logic is. This leaves open the possibility of implementing other comparators for given situations as they arise.
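          As a purely hypothetical illustration of that idea (not part of the patch; class and method names are invented here), a comparator could treat two dates as logically equal when they fall in the same ISO week and otherwise order them chronologically:

          import java.time.LocalDate;
          import java.time.temporal.WeekFields;
          import java.util.Comparator;

          // Hypothetical example: two dates are "logically equal" when they fall in the
          // same ISO week; otherwise they order chronologically, matching the
          // before/equal/after contract described above.
          public class SameWeekComparator implements Comparator<LocalDate> {

            @Override
            public int compare(LocalDate left, LocalDate right) {
              int leftWeek = left.get(WeekFields.ISO.weekOfWeekBasedYear());
              int rightWeek = right.get(WeekFields.ISO.weekOfWeekBasedYear());
              int leftYear = left.get(WeekFields.ISO.weekBasedYear());
              int rightYear = right.get(WeekFields.ISO.weekBasedYear());

              if (leftYear == rightYear && leftWeek == rightWeek) {
                return 0; // logically equal: same week
              }
              return left.compareTo(right); // logically before/after: chronological order
            }
          }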

          Joel Bernstein added a comment - edited

          Ok, I see what you're getting at.

          I'd been considering a variation on comparators that just determine equality. This is because some of the stream implementations like the UniqueStream and ReducerStream only need a way to determine Tuple equality. So I had already done a little research and turned up this:

          http://www.learnabout-electronics.org/Digital/dig43.php

          You'll see a section on Equality Comparators.

          So, I'm thinking a simpler name like FieldComp would do the trick.

          No need to change just yet. Feel free to finish up the patch using the EqualToComparator. Before we do the commit we'll go through and finalize the names.

          Dennis Gove added a comment - edited

          Made ParallelStream an ExpressibleStream and modified the StreamHandler to accept stream expression strings instead of bytecode.

          Refactored operation and operand into functionName and parameter. And refactored all required references and tangentially related variable/class names.

          Renamed EqualToComparator to FieldComparator to make the name a little more descriptive.

          Added ability to support pluggable streams by making it something you can configure in solrconfig.xml.

          All stream-related tests pass. At this point I'd consider this functionally complete.

          Joel Bernstein added a comment -

          Dennis,

          Thanks for this awesome piece of work. I should have a chance to review over the next couple days.

          Dennis Gove added a comment -

          Fixed a bug in CloudSolrStream when handling aliases. When filtering out the stream-only parameters from those that need to be passed to Solr for the query, I was checking for parameter name "alias" when I should have been checking for "aliases".

          Joel Bernstein added a comment -

          New patch is an svn diff that handles all the moving around of files in svn.

          Noble Paul added a comment -

          The patch is mostly moving files from one package to another. Is it possible to get a patch that just adds the new functionality and do the package change later?

          Dennis Gove added a comment -

          I'm not totally against doing that but I feel like the refactoring is a required piece of this patch. I could, however, create a new ticket with just the refactoring and then make this one depend on that one.

          I am worried that such a ticket might look like unnecessary refactoring. Without the expression stuff added here I think the streaming stuff has a reasonable home in org.apache.solr.client.solrj.io.

          That said, I certainly understand the benefit of smaller patches.

          Joel Bernstein added a comment -

          This patch also changes every single class in the package. So even if we do a separate ticket for the package move we'll still have a patch with fairly significant changes to all the classes.

          I think the best way to review the ticket is probably to apply the patch and then look at the overall package.

          Joel Bernstein added a comment - edited

          This patch looks really good. I'm planning on making only three changes:

          1) Allowing the ParallelStream to either use object serialization or Streaming Expressions as a transport mechanism. There will be use cases where people will want to use the Streaming API directly and not bother with Streaming Expressions constructs.

          2) Add some tests for StreamingExpressions that use the ParallelStream.

          3) Add an ExpressionRunner that runs expressions from the command line. The initial version of this will only support built-in expressions. We should decide if we want to break out the name-to-class mapping in an external properties file so it can be used both by Solr and the ExpressionRunner.

          Yonik Seeley added a comment -

          3) Add an ExpressionRunner that runs expressions from the command line.

          Can you expand on this some? Our "command line" in the past has essentially been "curl", so I'm wondering what's different here that would make a command line tool specific to expressions make sense.

          Joel Bernstein added a comment -

          I was thinking about having a local client (ExpressionRunner) that could run the expression and be used as part of a unix pipeline.

          Also, a local client can save a hop in the case of a parallel stream. The local client can directly push the Expression to the worker nodes.

          Joel Bernstein added a comment -

          The main use case that I have in mind for the unix pipeline is piping data from a Streaming Expression to R.

          But there could be all kinds of fun stuff that could be cooked up with piping Streaming Expression output.

          Joel Bernstein added a comment - edited

          New patch with the following changes:

          1) ParallelStream can use either Object serialization or streaming expressions as a transport mechanism. When the ParallelStream is created using a StreamingExpression it defaults to streaming expression transport. When it's created through the Streaming API it defaults to Object serialization.

          2) Added parallel tests for merge, unique, group, and top functions. Sample syntax:

              parallel(collection1, 
                          merge(
                                      search(collection1, q="id:(4 1 8 9)", fl="id,a_s,a_i", sort="a_i desc", partitionKeys="a_i"), 
                                      search(collection1, q="id:(0 2 3 6)", fl="id,a_s,a_i", sort="a_i desc", partitionKeys="a_i"), 
                                      on="a_i desc"), 
                          workers="2", 
                          sort="a_i desc")
          
          

          3) Added the ExpressionRunner class, still needs tests.

          Dennis Gove added a comment -

          I don't see the ExpressionRunner in the patch - am I missing it somewhere? Also, I noticed ParallelStream lines 94-100 have some System.out.println lines. I suspect you intended to remove those.

          Tests look good.

          Joel Bernstein added a comment -

          Forgot to add the class. Just put up a new patch with the ExpressionRunner.

          Joel Bernstein added a comment - edited

          New patch without the ExpressionRunner. While doing manual testing I found that it's simpler just to send an expression to Solr's /stream handler, as Yonik had mentioned. I think the idea of having a command line tool to use with unix pipes still makes sense, but it can be a ticket unto itself.

          Also made some changes based on the manual testing. For example, including the ParallelStream in the default streams known by the StreamHandler.

          Dennis Gove added a comment -

          This looks good, I think.

          Joel Bernstein added a comment -

          New patch with all tests passing. The main fix is that the StreamHandler.inform method was throwing NPE on non-SolrCloud setups. Once this was fixed all tests passed.

          Joel Bernstein added a comment -

          I'll take a look at swapping out the guava code today. I was thinking of using either a regex or StreamTokenizer to validate the chars.

          Dennis Gove added a comment -

          I believe I've found a bug in FieldComparator. I don't have time to create a new patch right now, but the bug is that the field values are not checked for null before calling compareTo. A fixed version is below:

            private void assignComparator(){
              if(ComparatorOrder.DESCENDING == order){
                // What black magic is this type intersection??
                // Because this class is serializable we need to make sure the lambda is also serializable.
                // This can be done by providing this type intersection on the definition of the lambda.
                // Why not do it in the lambda interface? Functional Interfaces don't allow extends clauses
                comparator = (ComparatorLambda & Serializable)(leftTuple, rightTuple) -> {
                  Comparable leftComp = (Comparable)leftTuple.get(leftField);        
                  Comparable rightComp = (Comparable)rightTuple.get(rightField);
                  
                  if(null == leftComp){ return -1; }
                  if(null == rightComp){ return 1; }
                  
                  return rightComp.compareTo(leftComp);
                };
              }
              else{
                // See above for black magic reasoning.
                comparator = (ComparatorLambda & Serializable)(leftTuple, rightTuple) -> {
                  Comparable leftComp = (Comparable)leftTuple.get(leftField);
                  Comparable rightComp = (Comparable)rightTuple.get(rightField);
                  
                  if(null == leftComp){ return -1; }
                  if(null == rightComp){ return 1; }
                  
                  return leftComp.compareTo(rightComp);
                };
              }
            }
          
          Joel Bernstein added a comment -

          I'll incorporate this in the next patch.

          Yonik Seeley added a comment -

          Should two nulls compare equal?

                  if(null == leftComp){ return rightComp==null ? 0 : -1; }
                  if(null == rightComp){ return 1; }
          

          OR

                  if (leftComp == rightComp) return 0;
                  if(null == leftComp){ return -1; }
                  if(null == rightComp){ return 1; }
          
          Dennis Gove added a comment -

          We may want to make that configurable in solrconfig.xml. Also, should this respect the already configurable setting of whether nulls sort to the start or end of result sets?

          Yonik Seeley added a comment -

          We may want to make that configurable in solrconfig.xml.

          If it makes sense to have it more than one way (I don't know that it does... just supposing), then it makes sense that some requests would want it one way and other requests would want it the other way. Hence it should be something that one could somehow specify in a request, and not something that one would configure.

          Dennis Gove added a comment -

          I do agree with you that two nulls should compare equal (I should've included that in my original fix), but I have seen a number of situations where users have balked at the decision (outside of solr).

          That said, I think it's reasonable to insist that two nulls evaluate to equal. (I've never agreed with the case that they wouldn't).

          Were we to make it a user-overridable thing then I do like the idea to make it a query-time decision.

          Joel Bernstein added a comment - edited

          New patch with two changes:

          1) The StreamHandler was getting a null value from the call to ZkController.getBaseUrl() which was causing a null pointer if the zkHost was not provided in the expression. I resolved this by calling zkController.getZkServerAddress() instead.

          2) Removed the dependency on guava by adding the static StreamExpressionParser.wordToken method.
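          For reference, a minimal sketch of the kind of check such a word-token helper could perform; the actual StreamExpressionParser.wordToken implementation may use a different character set or approach, and the class here is invented for illustration.

          public class WordTokenSketch {

            // Illustrative only: accept tokens made of letters, digits, and a few
            // identifier-safe characters; reject anything else (spaces, parens, quotes).
            static boolean wordToken(String token) {
              if (null == token || token.isEmpty()) {
                return false;
              }
              return token.matches("[A-Za-z0-9_\\-\\.]+");
            }

            public static void main(String[] args) {
              System.out.println(wordToken("a_f"));      // true
              System.out.println(wordToken("a_f desc")); // false - contains a space
            }
          }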

          Joel Bernstein added a comment -

          I'd like to move the null handling to another ticket. There are other issues here with how the SortingResponseWriter handles nulls that should be tackled at the same time.

          For this release we can require default values.

          Joel Bernstein added a comment -

          New patch with precommit passing.

          Dennis Gove added a comment -

          Updated patch with a few changes.

          FieldComparator and StreamComparator have been collapsed into a single class StreamComparator. There was no need for a separate abstract class.

          Added null checks in StreamComparator. For now if both are null then they will evaluate to equal. We can add a later enhancement under a new ticket to make that configurable.

          Interfaces ExpressibleStream and ExpressibleComparator have been collapsed into the single interface Expressible. They defined the same methods and there's no reason to have separate interfaces for them.

          Passes precommit checks.
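          For readers following along, a rough sketch of what the collapsed interface could look like; the exact method signature in the committed code may differ (for instance it may take a factory or context argument):

          import org.apache.solr.client.solrj.io.stream.expr.StreamExpression;

          // Sketch only: one Expressible contract shared by streams and comparators,
          // replacing the separate ExpressibleStream/ExpressibleComparator interfaces.
          public interface Expressible {
            StreamExpression toExpression();
          }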

          Joel Bernstein added a comment - edited

          I believe we are a little out of sync on the patches. My last patch has the guava dependency removed and I see it back in with Dennis's latest patch.

          I think the best thing to do is to get this committed to trunk so we can start working with smaller patches.

          My plan is to do some more manual testing, and then commit to trunk in a day or two.

          Dennis Gove added a comment -

          I can't figure out how I screwed that up - my only thought is that when I pulled down your latest with curl it failed and I didn't notice. Internet access on Amtrak trains can be spotty. My apologies, I'll be more careful in the future. Let's forget my latest patch - I'll add those changes in a new, smaller one after this is in trunk.

          Joel Bernstein added a comment -

          No worries. It's a big piece of work. We'll refine it more after it lands in trunk.

          ASF subversion and git services added a comment -

          Commit 1678743 from Joel Bernstein in branch 'dev/trunk'
          [ https://svn.apache.org/r1678743 ]

          SOLR-7377: Streaming Expressions

          ASF subversion and git services added a comment -

          Commit 1679376 from Joel Bernstein in branch 'dev/trunk'
          [ https://svn.apache.org/r1679376 ]

          SOLR-7377,SOLR-7524:Make Streaming Expressions Java 7 Compatible

          ASF subversion and git services added a comment -

          Commit 1679407 from Joel Bernstein in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1679407 ]

          SOLR-7377,SOLR-7524:Make Streaming Expressions Java 7 Compatible

          ASF subversion and git services added a comment -

          Commit 1679599 from Joel Bernstein in branch 'dev/trunk'
          [ https://svn.apache.org/r1679599 ]

          SOLR-7377: Update CHANGES.txt

          ASF subversion and git services added a comment -

          Commit 1679600 from Joel Bernstein in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1679600 ]

          SOLR-7377: Update CHANGES.txt

          Shalin Shekhar Mangar added a comment -

          Any specific reason why this uses Java serialization? That's a horrible choice for a high-performance streaming framework. Can we switch to javabin?

          Joel Bernstein added a comment -

          The Streaming API has three transport mechanisms:

          1) Data is transported with JSON.
          2) The ParallelStream originally pushed bytecode out to worker nodes in parallel operations using Java serialization. This was only executable streaming code being pushed to worker nodes.
          3) Streaming Expressions provides a concise syntax for the ParallelStream to push streaming code to worker nodes. This ticket adds that functionality. It's extremely efficient.

          Joel Bernstein added a comment -

          One of the cool things about Streaming Expressions is that it provides a high-performance way to push operations to worker nodes ...

          AND

          It's a query language that can be used on its own.

          Streaming Expressions can model any operation that can be done with the Streaming API.

          The new parallel SQL interface compiles SQL statements to Streaming API objects. Those objects are then serialized to Streaming Expressions and sent to worker nodes to handle the map/reduce aggregations in parallel.

          Noble Paul added a comment -

          You may use javabin for more efficient, faster streaming.

          Joel Bernstein added a comment -

          Streaming data with JSON was just the first step to get everything working.

          When we start optimizing, we can add faster transport mechanisms.

          Joel Bernstein added a comment -

          To get a basic understanding of how the Streaming API, Streaming Expressions and Parallel SQL come together, take a look at the doGroupByWithAggregates method in:

          https://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/java/org/apache/solr/handler/SQLHandler.java


            People

            • Assignee: Unassigned
            • Reporter: Dennis Gove
            • Votes: 0
            • Watchers: 11
