Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-8467

CloudSolrStream and FacetStream should take a SolrParams object rather than a Map<String, String> to allow more complex Solr queries to be specified

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.1, master (7.0)
    • Component/s: None
    • Labels:
      None

      Description

      Currently, it's impossible to, say, specify multiple "fq" clauses when using Streaming Aggregation due to the fact that the c'tors take a Map of params.

      Opening to discuss whether we should
      1> deprecate the current c'tor
      and/or
      2> add a c'tor that takes a SolrParams object instead.
      and/or
      3> ???

      I don't see a clean way to go from a Map<String, String> to a (Modifiable)SolrParams, so existing code would need a significant change. I hacked together a PoC, just to see if I could make CloudSolrStream take a ModifiableSolrParams object instead and it passes tests, but it's so bad that I'm not going to even post it. There's got to be a better way to do this, but at least it's possible....

      1. SOLR-8467.patch
        156 kB
        Erick Erickson
      2. SOLR-8467.patch
        144 kB
        Erick Erickson
      3. SOLR-8467.patch
        144 kB
        Erick Erickson
      4. SOLR-8467.patch
        133 kB
        Erick Erickson
      5. SOLR-8467.patch
        142 kB
        Erick Erickson
      6. SOLR-8467.patch
        155 kB
        Erick Erickson
      7. SOLR-8467.patch
        56 kB
        Erick Erickson
      8. SOLR-8647.patch
        55 kB
        Erick Erickson
      9. SOLR-8647.patch
        45 kB
        Erick Erickson

        Activity

        Hide
        erickerickson Erick Erickson added a comment -

        Here's the patch I've worked up so far.

        Dennis GoveJoel Bernstein I'm particularly interested in any comments you have.

        I did one gratuitous thing, in StreamingTest moved all of the del :/commits to the beginning of the methods, I think it's fragile to depend on old tests cleaning themselves up, I vastly prefer the test that needs things to be in some state to insure that state.

        I also noticed that the tests here rarely close their CloudSolrStreams, is that intentional? Or harmless?

        Anyway, there are still some nocommits in there, and I'm pretty hazy on whether the changes I made to streaming expressions were the right things. All the tests pass so at least that's a Good Thing.

        Is there any good reason we don't have access to commons.lang? In particular StringUtils.join would be useful in one instance. It's easy enough to write a function, but I bet the one in commons is more efficient.

        So I did comment out the c'tors that take a Map completely and chased down all the places I found that referenced that format. I then un-commented them for back-compat, but I'm pretty sure I found all the places in Solr code that uses those c'tors and then replaced them.

        You'll note that tests in StreamingTest randomly chooses one c'tor or the other just to exercise both. Of course if we really deprecate the old version that'll have to be changed, but that's trivial.

        Let me know what you think. I'll be away for a couple of days so if I'm kind of silent you'll know why.

        Show
        erickerickson Erick Erickson added a comment - Here's the patch I've worked up so far. Dennis Gove Joel Bernstein I'm particularly interested in any comments you have. I did one gratuitous thing, in StreamingTest moved all of the del : /commits to the beginning of the methods, I think it's fragile to depend on old tests cleaning themselves up, I vastly prefer the test that needs things to be in some state to insure that state. I also noticed that the tests here rarely close their CloudSolrStreams, is that intentional? Or harmless? Anyway, there are still some nocommits in there, and I'm pretty hazy on whether the changes I made to streaming expressions were the right things. All the tests pass so at least that's a Good Thing. Is there any good reason we don't have access to commons.lang? In particular StringUtils.join would be useful in one instance. It's easy enough to write a function, but I bet the one in commons is more efficient. So I did comment out the c'tors that take a Map completely and chased down all the places I found that referenced that format. I then un-commented them for back-compat, but I'm pretty sure I found all the places in Solr code that uses those c'tors and then replaced them. You'll note that tests in StreamingTest randomly chooses one c'tor or the other just to exercise both. Of course if we really deprecate the old version that'll have to be changed, but that's trivial. Let me know what you think. I'll be away for a couple of days so if I'm kind of silent you'll know why.
        Hide
        dpgove Dennis Gove added a comment -

        I can't think of any reason why accepting a [Modifiable]SolrParams object instead of a Map<String,String> would be a bad idea. I like this change.

        Show
        dpgove Dennis Gove added a comment - I can't think of any reason why accepting a [Modifiable] SolrParams object instead of a Map<String,String> would be a bad idea. I like this change.
        Hide
        joel.bernstein Joel Bernstein added a comment -

        We'll want to do this for the FacetStream as well. I'll update the ticket to reflect this.

        Show
        joel.bernstein Joel Bernstein added a comment - We'll want to do this for the FacetStream as well. I'll update the ticket to reflect this.
        Hide
        erickerickson Erick Erickson added a comment -

        Same patch with FacetStream changes, still some nocommits.

        Show
        erickerickson Erick Erickson added a comment - Same patch with FacetStream changes, still some nocommits.
        Hide
        erickerickson Erick Erickson added a comment -

        Patch that uses commons StringUtils. NOTE: This adds the dependency to SolrJ for commons.lang for the first time, any objections?

        And this one uses the correct name, I transposed a the 4 and 6 in the first two patches. So this one supercedes SOLR-8674 from the 29th

        Show
        erickerickson Erick Erickson added a comment - Patch that uses commons StringUtils. NOTE: This adds the dependency to SolrJ for commons.lang for the first time, any objections? And this one uses the correct name, I transposed a the 4 and 6 in the first two patches. So this one supercedes SOLR-8674 from the 29th
        Hide
        yseeley@gmail.com Yonik Seeley added a comment -

        Patch that uses commons StringUtils. NOTE: This adds the dependency to SolrJ for commons.lang for the first time, any objections?

        You could just use StrUtils.join instead?

        Show
        yseeley@gmail.com Yonik Seeley added a comment - Patch that uses commons StringUtils. NOTE: This adds the dependency to SolrJ for commons.lang for the first time, any objections? You could just use StrUtils.join instead?
        Hide
        erickerickson Erick Erickson added a comment -

        Didn't even know it existed. It compiles at least, running a test now.

        Thanks! That's better than including a new dependency.

        Show
        erickerickson Erick Erickson added a comment - Didn't even know it existed. It compiles at least, running a test now. Thanks! That's better than including a new dependency.
        Hide
        erickerickson Erick Erickson added a comment -

        Joel BernsteinDennis Gove This is getting reasonably close to being committable, but I ran into something I wanted to run past you.

        in GatherNodeStream.JoinRunner (starting line 386 in current trunk) there's some manipulation of the params. Starting at line 412 there's this bit:

        if(queryParams.containsKey("fl")) {
        String flString = (String)queryParams.get("fl");
        String[] flArray = flString.split(",");
        for(String f : flArray)

        { flSet.add(f.trim()); }

        }

        Since flSet is a HashSet, doesn't this fold all of the "fl" parameters into a single entry so if you have fl=a,b,c the result would only be 'c'?

        And what's right here anyway? Does it make sense to concatenate the "fl" parameters from the queryParams to the joinParams and add the "special" fl params ('gather' and 'traverseTo' and metrics)? Or should the joinParams just contain the "special" params?

        I don't really see how to return more fls in the tuple that GatherNodesStream returns, if it should please enlighten me

        Show
        erickerickson Erick Erickson added a comment - Joel Bernstein Dennis Gove This is getting reasonably close to being committable, but I ran into something I wanted to run past you. in GatherNodeStream.JoinRunner (starting line 386 in current trunk) there's some manipulation of the params. Starting at line 412 there's this bit: if(queryParams.containsKey("fl")) { String flString = (String)queryParams.get("fl"); String[] flArray = flString.split(","); for(String f : flArray) { flSet.add(f.trim()); } } Since flSet is a HashSet, doesn't this fold all of the "fl" parameters into a single entry so if you have fl=a,b,c the result would only be 'c'? And what's right here anyway? Does it make sense to concatenate the "fl" parameters from the queryParams to the joinParams and add the "special" fl params ('gather' and 'traverseTo' and metrics)? Or should the joinParams just contain the "special" params? I don't really see how to return more fls in the tuple that GatherNodesStream returns, if it should please enlighten me
        Hide
        erickerickson Erick Erickson added a comment -

        OK, I was able to beast the code for GraphExpressionTest and StreamExpressionTest and they seem OK. Well, I am getting a lot of

        "Unable to build keystore from file: null" errors, but I'm ignoring them so far.

        I'm running full tests now, but I've attached a patch for what I currently have, nocommits and all. You'll see a lot of commented-out code in GatherNodeStream pending the resolution of my note from earlier today.

        Meanwhile, are there objections to the approach? As I mentioned, I need to clean some things up before committing, and I think SOLR-8925 needs to be applied to 6x before this patch can go in....

        I'd like to keep from getting too far out of date, last time I let it languish I caused myself a lot of extra work as the underlying code is changing pretty rapidly, so comments appreciated.

        Show
        erickerickson Erick Erickson added a comment - OK, I was able to beast the code for GraphExpressionTest and StreamExpressionTest and they seem OK. Well, I am getting a lot of "Unable to build keystore from file: null" errors, but I'm ignoring them so far. I'm running full tests now, but I've attached a patch for what I currently have, nocommits and all. You'll see a lot of commented-out code in GatherNodeStream pending the resolution of my note from earlier today. Meanwhile, are there objections to the approach? As I mentioned, I need to clean some things up before committing, and I think SOLR-8925 needs to be applied to 6x before this patch can go in.... I'd like to keep from getting too far out of date, last time I let it languish I caused myself a lot of extra work as the underlying code is changing pretty rapidly, so comments appreciated.
        Hide
        erickerickson Erick Erickson added a comment -

        Updated for current trunk, still have cleanup to do.

        Show
        erickerickson Erick Erickson added a comment - Updated for current trunk, still have cleanup to do.
        Hide
        joel.bernstein Joel Bernstein added a comment - - edited

        I think this ticket makes sense but I was shying away from it because it ends up touching such a large amount of code.

        A couple things to consider before committing:

        1) We need some tests for Streaming Expressions with multi-params. It may be that the expression parser needs to be changed to support this. If that's the case we should hold off until we get that working.

        2) Let's shoot for getting this into trunk following the 6.1 release. The reason behind this is that it touches enough stuff that it's likely to cause some headaches finishing some fairly big features in trunk which are still getting shaped up for the 6.1

        3) Lastly I'd like to spend a little more time reviewing before this it gets committed and I'm pretty focused on getting the current 6.1 streaming features together.

        If this patch falls behind again I can spend the time catching it up following the 6.1 release.

        Show
        joel.bernstein Joel Bernstein added a comment - - edited I think this ticket makes sense but I was shying away from it because it ends up touching such a large amount of code. A couple things to consider before committing: 1) We need some tests for Streaming Expressions with multi-params. It may be that the expression parser needs to be changed to support this. If that's the case we should hold off until we get that working. 2) Let's shoot for getting this into trunk following the 6.1 release. The reason behind this is that it touches enough stuff that it's likely to cause some headaches finishing some fairly big features in trunk which are still getting shaped up for the 6.1 3) Lastly I'd like to spend a little more time reviewing before this it gets committed and I'm pretty focused on getting the current 6.1 streaming features together. If this patch falls behind again I can spend the time catching it up following the 6.1 release.
        Hide
        joel.bernstein Joel Bernstein added a comment -

        The fl handling you mention seems to be wrong. I'll give this a closer review.

        Show
        joel.bernstein Joel Bernstein added a comment - The fl handling you mention seems to be wrong. I'll give this a closer review.
        Hide
        erickerickson Erick Erickson added a comment -

        I'll try to keep this up to date more often now that I've learned the 'git stash' and 'git stash pop' trick.

        I'll also clean up the nocommits and the like sometime Real Soon Now.

        Do note that there's some nonsense in the test cases to randomly switch back and forth between the Map and SolrParams c'tors in the classes. Mostly that's there to make sure the translation I made in the c'tors from Map to Solrparams gets exercised. That said, since there are no more uses of the Map c'tor I could easily argue that it's not required that we keep exercising the deprecated (in this patch) Map c'tors, WDYT?

        Show
        erickerickson Erick Erickson added a comment - I'll try to keep this up to date more often now that I've learned the 'git stash' and 'git stash pop' trick. I'll also clean up the nocommits and the like sometime Real Soon Now. Do note that there's some nonsense in the test cases to randomly switch back and forth between the Map and SolrParams c'tors in the classes. Mostly that's there to make sure the translation I made in the c'tors from Map to Solrparams gets exercised. That said, since there are no more uses of the Map c'tor I could easily argue that it's not required that we keep exercising the deprecated (in this patch) Map c'tors, WDYT?
        Hide
        dpgove Dennis Gove added a comment - - edited

        1) We need some tests for Streaming Expressions with multi-params. It may be that the expression parser needs to be changed to support this. If that's the case we should hold off until we get that working.

        I don't think there'll need to be any change here. For the source streams we pull out all known named parameters and then all other parameters are just passed along to Solr as standard params. For those which are just passed along there shouldn't be any consideration of the parameter names and if I recall correctly duplicate names are allowable.

        Show
        dpgove Dennis Gove added a comment - - edited 1) We need some tests for Streaming Expressions with multi-params. It may be that the expression parser needs to be changed to support this. If that's the case we should hold off until we get that working. I don't think there'll need to be any change here. For the source streams we pull out all known named parameters and then all other parameters are just passed along to Solr as standard params. For those which are just passed along there shouldn't be any consideration of the parameter names and if I recall correctly duplicate names are allowable.
        Hide
        erickerickson Erick Erickson added a comment -

        In fact I had a few down minutes and put in a couple of multi-parameter tests and they seemed to work just fine. I'll add them to the patch.

        They're fairly trivial, but at least start to exercise the option.

        Show
        erickerickson Erick Erickson added a comment - In fact I had a few down minutes and put in a couple of multi-parameter tests and they seemed to work just fine. I'll add them to the patch. They're fairly trivial, but at least start to exercise the option.
        Hide
        joel.bernstein Joel Bernstein added a comment -

        I think just a few basic tests is all that's needed.

        Show
        joel.bernstein Joel Bernstein added a comment - I think just a few basic tests is all that's needed.
        Hide
        joel.bernstein Joel Bernstein added a comment -

        OK, I reviewed the fl handling in gather nodes and it is correct for how params are currently being handled. What it's doing is deriving the field list from the metrics and uniquing them. We'll have to adjust how this is working as part of this ticket.

        Show
        joel.bernstein Joel Bernstein added a comment - OK, I reviewed the fl handling in gather nodes and it is correct for how params are currently being handled. What it's doing is deriving the field list from the metrics and uniquing them. We'll have to adjust how this is working as part of this ticket.
        Hide
        erickerickson Erick Erickson added a comment -

        This is what I was wondering when I asked "What's right here anyway?". If I'm reading your response correctly, we can simply throw away the original "fl" params because we don't care about them anyway right? And uniquify the rest with a set.

        Show
        erickerickson Erick Erickson added a comment - This is what I was wondering when I asked "What's right here anyway?". If I'm reading your response correctly, we can simply throw away the original "fl" params because we don't care about them anyway right? And uniquify the rest with a set.
        Hide
        erickerickson Erick Erickson added a comment -

        Same thing (Unable to build keystore....) happens on unmodified trunk, so I'm pretty confident that's not a result of this patch.

        Show
        erickerickson Erick Erickson added a comment - Same thing (Unable to build keystore....) happens on unmodified trunk, so I'm pretty confident that's not a result of this patch.
        Hide
        erickerickson Erick Erickson added a comment -

        Cleaned up patch. There are still a couple of nocommits that I'd especially like another set of eyes on, I'm not entirely comfortable with those bits of code.

        This patch removes a bunch of nocommits and old code I commented out. It also tries to regularize the member variables as SolrParams rather than ModifiableSolrParams. While doing that I notices a couple of places where the member variable was actually being modified, things like adding &distrib=false. That seems like A Bad Thing.

        There's some junk in here where I deprecated c'tors that take a Map<String, String> and added a new (preferred) one that takes a SolrParams object. What do y'all think about taking out the c'tor that takes a Map? Is all this code new enough that we don't need to carry that baggage along? I took out the bits in the tests where it randomly chooses the old or new c'tor as they were in there just for preliminary testing....

        An embarrassingly large amount of this patch is the re-formatted schema.xml file. I took out the long-deprecated <types> and <fields> tags and re-indented (just a quick keystroke in IntelliJ). I did have to add one field type though *_dvs for new tests so there's at least one substantive change (not really sure it's necessary though)...

        So, getting close to committing. Do we really want to wait for 6.1? If we take the c'tors that take a Map out the sooner the better.

        Perhaps mark those c'tors as "experimental"? I doubt we'd be inconveniencing too many people at this point since this is so new.

        Show
        erickerickson Erick Erickson added a comment - Cleaned up patch. There are still a couple of nocommits that I'd especially like another set of eyes on, I'm not entirely comfortable with those bits of code. This patch removes a bunch of nocommits and old code I commented out. It also tries to regularize the member variables as SolrParams rather than ModifiableSolrParams. While doing that I notices a couple of places where the member variable was actually being modified, things like adding &distrib=false. That seems like A Bad Thing. There's some junk in here where I deprecated c'tors that take a Map<String, String> and added a new (preferred) one that takes a SolrParams object. What do y'all think about taking out the c'tor that takes a Map? Is all this code new enough that we don't need to carry that baggage along? I took out the bits in the tests where it randomly chooses the old or new c'tor as they were in there just for preliminary testing.... An embarrassingly large amount of this patch is the re-formatted schema.xml file. I took out the long-deprecated <types> and <fields> tags and re-indented (just a quick keystroke in IntelliJ). I did have to add one field type though *_dvs for new tests so there's at least one substantive change (not really sure it's necessary though)... So, getting close to committing. Do we really want to wait for 6.1? If we take the c'tors that take a Map out the sooner the better. Perhaps mark those c'tors as "experimental"? I doubt we'd be inconveniencing too many people at this point since this is so new.
        Hide
        joel.bernstein Joel Bernstein added a comment -

        I think I'd to take some time to review before it's committed. If I can get the review done before 6.1 let's go for it. The major 6.1 streaming work is all in trunk now anyway, so it won't cause issues with that work.

        Show
        joel.bernstein Joel Bernstein added a comment - I think I'd to take some time to review before it's committed. If I can get the review done before 6.1 let's go for it. The major 6.1 streaming work is all in trunk now anyway, so it won't cause issues with that work.
        Hide
        erickerickson Erick Erickson added a comment -

        Updated to latest trunk

        Show
        erickerickson Erick Erickson added a comment - Updated to latest trunk
        Hide
        joel.bernstein Joel Bernstein added a comment -

        So, the tests are going have to be reworked. I said I'd take this on if this ticket fell far behind master. At the time I didn't realize how far behind it would get so quickly. I should have time to review and rework the tests for this ticket over the next week.

        Show
        joel.bernstein Joel Bernstein added a comment - So, the tests are going have to be reworked. I said I'd take this on if this ticket fell far behind master. At the time I didn't realize how far behind it would get so quickly. I should have time to review and rework the tests for this ticket over the next week.
        Hide
        erickerickson Erick Erickson added a comment -

        Every day or two I pull the latest master and re-apply this set of changes. Today was the first day that was at all difficult.

        Please ping me before you start working on it, it may by that I just haven't put up my most recent patch.

        Show
        erickerickson Erick Erickson added a comment - Every day or two I pull the latest master and re-apply this set of changes. Today was the first day that was at all difficult. Please ping me before you start working on it, it may by that I just haven't put up my most recent patch.
        Hide
        erickerickson Erick Erickson added a comment -

        precommit passes, all tests pass. I'll be beasting the Streaming Aggregation tests today and commit soon unless there are objections.

        Show
        erickerickson Erick Erickson added a comment - precommit passes, all tests pass. I'll be beasting the Streaming Aggregation tests today and commit soon unless there are objections.
        Hide
        erickerickson Erick Erickson added a comment -

        I added a new test incorrectly, this corrects the test.

        Except for the test I messed up, beasting is fine. This test correction is half way through a 100 iteration trial with no problems.

        So I think it's ready. Probably Monday or Tuesday I'll commit it.

        Show
        erickerickson Erick Erickson added a comment - I added a new test incorrectly, this corrects the test. Except for the test I messed up, beasting is fine. This test correction is half way through a 100 iteration trial with no problems. So I think it's ready. Probably Monday or Tuesday I'll commit it.
        Hide
        joel.bernstein Joel Bernstein added a comment -

        Ok, sounds good.

        Show
        joel.bernstein Joel Bernstein added a comment - Ok, sounds good.
        Hide
        erickerickson Erick Erickson added a comment -

        No functional changes over last patch, just added solr/CHANGES.txt entry

        Show
        erickerickson Erick Erickson added a comment - No functional changes over last patch, just added solr/CHANGES.txt entry
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit f4359ff8ffd96253ba610865c5e29172307c3c7a in lucene-solr's branch refs/heads/master from Erick Erickson
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f4359ff ]

        SOLR-8467: CloudSolrStream and FacetStream should take a SolrParams object rather than a Map<String, String> to allow more complex Solr queries to be specified

        Show
        jira-bot ASF subversion and git services added a comment - Commit f4359ff8ffd96253ba610865c5e29172307c3c7a in lucene-solr's branch refs/heads/master from Erick Erickson [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f4359ff ] SOLR-8467 : CloudSolrStream and FacetStream should take a SolrParams object rather than a Map<String, String> to allow more complex Solr queries to be specified
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 73b4defc0751bd0cd2ad999e3a4e6c593fd0fb1c in lucene-solr's branch refs/heads/branch_6x from Erick Erickson
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=73b4def ]

        SOLR-8467: CloudSolrStream and FacetStream should take a SolrParams object rather than a Map<String, String> to allow more complex Solr queries to be specified

        Show
        jira-bot ASF subversion and git services added a comment - Commit 73b4defc0751bd0cd2ad999e3a4e6c593fd0fb1c in lucene-solr's branch refs/heads/branch_6x from Erick Erickson [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=73b4def ] SOLR-8467 : CloudSolrStream and FacetStream should take a SolrParams object rather than a Map<String, String> to allow more complex Solr queries to be specified

          People

          • Assignee:
            erickerickson Erick Erickson
            Reporter:
            erickerickson Erick Erickson
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development