Details

      Description

      In SolrCloud, found a result grouping bug in the 4.0 release.
      A distributed result grouping request under SolrCloud got this result:

      Dec 10, 2012 10:32:07 PM org.apache.solr.common.SolrException log
      SEVERE: null:java.lang.IllegalArgumentException: numHits must be > 0; please use TotalHitCountCollector if you just need the total hit count
              at org.apache.lucene.search.TopFieldCollector.create(TopFieldCollector.java:1120)
              at org.apache.lucene.search.TopFieldCollector.create(TopFieldCollector.java:1069)
              at org.apache.lucene.search.grouping.AbstractSecondPassGroupingCollector.<init>(AbstractSecondPassGroupingCollector.java:75)
              at org.apache.lucene.search.grouping.term.TermSecondPassGroupingCollector.<init>(TermSecondPassGroupingCollector.java:49)
              at org.apache.solr.search.grouping.distributed.command.TopGroupsFieldCommand.create(TopGroupsFieldCommand.java:128)
              at org.apache.solr.search.grouping.CommandHandler.execute(CommandHandler.java:132)
              at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:339)
              at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:206)
      
      1. SOLR-4164.patch
        17 kB
        Cao Manh Dat

        Activity

        Hide
        hossman Hoss Man added a comment -

        Lance: can you give us any more context into how you got this error? what did the request look like? what did the data look like?

        I tried writing a test to demonstrating the problem and failed to fail...

        Committed revision 1421505.
        Committed revision 1421510.

        Show
        hossman Hoss Man added a comment - Lance: can you give us any more context into how you got this error? what did the request look like? what did the data look like? I tried writing a test to demonstrating the problem and failed to fail... Committed revision 1421505. Committed revision 1421510.
        Hide
        commit-tag-bot Commit Tag Bot added a comment -

        [trunk commit] Chris M. Hostetter
        http://svn.apache.org/viewvc?view=revision&revision=1421505

        grouping tests of case where no docs match main query. written while trying to repro SOLR-4164, but something is still missing

        Show
        commit-tag-bot Commit Tag Bot added a comment - [trunk commit] Chris M. Hostetter http://svn.apache.org/viewvc?view=revision&revision=1421505 grouping tests of case where no docs match main query. written while trying to repro SOLR-4164 , but something is still missing
        Hide
        commit-tag-bot Commit Tag Bot added a comment -

        [branch_4x commit] Chris M. Hostetter
        http://svn.apache.org/viewvc?view=revision&revision=1421510

        grouping tests of case where no docs match main query. written while trying to repro SOLR-4164, but something is still missing (merge r1421505)

        Show
        commit-tag-bot Commit Tag Bot added a comment - [branch_4x commit] Chris M. Hostetter http://svn.apache.org/viewvc?view=revision&revision=1421510 grouping tests of case where no docs match main query. written while trying to repro SOLR-4164 , but something is still missing (merge r1421505)
        Hide
        lancenorskog Lance Norskog added a comment -

        I can't recreate it. It may have been another problem I was having: a shard server ran out of memory during the query and threw an exception to the distributor. Maybe the group query collection code ignores these remote exceptions?

        Show
        lancenorskog Lance Norskog added a comment - I can't recreate it. It may have been another problem I was having: a shard server ran out of memory during the query and threw an exception to the distributor. Maybe the group query collection code ignores these remote exceptions?
        Hide
        estahn Enrico Stahn added a comment - - edited

        This issue is still not solved with 4.2.1.

        Add:

        Known Limitations: The Grouping feature only works if groups are in the same shard.

        We changed -DnumShards=2 to -DnumShards=1 and it's working since.

        Show
        estahn Enrico Stahn added a comment - - edited This issue is still not solved with 4.2.1. Add: Known Limitations: The Grouping feature only works if groups are in the same shard. We changed -DnumShards=2 to -DnumShards=1 and it's working since.
        Hide
        WebHomer Webster Homer added a comment -

        Found that if you set group.limit to -1 it will give the same failure. We had code that did this with the intent of getting all the documents in the roll up. We limit it to 500 so setting group.limit=500 was a decent work around. Still this worked fine with normal solr, only solrcloud had a problem

        Show
        WebHomer Webster Homer added a comment - Found that if you set group.limit to -1 it will give the same failure. We had code that did this with the intent of getting all the documents in the roll up. We limit it to 500 so setting group.limit=500 was a decent work around. Still this worked fine with normal solr, only solrcloud had a problem
        Hide
        caomanhdat Cao Manh Dat added a comment -

        This problem can happen when we set group.limit = -1 in distributed case. This bug doesn't happen in non distributed mode because inside Grouping we have this check

        int groupedDocsToCollect = getMax(groupOffset, docsPerGroup, maxDoc);
        groupedDocsToCollect = Math.max(groupedDocsToCollect, 1);
        

        But things will be much different in distributed case, so we should ask ourselves is group.limit = -1 is a valid value of group.limit in distributed mode?

        Show
        caomanhdat Cao Manh Dat added a comment - This problem can happen when we set group.limit = -1 in distributed case. This bug doesn't happen in non distributed mode because inside Grouping we have this check int groupedDocsToCollect = getMax(groupOffset, docsPerGroup, maxDoc); groupedDocsToCollect = Math .max(groupedDocsToCollect, 1); But things will be much different in distributed case, so we should ask ourselves is group.limit = -1 is a valid value of group.limit in distributed mode?
        Hide
        hossman Hoss Man added a comment -

        At a minimum, if distributed grouping doesn't work with group.limit=-1 then we should give a clean error saying so rather than an obtuse error.

        IIUC though, what Dat has found is that even in single node solr, group.limit=-1 doesn't give you "unlimited" groupping results – it silently re-writes the effective groupedDocsToCollect value to '1' ... I'm not sure why it does that instead of just giving an error, but if we can make distributed grouping behave the exact same way (and i'm not sure why we couldn't have the same Math.max(groupedDocsToCollect, 1); in the distributed code path) then we should.

        Show
        hossman Hoss Man added a comment - At a minimum, if distributed grouping doesn't work with group.limit=-1 then we should give a clean error saying so rather than an obtuse error. IIUC though, what Dat has found is that even in single node solr, group.limit=-1 doesn't give you "unlimited" groupping results – it silently re-writes the effective groupedDocsToCollect value to '1' ... I'm not sure why it does that instead of just giving an error, but if we can make distributed grouping behave the exact same way (and i'm not sure why we couldn't have the same Math.max(groupedDocsToCollect, 1); in the distributed code path) then we should.
        Hide
        WebHomer Webster Homer added a comment -

        I am certain that in Solr 4.10 non-cloud specifying -1 did roll up all matching documents. We certainly had more than 1 in the group.
        I agree that at a minimum that solr should be consistent and either throw an error or do something reasonable.

        It would be nice to have a way to specify an unlimited roll up

        Show
        WebHomer Webster Homer added a comment - I am certain that in Solr 4.10 non-cloud specifying -1 did roll up all matching documents. We certainly had more than 1 in the group. I agree that at a minimum that solr should be consistent and either throw an error or do something reasonable. It would be nice to have a way to specify an unlimited roll up
        Hide
        hossman Hoss Man added a comment -

        I am certain that in Solr 4.10 non-cloud specifying -1 did roll up all matching documents. We certainly had more than 1 in the group.

        Ok .. weird.

        I see now I missread what Dat was saying ... in single node more getMax is returning maxDoc if docsPerGroup < 0 ... so that's where that unlimited behavior was coming from.

        In general this smells like a very bad idea to me ... i'm really suprised/scared group.limit=-1 has ever worked this way, because it would make it very easy to crash solr/clients depending on what how many docs are in each group being returned – this is the precise reason rows=-1 has never been supported for non-grouping searches.

        In any case, since we can't support the same "unlimited" behavior in cloud mode, let's absolutely add a clear error message indicating that group.limit < 1 is the problem.

        Show
        hossman Hoss Man added a comment - I am certain that in Solr 4.10 non-cloud specifying -1 did roll up all matching documents. We certainly had more than 1 in the group. Ok .. weird. I see now I missread what Dat was saying ... in single node more getMax is returning maxDoc if docsPerGroup < 0 ... so that's where that unlimited behavior was coming from. In general this smells like a very bad idea to me ... i'm really suprised/scared group.limit=-1 has ever worked this way, because it would make it very easy to crash solr/clients depending on what how many docs are in each group being returned – this is the precise reason rows=-1 has never been supported for non-grouping searches. In any case, since we can't support the same "unlimited" behavior in cloud mode, let's absolutely add a clear error message indicating that group.limit < 1 is the problem.
        Hide
        yseeley@gmail.com Yonik Seeley added a comment -

        I am certain that in Solr 4.10 non-cloud specifying -1 did roll up all matching documents.

        Yep, I remember group.limit=-1 being explicitly supported when the feature was implemented.

        IIUC though, what Dat has found is that even in single node solr, group.limit=-1 doesn't give you "unlimited" groupping results – it silently re-writes the effective groupedDocsToCollect value to '1'

        Look at the code for getMax()... it does map -1 to the max possible.

        i'm really suprised/scared group.limit=-1 has ever worked this way, because it would make it very easy to crash solr/clients depending on what how many docs are in each group being returned

        There's an argument against unlimited behavior by default. But if one asks for everything back, one should get it. "-1" is a common way to ask for this... facets, the term component, and graph expressions all use "-1" as unlimited.

        The alternative is to pick a really high number out of a hat... which is more fragile since it may silently break applications in the future when they grow beyond that arbitrary number if it's not large enough (they will stop getting all the data, and that may not be an obvious error).

        Show
        yseeley@gmail.com Yonik Seeley added a comment - I am certain that in Solr 4.10 non-cloud specifying -1 did roll up all matching documents. Yep, I remember group.limit=-1 being explicitly supported when the feature was implemented. IIUC though, what Dat has found is that even in single node solr, group.limit=-1 doesn't give you "unlimited" groupping results – it silently re-writes the effective groupedDocsToCollect value to '1' Look at the code for getMax()... it does map -1 to the max possible. i'm really suprised/scared group.limit=-1 has ever worked this way, because it would make it very easy to crash solr/clients depending on what how many docs are in each group being returned There's an argument against unlimited behavior by default . But if one asks for everything back, one should get it. "-1" is a common way to ask for this... facets, the term component, and graph expressions all use "-1" as unlimited. The alternative is to pick a really high number out of a hat... which is more fragile since it may silently break applications in the future when they grow beyond that arbitrary number if it's not large enough (they will stop getting all the data, and that may not be an obvious error).
        Hide
        caomanhdat Cao Manh Dat added a comment - - edited

        Hi Yonik,

        In this case group.limit kinda more like rows parameter in /select and we don't support for negative rows param now. Users should aware that get all documents from each group will be a problem ( especially when we have very few groups ). In opposite case when we have many groups so upper bound for group.limit to retrieve all the docs will be good enough?

        Show
        caomanhdat Cao Manh Dat added a comment - - edited Hi Yonik, In this case group.limit kinda more like rows parameter in /select and we don't support for negative rows param now. Users should aware that get all documents from each group will be a problem ( especially when we have very few groups ). In opposite case when we have many groups so upper bound for group.limit to retrieve all the docs will be good enough?
        Hide
        yseeley@gmail.com Yonik Seeley added a comment -

        In this case group.limit kinda more like rows parameter in /select and we don't support for negative rows param now.

        group.limit=-1 does work (and has always worked) in non-distrib mode though (by design). IIRC "rows" not supporting -1 was an oversight that I never got around to fixing. A number of people have been surprised that this doesn't just work.

        If group.limit=-1 is difficult to support in distributed mode for some reason (I don't know that part of the code), then I guess we'll have to live with different behavior in non-distrib vs distrib for now.

        Show
        yseeley@gmail.com Yonik Seeley added a comment - In this case group.limit kinda more like rows parameter in /select and we don't support for negative rows param now. group.limit=-1 does work (and has always worked) in non-distrib mode though (by design). IIRC "rows" not supporting -1 was an oversight that I never got around to fixing. A number of people have been surprised that this doesn't just work. If group.limit=-1 is difficult to support in distributed mode for some reason (I don't know that part of the code), then I guess we'll have to live with different behavior in non-distrib vs distrib for now.
        Hide
        caomanhdat Cao Manh Dat added a comment -

        Initial patch for this issue.
        Yonik Seeley Please kindly review this patch.

        Show
        caomanhdat Cao Manh Dat added a comment - Initial patch for this issue. Yonik Seeley Please kindly review this patch.
        Hide
        yseeley@gmail.com Yonik Seeley added a comment -

        +1, looks good to me!

        Show
        yseeley@gmail.com Yonik Seeley added a comment - +1, looks good to me!
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 7794fbd13f1a0edfff8f121fb1c6a01075eeef6a in lucene-solr's branch refs/heads/master from Yonik Seeley
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7794fbd ]

        SOLR-4164: fix group.limit=-1 in distributed mode

        Show
        jira-bot ASF subversion and git services added a comment - Commit 7794fbd13f1a0edfff8f121fb1c6a01075eeef6a in lucene-solr's branch refs/heads/master from Yonik Seeley [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7794fbd ] SOLR-4164 : fix group.limit=-1 in distributed mode
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 65f9b4dc4c209dfa06aa72386f6a9bbb67a5a667 in lucene-solr's branch refs/heads/branch_6x from Yonik Seeley
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=65f9b4d ]

        SOLR-4164: fix group.limit=-1 in distributed mode

        Show
        jira-bot ASF subversion and git services added a comment - Commit 65f9b4dc4c209dfa06aa72386f6a9bbb67a5a667 in lucene-solr's branch refs/heads/branch_6x from Yonik Seeley [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=65f9b4d ] SOLR-4164 : fix group.limit=-1 in distributed mode
        Hide
        yseeley@gmail.com Yonik Seeley added a comment -

        Committed. Thanks Dat!

        Show
        yseeley@gmail.com Yonik Seeley added a comment - Committed. Thanks Dat!
        Hide
        shalinmangar Shalin Shekhar Mangar added a comment -

        Closing after 6.3.0 release.

        Show
        shalinmangar Shalin Shekhar Mangar added a comment - Closing after 6.3.0 release.

          People

          • Assignee:
            yseeley@gmail.com Yonik Seeley
            Reporter:
            lancenorskog Lance Norskog
          • Votes:
            1 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development