Solr
  1. Solr
  2. SOLR-1380

Extend StatsComponent to MultiValued Fields

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 1.4
    • Fix Version/s: 1.4
    • Component/s: search
    • Labels:
      None

      Description

      The StatsComponent does not work on multivalued fields.

      1. SOLR-1380.patch
        4 kB
        Harish Agarwal
      2. SOLR-1380.patch
        27 kB
        Harish Agarwal
      3. SOLR-1380.patch
        18 kB
        Harish Agarwal
      4. SOLR-1380.patch
        18 kB
        Harish Agarwal
      5. SOLR-1380.patch
        18 kB
        Harish Agarwal

        Activity

        Hide
        Harish Agarwal added a comment -

        Patch to extend StatsComponent to multivalued fields. Please review, suggest, criticize!

        Show
        Harish Agarwal added a comment - Patch to extend StatsComponent to multivalued fields. Please review, suggest, criticize!
        Hide
        Harish Agarwal added a comment -

        Fix to patch to return null when no stats were calculated.

        Show
        Harish Agarwal added a comment - Fix to patch to return null when no stats were calculated.
        Hide
        Harish Agarwal added a comment -

        Another fix to StatsComponent.

        Show
        Harish Agarwal added a comment - Another fix to StatsComponent.
        Hide
        Grant Ingersoll added a comment -

        This looks pretty good in a first pass. I think it can go in. Not totally thrilled about the UninvertedField near copy/paste, but don't see a good alternative at the moment.

        Show
        Grant Ingersoll added a comment - This looks pretty good in a first pass. I think it can go in. Not totally thrilled about the UninvertedField near copy/paste, but don't see a good alternative at the moment.
        Hide
        Harish Agarwal added a comment -

        Grant - thanks for looking it over. Let me know if I can help with incorporating it into the trunk (don't have commit access, but if issues come up, I can help debug).

        I def. agree that the near copy/paste is not a great solution. In my opinion, the Stats and Facet components should be integrated, this would provide an easy way to remove that and other redundancies, as well as allow for added functionality. What do you think?

        Show
        Harish Agarwal added a comment - Grant - thanks for looking it over. Let me know if I can help with incorporating it into the trunk (don't have commit access, but if issues come up, I can help debug). I def. agree that the near copy/paste is not a great solution. In my opinion, the Stats and Facet components should be integrated, this would provide an easy way to remove that and other redundancies, as well as allow for added functionality. What do you think?
        Hide
        Grant Ingersoll added a comment -

        In looking a second time, what about the faceting support?

        I think the test needs to pass something like (I didn't fill in the std. dev):

        public void testMVFieldStatisticsResult() throws Exception {
            SolrCore core = h.getCore();
        
            assertU(adoc("id", "1", "stats_ii", "-10", "stats_ii", "-100", "active_s", "true"));
            assertU(adoc("id", "2", "stats_ii", "-20", "stats_ii", "200", "active_s", "true"));
            assertU(adoc("id", "3", "stats_ii", "-30", "stats_ii", "-1", "active_s", "false"));
            assertU(adoc("id", "4", "stats_ii", "-40", "stats_ii", "10", "active_s", "false"));
            assertU(commit());
        
            Map<String, String> args = new HashMap<String, String>();
            args.put(CommonParams.Q, "*:*");
            args.put(StatsParams.STATS, "true");
            args.put(StatsParams.STATS_FIELD, "stats_ii");
            args.put(StatsParams.STATS_FACET, "active_s");
            args.put("indent", "true");
            SolrQueryRequest req = new LocalSolrQueryRequest(core, new MapSolrParams(args));
        
            assertQ("test statistics values", req
                    , "//double[@name='min'][.='-100.0']"
                    , "//double[@name='max'][.='200.0']"
                    , "//double[@name='sum'][.='9.0']"
                    , "//long[@name='count'][.='8']"
                    , "//long[@name='missing'][.='0']"
                    , "//double[@name='sumOfSquares'][.='53101.0']"
                    , "//double[@name='mean'][.='1.125']"
                    , "//double[@name='stddev'][.='87.08852228787508']"
            );
        
            assertQ("test value for active_s=true", req
                    , "//lst[@name='true']/double[@name='min'][.='-100.0']"
                    , "//lst[@name='true']/double[@name='max'][.='200.0']"
                    , "//lst[@name='true']/double[@name='sum'][.='130.0']"
                    , "//lst[@name='true']/long[@name='count'][.='2']"
                    , "//lst[@name='true']/long[@name='missing'][.='0']"
                    , "//lst[@name='true']/double[@name='sumOfSquares'][.='50500.0']"
                    , "//lst[@name='true']/double[@name='mean'][.='32.5']"
                    , "//lst[@name='true']/double[@name='stddev'][.='']"
            );
          }
        

        If you can turn this around pretty quick, I think we can still add it, otherwise I'm going to mark as 1.5

        Show
        Grant Ingersoll added a comment - In looking a second time, what about the faceting support? I think the test needs to pass something like (I didn't fill in the std. dev): public void testMVFieldStatisticsResult() throws Exception { SolrCore core = h.getCore(); assertU(adoc("id", "1", "stats_ii", "-10", "stats_ii", "-100", "active_s", "true")); assertU(adoc("id", "2", "stats_ii", "-20", "stats_ii", "200", "active_s", "true")); assertU(adoc("id", "3", "stats_ii", "-30", "stats_ii", "-1", "active_s", "false")); assertU(adoc("id", "4", "stats_ii", "-40", "stats_ii", "10", "active_s", "false")); assertU(commit()); Map<String, String> args = new HashMap<String, String>(); args.put(CommonParams.Q, "*:*"); args.put(StatsParams.STATS, "true"); args.put(StatsParams.STATS_FIELD, "stats_ii"); args.put(StatsParams.STATS_FACET, "active_s"); args.put("indent", "true"); SolrQueryRequest req = new LocalSolrQueryRequest(core, new MapSolrParams(args)); assertQ("test statistics values", req , "//double[@name='min'][.='-100.0']" , "//double[@name='max'][.='200.0']" , "//double[@name='sum'][.='9.0']" , "//long[@name='count'][.='8']" , "//long[@name='missing'][.='0']" , "//double[@name='sumOfSquares'][.='53101.0']" , "//double[@name='mean'][.='1.125']" , "//double[@name='stddev'][.='87.08852228787508']" ); assertQ("test value for active_s=true", req , "//lst[@name='true']/double[@name='min'][.='-100.0']" , "//lst[@name='true']/double[@name='max'][.='200.0']" , "//lst[@name='true']/double[@name='sum'][.='130.0']" , "//lst[@name='true']/long[@name='count'][.='2']" , "//lst[@name='true']/long[@name='missing'][.='0']" , "//lst[@name='true']/double[@name='sumOfSquares'][.='50500.0']" , "//lst[@name='true']/double[@name='mean'][.='32.5']" , "//lst[@name='true']/double[@name='stddev'][.='']" ); } If you can turn this around pretty quick, I think we can still add it, otherwise I'm going to mark as 1.5
        Hide
        Harish Agarwal added a comment -

        I'll try to get it done in the next day or two - hopefully that still leaves time for it to be added.

        Show
        Harish Agarwal added a comment - I'll try to get it done in the next day or two - hopefully that still leaves time for it to be added.
        Hide
        Harish Agarwal added a comment -

        Updated patch to allow for multivalued stats collection on facet fields.

        Please review, comment and criticize. Grant, let me know if I can do anymore work on this to help get it incorporated into the trunk.

        Show
        Harish Agarwal added a comment - Updated patch to allow for multivalued stats collection on facet fields. Please review, comment and criticize. Grant, let me know if I can do anymore work on this to help get it incorporated into the trunk.
        Hide
        Grant Ingersoll added a comment -

        Committed revision 813874.

        Thanks Harish!

        Show
        Grant Ingersoll added a comment - Committed revision 813874. Thanks Harish!
        Hide
        Harish Agarwal added a comment -

        Noticed that distributed search was crashing for me in the StatsComponent. I believe this is related to the fuzziness around when the returned stats values should be null for a field. I'm submitting a small patch which fixes the bug, and always returns null for stats values for which the count is equal to zero in both distributed and non-distributed search. My rationale behind this is that when the count is equal to zero, the min/max/mean values reported are garbage and should not be relied on as valid statistics for the field. Hopefully this can still be incorporated into the trunk before the 1.4 release.

        Show
        Harish Agarwal added a comment - Noticed that distributed search was crashing for me in the StatsComponent. I believe this is related to the fuzziness around when the returned stats values should be null for a field. I'm submitting a small patch which fixes the bug, and always returns null for stats values for which the count is equal to zero in both distributed and non-distributed search. My rationale behind this is that when the count is equal to zero, the min/max/mean values reported are garbage and should not be relied on as valid statistics for the field. Hopefully this can still be incorporated into the trunk before the 1.4 release.
        Hide
        Harish Agarwal added a comment -

        Fix to distributed search, null stats.

        Show
        Harish Agarwal added a comment - Fix to distributed search, null stats.
        Hide
        Grant Ingersoll added a comment -

        Committed revision 814042.

        Show
        Grant Ingersoll added a comment - Committed revision 814042.
        Hide
        Grant Ingersoll added a comment -

        Bulk close for Solr 1.4

        Show
        Grant Ingersoll added a comment - Bulk close for Solr 1.4

          People

          • Assignee:
            Grant Ingersoll
            Reporter:
            Harish Agarwal
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development