Solr
  1. Solr
  2. SOLR-6452

StatsComponent "missing" stat won't work with docValues=true and indexed=false

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 4.10, 6.0
    • Fix Version/s: 4.10.2, 5.0, 6.0
    • Component/s: None
    • Labels:
      None

      Description

      StatsComponent can work with DocValues, but it still required to use indexed=true for the "missing" stat to work. Missing values should be obtained from the docValues too.

      1. SOLR-6452.patch
        9 kB
        Vitaliy Zhovtyuk
      2. SOLR-6452.patch
        19 kB
        Tomás Fernández Löbbe
      3. SOLR-6452.patch
        18 kB
        Xu Zhang
      4. SOLR-6452-trunk.patch
        19 kB
        Vitaliy Zhovtyuk
      5. SOLR-6452-trunk.patch
        18 kB
        Xu Zhang
      6. SOLR-6452-trunk.patch
        27 kB
        Xu Zhang

        Issue Links

          Activity

          Hide
          Xu Zhang added a comment -

          Looks like this bug is only for multi-valued field.

          Show
          Xu Zhang added a comment - Looks like this bug is only for multi-valued field.
          Hide
          Xu Zhang added a comment -

          Upload my first fix for "missing" stat.
          Still working on it and need some suggestions.

          My first patch

          Show
          Xu Zhang added a comment - Upload my first fix for "missing" stat. Still working on it and need some suggestions. My first patch
          Hide
          Xu Zhang added a comment -

          I talked with Tomas, new to fix the bug based on Tomas's comment.

          Right now, for multi-value stats field, we only test stats.facet on String fieldtype. I add another simple test case testing stats.facet on int type. Looks like this part is still buggy.

          Show
          Xu Zhang added a comment - I talked with Tomas, new to fix the bug based on Tomas's comment. Right now, for multi-value stats field, we only test stats.facet on String fieldtype. I add another simple test case testing stats.facet on int type. Looks like this part is still buggy.
          Hide
          Tomás Fernández Löbbe added a comment - - edited

          Thanks for the patch Xu Zhang. As an optimization, why don't you make the missingStats a Map<Integer, Integer> and use the ords as keys instead of the terms. That way you don't need to do the lookupOrd for all docs, and you do it only once per term in the accumulateMissing() method.

          Show
          Tomás Fernández Löbbe added a comment - - edited Thanks for the patch Xu Zhang . As an optimization, why don't you make the missingStats a Map<Integer, Integer> and use the ords as keys instead of the terms. That way you don't need to do the lookupOrd for all docs, and you do it only once per term in the accumulateMissing() method.
          Hide
          Xu Zhang added a comment -

          Update based on the comments.

          Thanks Tomas

          Show
          Xu Zhang added a comment - Update based on the comments. Thanks Tomas
          Hide
          Tomás Fernández Löbbe added a comment -

          I think the patch looks good, I'll commit it shortly

          Show
          Tomás Fernández Löbbe added a comment - I think the patch looks good, I'll commit it shortly
          Hide
          Tomás Fernández Löbbe added a comment -

          New patch against trunk plus minor changes

          Show
          Tomás Fernández Löbbe added a comment - New patch against trunk plus minor changes
          Hide
          ASF subversion and git services added a comment -

          Commit 1624091 from Tomás Fernández Löbbe in branch 'dev/trunk'
          [ https://svn.apache.org/r1624091 ]

          SOLR-6452: StatsComponent's stat 'missing' will work on fields with docValues=true and indexed=false

          Show
          ASF subversion and git services added a comment - Commit 1624091 from Tomás Fernández Löbbe in branch 'dev/trunk' [ https://svn.apache.org/r1624091 ] SOLR-6452 : StatsComponent's stat 'missing' will work on fields with docValues=true and indexed=false
          Hide
          ASF subversion and git services added a comment -

          Commit 1624117 from Tomás Fernández Löbbe in branch 'dev/trunk'
          [ https://svn.apache.org/r1624117 ]

          SOLR-6452: CHANGES.txt entry

          Show
          ASF subversion and git services added a comment - Commit 1624117 from Tomás Fernández Löbbe in branch 'dev/trunk' [ https://svn.apache.org/r1624117 ] SOLR-6452 : CHANGES.txt entry
          Hide
          Vitaliy Zhovtyuk added a comment -

          There are few issues in committed code addressed in attached patch:
          1. Accumulate methods should not return stats specific numbers (it is generic calculation).
          Attached solution with nested container class.
          Also made them private scoped.

          2. Reduced visibility of fields in FieldFacetStats.
          Created getter to expose FieldFacetStats.facetStatsValues.

          3. Methods FieldFacetStats#accumulateMissing and FieldFacetStats#accumulateTermNum does not throw any IO exception

          4. Why missing facet counters cant work on StatsValues directly without intermediate maps. They have all required required methods that look like
          duplicated in org.apache.solr.handler.component.FieldFacetStats#facetMissingNum and org.apache.solr.handler.component.AbstractStatsValues#missing?
          Will try to unite it.

          Show
          Vitaliy Zhovtyuk added a comment - There are few issues in committed code addressed in attached patch: 1. Accumulate methods should not return stats specific numbers (it is generic calculation). Attached solution with nested container class. Also made them private scoped. 2. Reduced visibility of fields in FieldFacetStats. Created getter to expose FieldFacetStats.facetStatsValues. 3. Methods FieldFacetStats#accumulateMissing and FieldFacetStats#accumulateTermNum does not throw any IO exception 4. Why missing facet counters cant work on StatsValues directly without intermediate maps. They have all required required methods that look like duplicated in org.apache.solr.handler.component.FieldFacetStats#facetMissingNum and org.apache.solr.handler.component.AbstractStatsValues#missing? Will try to unite it.
          Hide
          Tomás Fernández Löbbe added a comment -

          1. Accumulate methods should not return stats specific numbers (it is generic calculation).

          I don't see the issue here. Can you provide a test that exposes the bug?

          2. Reduced visibility of fields in FieldFacetStats.

          I think this should be done in trunk only

          Show
          Tomás Fernández Löbbe added a comment - 1. Accumulate methods should not return stats specific numbers (it is generic calculation). I don't see the issue here. Can you provide a test that exposes the bug? 2. Reduced visibility of fields in FieldFacetStats. I think this should be done in trunk only
          Hide
          Tomás Fernández Löbbe added a comment -

          Vitaliy Zhovtyuk Do you have that test? or a pointer to what it would fail? Otherwise I'm going to merge the code to branch_4x and mark this Jira as resolved. I liked the code refactor you did, but the way I see it that's not necessary to fix this bug.

          Show
          Tomás Fernández Löbbe added a comment - Vitaliy Zhovtyuk Do you have that test? or a pointer to what it would fail? Otherwise I'm going to merge the code to branch_4x and mark this Jira as resolved. I liked the code refactor you did, but the way I see it that's not necessary to fix this bug.
          Hide
          ASF subversion and git services added a comment -

          Commit 1624406 from Tomás Fernández Löbbe in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1624406 ]

          SOLR-6452: StatsComponent's stat 'missing' will work on fields with docValues=true and indexed=false

          Show
          ASF subversion and git services added a comment - Commit 1624406 from Tomás Fernández Löbbe in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1624406 ] SOLR-6452 : StatsComponent's stat 'missing' will work on fields with docValues=true and indexed=false
          Hide
          Vitaliy Zhovtyuk added a comment -

          Added patch based on trunk.
          Added facet fields tests for integer and double types, and facet value stats is not available. Does it require new issue or reopen SOLR-6452?

          Implementation of this issue can be improved in few cases:
          1. Accumulate methods should not return stats specific numbers (it is generic). Attached solution with container class. Also made them private scoped.
          Returning just missing fields from accumulate methods does not allow you to extend it with additional counts field, therefore i propose to leave void.

          2. Reduced visibility of fields in FieldFacetStats.
          Created getter to expose FieldFacetStats.facetStatsValues.

          3. Methods FieldFacetStats#accumulateMissing and FieldFacetStats#accumulateTermNum does not throw any IO exception

          4. We don't need intermediate maps to accumulate missing counts. Changed org.apache.solr.handler.component.FieldFacetStats#facetMissingNum
          to work directly on StatsValues structure and removed org.apache.solr.handler.component.FieldFacetStats#accumulateMissing.
          We don't need to have it in 2 phases.

          Show
          Vitaliy Zhovtyuk added a comment - Added patch based on trunk. Added facet fields tests for integer and double types, and facet value stats is not available. Does it require new issue or reopen SOLR-6452 ? Implementation of this issue can be improved in few cases: 1. Accumulate methods should not return stats specific numbers (it is generic). Attached solution with container class. Also made them private scoped. Returning just missing fields from accumulate methods does not allow you to extend it with additional counts field, therefore i propose to leave void. 2. Reduced visibility of fields in FieldFacetStats. Created getter to expose FieldFacetStats.facetStatsValues. 3. Methods FieldFacetStats#accumulateMissing and FieldFacetStats#accumulateTermNum does not throw any IO exception 4. We don't need intermediate maps to accumulate missing counts. Changed org.apache.solr.handler.component.FieldFacetStats#facetMissingNum to work directly on StatsValues structure and removed org.apache.solr.handler.component.FieldFacetStats#accumulateMissing. We don't need to have it in 2 phases.
          Hide
          Xu Zhang added a comment -

          This is a known issue, facet value stats only works for string type right now, tracked in: SOLR-1782

          This Jira is only for "missing" stat.

          Show
          Xu Zhang added a comment - This is a known issue, facet value stats only works for string type right now, tracked in: SOLR-1782 This Jira is only for "missing" stat.
          Hide
          Tomás Fernández Löbbe added a comment -

          I think the fix for this issue was correct. Vitaliy Zhovtyuk, please open a new Jira for your suggested refactor.

          Show
          Tomás Fernández Löbbe added a comment - I think the fix for this issue was correct. Vitaliy Zhovtyuk , please open a new Jira for your suggested refactor.
          Hide
          ASF subversion and git services added a comment -

          Commit 1633741 from Tomás Fernández Löbbe in branch 'dev/branches/lucene_solr_4_10'
          [ https://svn.apache.org/r1633741 ]

          SOLR-6452: StatsComponent's stat 'missing' will work on fields with docValues=true and indexed=false

          Show
          ASF subversion and git services added a comment - Commit 1633741 from Tomás Fernández Löbbe in branch 'dev/branches/lucene_solr_4_10' [ https://svn.apache.org/r1633741 ] SOLR-6452 : StatsComponent's stat 'missing' will work on fields with docValues=true and indexed=false

            People

            • Assignee:
              Tomás Fernández Löbbe
              Reporter:
              Tomás Fernández Löbbe
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development