Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-10237

Poly-Fields should work with subfield that have docValues=true

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.5, 7.0
    • Component/s: None
    • Security Level: Public (Default Security Level. Issues are Public)
    • Labels:
      None

      Description

      DocValues aren’t really supported in poly-fields at this point, but they don’t complain if the schema definition of the subfield has docValues=true. This leaves the index in an inconsistent state, since the SchemaField has docValues=true but there are no DV for the field.
      Since this breaks compatibility, I think we should just emit a warning unless the subfield is an instance of PointField. With [Int/Long/Float/Double/Date]PointField subfields, this is particularly important, since they use IndexOrDocValuesQuery, that would return incorrect results.

      1. SOLR-10237.patch
        14 kB
        Tomás Fernández Löbbe
      2. SOLR-10237.patch
        12 kB
        Tomás Fernández Löbbe
      3. SOLR-10237.patch
        6 kB
        Tomás Fernández Löbbe

        Activity

        Hide
        dsmiley David Smiley added a comment -

        I'm trying to understand the problem here. solr.PointType (an x,y generic spatial field) supports DocValues in its coordinate fields so they can be used in ValueSource/"function queries"; or am I wrong here?

        Show
        dsmiley David Smiley added a comment - I'm trying to understand the problem here. solr.PointType (an x,y generic spatial field) supports DocValues in its coordinate fields so they can be used in ValueSource/"function queries"; or am I wrong here?
        Hide
        tomasflobbe Tomás Fernández Löbbe added a comment -

        solr.PointType (an x,y generic spatial field) supports DocValues in its coordinate fields so they can be used in ValueSource/"function queries"

        I don't believe it does. When a PointType field is created it calls SchemaField.createField(...) on it coordinates, instead of SchemaField.createField*s*(...). The first one of those only creates the "Index field", and doesn't create the DocValues field:

          @Override
          public List<IndexableField> createFields(SchemaField field, Object value, float boost) {
            String externalVal = value.toString();
            String[] point = parseCommaSeparatedList(externalVal, dimension);
        
            // TODO: this doesn't currently support polyFields as sub-field types
            List<IndexableField> f = new ArrayList<>(dimension+1);
        
            if (field.indexed()) {
              for (int i=0; i<dimension; i++) {
                SchemaField sf = subField(field, i, schema);
                f.add(sf.createField(point[i], sf.indexed() && !sf.omitNorms() ? boost : 1f));
              }
            }
        
            if (field.stored()) {
              String storedVal = externalVal;  // normalize or not?
              f.add(createField(field.getName(), storedVal, StoredField.TYPE, 1f));
            }
            
            return f;
          }
        

        Another option is to start supporting DV in the coordinates by creating the DV fields when requested, they could be used as you said when ValueSource is needed, probably inheriting the configuration (docValues=true/false) from the PointType field?

        Show
        tomasflobbe Tomás Fernández Löbbe added a comment - solr.PointType (an x,y generic spatial field) supports DocValues in its coordinate fields so they can be used in ValueSource/"function queries" I don't believe it does. When a PointType field is created it calls SchemaField.createField(...) on it coordinates, instead of SchemaField.createField* s *(...) . The first one of those only creates the "Index field", and doesn't create the DocValues field: @Override public List<IndexableField> createFields(SchemaField field, Object value, float boost) { String externalVal = value.toString(); String [] point = parseCommaSeparatedList(externalVal, dimension); // TODO: this doesn't currently support polyFields as sub-field types List<IndexableField> f = new ArrayList<>(dimension+1); if (field.indexed()) { for ( int i=0; i<dimension; i++) { SchemaField sf = subField(field, i, schema); f.add(sf.createField(point[i], sf.indexed() && !sf.omitNorms() ? boost : 1f)); } } if (field.stored()) { String storedVal = externalVal; // normalize or not? f.add(createField(field.getName(), storedVal, StoredField.TYPE, 1f)); } return f; } Another option is to start supporting DV in the coordinates by creating the DV fields when requested, they could be used as you said when ValueSource is needed, probably inheriting the configuration (docValues=true/false) from the PointType field?
        Hide
        dsmiley David Smiley added a comment -

        Oh I see; thanks for the explanation.

        Another option is to start supporting DV in the coordinates by creating the DV fields when requested, they could be used as you said when ValueSource is needed, probably inheriting the configuration (docValues=true/false) from the PointType field?

        I think this is the path forward. Doing anything else is worse – why shouldn't we let users use DocValues for purposes that DocValues were created for?

        As a related aside... I'm wondering if we may want to deprecate/remove the singular SchemaField.createField and simply have all field types do their logic in one method createFields. Perhaps further, it might accept the Lucene Document as a destination to thus avoid some needless Field[] creation. If you agree, the method might then be named addFieldsToDoc. I'm bumping into this a bit as I contemplate a TextField that can optionally put it's stored value embedded in a separate (compressed) BinaryDocValuesField – SOLR-10117.

        Show
        dsmiley David Smiley added a comment - Oh I see; thanks for the explanation. Another option is to start supporting DV in the coordinates by creating the DV fields when requested, they could be used as you said when ValueSource is needed, probably inheriting the configuration (docValues=true/false) from the PointType field? I think this is the path forward. Doing anything else is worse – why shouldn't we let users use DocValues for purposes that DocValues were created for? As a related aside... I'm wondering if we may want to deprecate/remove the singular SchemaField.createField and simply have all field types do their logic in one method createFields . Perhaps further, it might accept the Lucene Document as a destination to thus avoid some needless Field[] creation. If you agree, the method might then be named addFieldsToDoc . I'm bumping into this a bit as I contemplate a TextField that can optionally put it's stored value embedded in a separate (compressed) BinaryDocValuesField – SOLR-10117 .
        Hide
        tomasflobbe Tomás Fernández Löbbe added a comment -

        I think this is the path forward.

        I'll upload a patch shortly

        I'm wondering if we may want to deprecate/remove the singular SchemaField.createField and simply have all field types do their logic in one method createFields

        I think that would make sense, at least the reason for which both were created separately is not being used:

          /**
           * If true, then use {@link #createFields(Object, float)}, else use {@link #createField} to save an extra allocation
           * @return true if this field is a poly field
           */
          public boolean isPolyField(){
            return type.isPolyField();
          }
        

        There are a couple of calls to createField(...) that should be replaced though, I didn't look into them in detail.

        Perhaps further, it might accept the Lucene Document as a destination to thus avoid some needless Field[] creation

        Maybe... I'm not totally sold. I think there are valid use cases for wanting to modify the returned list before adding it to the Document

        I'm bumping into this a bit as I contemplate a TextField that can optionally put it's stored value embedded in a separate (compressed) BinaryDocValuesField – SOLR-10117.

        Not sure I follow, how would this refactor help?

        Show
        tomasflobbe Tomás Fernández Löbbe added a comment - I think this is the path forward. I'll upload a patch shortly I'm wondering if we may want to deprecate/remove the singular SchemaField.createField and simply have all field types do their logic in one method createFields I think that would make sense, at least the reason for which both were created separately is not being used: /** * If true , then use {@link #createFields( Object , float )}, else use {@link #createField} to save an extra allocation * @ return true if this field is a poly field */ public boolean isPolyField(){ return type.isPolyField(); } There are a couple of calls to createField(...) that should be replaced though, I didn't look into them in detail. Perhaps further, it might accept the Lucene Document as a destination to thus avoid some needless Field[] creation Maybe... I'm not totally sold. I think there are valid use cases for wanting to modify the returned list before adding it to the Document I'm bumping into this a bit as I contemplate a TextField that can optionally put it's stored value embedded in a separate (compressed) BinaryDocValuesField – SOLR-10117 . Not sure I follow, how would this refactor help?
        Hide
        dsmiley David Smiley added a comment -

        Maybe... I'm not totally sold. I think there are valid use cases for wanting to modify the returned list before adding it to the Document.

        Remember Document is just a wrapper around an ArrayList. A caller that wanted to manipulate the list could simply use a Document instance for a transient purpose; even re-using it by calling doc.clear().

        Not sure I follow, how would this refactor help?

        It's not a necessity.

        Show
        dsmiley David Smiley added a comment - Maybe... I'm not totally sold. I think there are valid use cases for wanting to modify the returned list before adding it to the Document. Remember Document is just a wrapper around an ArrayList. A caller that wanted to manipulate the list could simply use a Document instance for a transient purpose; even re-using it by calling doc.clear(). Not sure I follow, how would this refactor help? It's not a necessity.
        Hide
        tomasflobbe Tomás Fernández Löbbe added a comment -

        Here is the patch with DV supported. Unfortunately, due to the way poly-fields are implemented, docValues="true" can't be configured in the "field" definition, it has to be configured in the field type, or in the subfields. This should be OK and compatible with that we have now. I modified the existing UnsupportedOperationException to have a clearer message

        Show
        tomasflobbe Tomás Fernández Löbbe added a comment - Here is the patch with DV supported. Unfortunately, due to the way poly-fields are implemented, docValues="true" can't be configured in the "field" definition, it has to be configured in the field type, or in the subfields. This should be OK and compatible with that we have now. I modified the existing UnsupportedOperationException to have a clearer message
        Hide
        tomasflobbe Tomás Fernández Löbbe added a comment -

        Remember Document is just a wrapper around an ArrayList...

        Yes, I know, but in that case the usage would be weird, something like

        Document doc = new Document();
        type.addFieldsToDoc(doc);
        doc.removeField("foo");
        

        Not too much of an issue anyway.

        Show
        tomasflobbe Tomás Fernández Löbbe added a comment - Remember Document is just a wrapper around an ArrayList... Yes, I know, but in that case the usage would be weird, something like Document doc = new Document(); type.addFieldsToDoc(doc); doc.removeField( "foo" ); Not too much of an issue anyway.
        Hide
        dsmiley David Smiley added a comment -

        +1 to the patch

        Show
        dsmiley David Smiley added a comment - +1 to the patch
        Hide
        dsmiley David Smiley added a comment -

        note: I'm committing SOLR-10286 as we speak which refactors out a FieldType.checkSupportsDocValues() method. The patch here will be affected by that and need to override that instead of checkSchemaField().

        Show
        dsmiley David Smiley added a comment - note: I'm committing SOLR-10286 as we speak which refactors out a FieldType.checkSupportsDocValues() method. The patch here will be affected by that and need to override that instead of checkSchemaField().
        Hide
        tomasflobbe Tomás Fernández Löbbe added a comment -

        Thanks for the heads up David, I moved the DV validation to checkSupportsDocValues. Otherwise the same patch updated to master.

        Show
        tomasflobbe Tomás Fernández Löbbe added a comment - Thanks for the heads up David, I moved the DV validation to checkSupportsDocValues . Otherwise the same patch updated to master.
        Hide
        dsmiley David Smiley added a comment -

        You mentioned in an email recently:

        This is specially tricky since the users will get results, but will sometimes be incorrect

        So I'm curious how you discovered the problem here and how, without it, results may be inconsistent.

        BTW commit away when you feel it's ready.

        Show
        dsmiley David Smiley added a comment - You mentioned in an email recently: This is specially tricky since the users will get results, but will sometimes be incorrect So I'm curious how you discovered the problem here and how, without it, results may be inconsistent. BTW commit away when you feel it's ready.
        Hide
        tomasflobbe Tomás Fernández Löbbe added a comment -

        Results will be incorrect if IndexOrDocValuesQuery decides to use the DocValues query. In my tests it would always return incorrect results because the DV query was always chosen (because there were no values in the DV field). The "sometimes" would happen in odd situations I guess, if someone has a field that has some documents with DV and some without (e.g. if we fix this in 6.6 and someone has an index built with 6.5 using point fields and upgrades without re-indexing).

        Show
        tomasflobbe Tomás Fernández Löbbe added a comment - Results will be incorrect if IndexOrDocValuesQuery decides to use the DocValues query. In my tests it would always return incorrect results because the DV query was always chosen (because there were no values in the DV field). The "sometimes" would happen in odd situations I guess, if someone has a field that has some documents with DV and some without (e.g. if we fix this in 6.6 and someone has an index built with 6.5 using point fields and upgrades without re-indexing).
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 3b660018457234387558ff626c8b95bb6f4ce853 in lucene-solr's branch refs/heads/master from Tomas Fernandez Lobbe
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=3b66001 ]

        SOLR-10237: Poly-Fields should work with subfield that have docValues=true

        Show
        jira-bot ASF subversion and git services added a comment - Commit 3b660018457234387558ff626c8b95bb6f4ce853 in lucene-solr's branch refs/heads/master from Tomas Fernandez Lobbe [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=3b66001 ] SOLR-10237 : Poly-Fields should work with subfield that have docValues=true
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit f947cbc461c961007b268d701e6b415f7882aa03 in lucene-solr's branch refs/heads/branch_6x from Tomas Fernandez Lobbe
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f947cbc ]

        SOLR-10237: Poly-Fields should work with subfield that have docValues=true

        Show
        jira-bot ASF subversion and git services added a comment - Commit f947cbc461c961007b268d701e6b415f7882aa03 in lucene-solr's branch refs/heads/branch_6x from Tomas Fernandez Lobbe [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f947cbc ] SOLR-10237 : Poly-Fields should work with subfield that have docValues=true
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 3f6a15fd797081a6f6383a623ae2a794bcd6f800 in lucene-solr's branch refs/heads/branch_6_5 from Tomas Fernandez Lobbe
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=3f6a15f ]

        SOLR-10237: Poly-Fields should work with subfield that have docValues=true

        Show
        jira-bot ASF subversion and git services added a comment - Commit 3f6a15fd797081a6f6383a623ae2a794bcd6f800 in lucene-solr's branch refs/heads/branch_6_5 from Tomas Fernandez Lobbe [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=3f6a15f ] SOLR-10237 : Poly-Fields should work with subfield that have docValues=true

          People

          • Assignee:
            tomasflobbe Tomás Fernández Löbbe
            Reporter:
            tomasflobbe Tomás Fernández Löbbe
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development