Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.5.1, 6.6, 7.0
    • Component/s: None
    • Security Level: Public (Default Security Level. Issues are Public)
    • Labels:
      None

      Description

      (NOTE: description below focuses on IntPointField, but problem seems to affect all PointField subclasses)

      There seems to be a disconnect between PointField.createFields -> IntPointField.createField -> PointField.isFieldUsed that results in an org.apache.lucene.document.IntPoint being created for each field value, even if field is indexed="false"

      Steps to reproduce...

      bin/solr -e techproducts
      ...
      curl -X POST -H 'Content-type:application/json' --data-binary '{
        "add-field":{
           "name":"hoss_points_check",
           "type":"pint",
           "stored":true,
           "docValues":false,
           "indexed":false}
      }' http://localhost:8983/solr/techproducts/schema
      ...
      curl -X POST -H 'Content-type:application/json' --data-binary '[{"id":"HOSS","hoss_points_check":42}]' 'http://localhost:8983/solr/techproducts/update/json?commit=true'
      ...
      curl 'http://localhost:8983/solr/techproducts/query?q=id:HOSS'
      {
        "responseHeader":{
          "status":0,
          "QTime":3,
          "params":{
            "q":"id:HOSS"}},
        "response":{"numFound":1,"start":0,"docs":[
            {
              "id":"HOSS",
              "hoss_points_check":42,
              "_version_":1563795876337418240}]
        }}
      curl 'http://localhost:8983/solr/techproducts/query?q=hoss_points_check:42'
      {
        "responseHeader":{
          "status":0,
          "QTime":2,
          "params":{
            "q":"hoss_points_check:42"}},
        "response":{"numFound":1,"start":0,"docs":[
            {
              "id":"HOSS",
              "hoss_points_check":42,
              "_version_":1563795876337418240}]
        }}
      

      Note that I can search on the doc using the "hoss_points_check" field even though it is docValues="false" indexed="false"

      1. SOLR-10425_hoss.patch
        26 kB
        Hoss Man
      2. SOLR-10425.patch
        57 kB
        Hoss Man
      3. SOLR-10425.patch
        51 kB
        Tomás Fernández Löbbe
      4. SOLR-10425.patch
        45 kB
        Tomás Fernández Löbbe
      5. SOLR-10425.patch
        26 kB
        Tomás Fernández Löbbe
      6. SOLR-10425.patch
        16 kB
        Tomás Fernández Löbbe

        Activity

        Hide
        hossman Hoss Man added a comment -

        At first glance, I thought maybe this may have been an intentional design choice that tomas made with the mindset of:

        • users should not declare a "Point" field unless they definitely wanted a BKD Point structure on disk
        • the "indexed" should only apply to creating the Terms based inverted index files, not the BKD points files

        But this doesn't jive with the existing test configs (such as schema-point.xml) where many point fields explicitly use indexed="true"

        It's also really important to ensure we have some way to support users who want an "integer" type field that is stored="true" but don't care about it being searchable/sortable w/o wasting disk space – even once/if we deprecate/remove TrieIntField}


        I believe the fix here is a pretty straight forward: IntPointField.createField should ignore isFieldUsed() and just check field.indexed() – allthough some other tweaks/null-checks may be needed in PointField.createFields & obviously we should get a lot more tests of these edge cases to smoke those out.

        i'm hoping Tomás Fernández Löbbe will chime in with a sanity check that i'm not missing some major reason why things work this way before i go too deep down hte rabit hole of writting new tests.

        Show
        hossman Hoss Man added a comment - At first glance, I thought maybe this may have been an intentional design choice that tomas made with the mindset of: users should not declare a "Point" field unless they definitely wanted a BKD Point structure on disk the "indexed" should only apply to creating the Terms based inverted index files, not the BKD points files But this doesn't jive with the existing test configs (such as schema-point.xml ) where many point fields explicitly use indexed="true" It's also really important to ensure we have some way to support users who want an "integer" type field that is stored="true" but don't care about it being searchable/sortable w/o wasting disk space – even once/if we deprecate/remove TrieIntField } I believe the fix here is a pretty straight forward: IntPointField.createField should ignore isFieldUsed() and just check field.indexed() – allthough some other tweaks/null-checks may be needed in PointField.createFields & obviously we should get a lot more tests of these edge cases to smoke those out. i'm hoping Tomás Fernández Löbbe will chime in with a sanity check that i'm not missing some major reason why things work this way before i go too deep down hte rabit hole of writting new tests.
        Hide
        tomasflobbe Tomás Fernández Löbbe added a comment -

        This is not intentional. My intention with PointFields was always to make them look exactly the same as TrieFields to the end user, so features should work the same way. This needs to be fixed.
        What about moving the isFieldUsed() check to createFields(...), and then just checking if indexed=true before calling createField(...)? That way the fix just goes to the superclass, and createField always creates?

        Show
        tomasflobbe Tomás Fernández Löbbe added a comment - This is not intentional. My intention with PointFields was always to make them look exactly the same as TrieFields to the end user, so features should work the same way. This needs to be fixed. What about moving the isFieldUsed() check to createFields(...) , and then just checking if indexed=true before calling createField(...) ? That way the fix just goes to the superclass, and createField always creates?
        Hide
        tomasflobbe Tomás Fernández Löbbe added a comment -

        This is still WIP. Here is an attempt for fixing this issue and some tests. The new tests pass now after the changes in PointField, but other tests fail because we try to use PointInSet query for DV-only fields.

        Show
        tomasflobbe Tomás Fernández Löbbe added a comment - This is still WIP. Here is an attempt for fixing this issue and some tests. The new tests pass now after the changes in PointField, but other tests fail because we try to use PointInSet query for DV-only fields.
        Hide
        tomasflobbe Tomás Fernández Löbbe added a comment -

        This patch fixes the issue with the set query

        Show
        tomasflobbe Tomás Fernández Löbbe added a comment - This patch fixes the issue with the set query
        Hide
        hossman Hoss Man added a comment -

        tomas: per our IRC conversation this morning, here's the (test only) patch i've been working on.

        there's still some nocommits around beefing up the multivalued exact query testing that i'll working on after i grab some lunch, but the 2 other things i wanted to draw your attention to (which we probably want to split out into new jiras?)...

        • the test schema explicitly declares many fields as useDocValuesAsStored="true" but that's actually the default based on the schema version=1.6 – making me suspicious of tests that are depending on that, or tests assuming that other fields don't have that.
        • when i tried to write a testNonReturnable it fails for fields that have docvalues – even using new fields that i explicitly put useDocValuesAsStored=false on.
        Show
        hossman Hoss Man added a comment - tomas: per our IRC conversation this morning, here's the (test only) patch i've been working on. there's still some nocommits around beefing up the multivalued exact query testing that i'll working on after i grab some lunch, but the 2 other things i wanted to draw your attention to (which we probably want to split out into new jiras?)... the test schema explicitly declares many fields as useDocValuesAsStored="true" but that's actually the default based on the schema version=1.6 – making me suspicious of tests that are depending on that, or tests assuming that other fields don't have that. when i tried to write a testNonReturnable it fails for fields that have docvalues – even using new fields that i explicitly put useDocValuesAsStored=false on.
        Hide
        tomasflobbe Tomás Fernández Löbbe added a comment -

        thanks Hoss.

        the test schema explicitly declares many fields as useDocValuesAsStored="true" but that's actually the default based on the schema version=1.6 – making me suspicious of tests that are depending on that, or tests assuming that other fields don't have that.

        Good point. Maybe we can explicitly set useDocValuesAsStored="true" in other fields?

        when i tried to write a testNonReturnable it fails for fields that have docvalues – even using new fields that i explicitly put useDocValuesAsStored=false on.

        I'll take a look. In the testInternals method I just added I included this code:

        for (LeafReaderContext leave : ir.leaves()) {
                LeafReader reader = leave.reader();
                for (int i = 0; i < reader.numDocs(); i++) {
                  Document doc = reader.document(i, Collections.singleton(field));
                  if (sf.stored()) {
                    assertNotNull(doc.get(field));
                  } else {
                    assertNull(doc.get(field));
                  }
                }
              }
        

        That passes the tests...

        Show
        tomasflobbe Tomás Fernández Löbbe added a comment - thanks Hoss. the test schema explicitly declares many fields as useDocValuesAsStored="true" but that's actually the default based on the schema version=1.6 – making me suspicious of tests that are depending on that, or tests assuming that other fields don't have that. Good point. Maybe we can explicitly set useDocValuesAsStored="true" in other fields? when i tried to write a testNonReturnable it fails for fields that have docvalues – even using new fields that i explicitly put useDocValuesAsStored=false on. I'll take a look. In the testInternals method I just added I included this code: for (LeafReaderContext leave : ir.leaves()) { LeafReader reader = leave.reader(); for ( int i = 0; i < reader.numDocs(); i++) { Document doc = reader.document(i, Collections.singleton(field)); if (sf.stored()) { assertNotNull(doc.get(field)); } else { assertNull(doc.get(field)); } } } That passes the tests...
        Hide
        tomasflobbe Tomás Fernández Löbbe added a comment -

        Merged patch

        Show
        tomasflobbe Tomás Fernández Löbbe added a comment - Merged patch
        Hide
        hossman Hoss Man added a comment -

        tomas: i haven't had a change to look at your latest patch, and i have to go offline for the rest of the day, but FWIW here's the whitebox test i was working on...

          public void testWhiteboxCreateFields() throws Exception {
            // TODO: add a "coverage" check that all points based dynamic fields in the schema are accounted for?
            
            doWhiteboxCreateFields("whitebox_p_i_dv", IntPoint.class, 42, "42");
            doWhiteboxCreateFields("whitebox_p_i_mv", IntPoint.class, 42, "42");
            
            doWhiteboxCreateFields("whitebox_p_i_ni", IntPoint.class, 42, "42");
        
            // nocommit: TEST ALL THE FIELDS
        
            
          }
          
          /** nocommit: jdocs, @see callAndCheckCreateFields, verifies same result for all values (string or type specific)
           */
          private void doWhiteboxCreateFields(final String fieldName, final Class pointType, final Object... values) throws Exception {
            List<IndexableField> firstResult = null;
            for (Object value : values) {
              List<IndexableField> result = callAndCheckCreateFields(fieldName, pointType, value);
              assertNotNull(value + " => null", result);
              if (null == firstResult) {
                firstResult = result;
              } else {
                // nocommit: why doesn't this check pass? are some IndexableFields not implementing equals?
                // assertEquals(firstResult, result);
              }
            }
          }
        
          /** nocommit: jdocs, returns the result of createFields after doing some sanity checks */
          private List<IndexableField> callAndCheckCreateFields(final String fieldName, final Class pointType, final Object value) throws Exception {
        
            
            final SchemaField sf = h.getCore().getLatestSchema().getField(fieldName);
        
            final List<IndexableField> results = sf.createFields(value);
            final Set<IndexableField> resultSet = new LinkedHashSet<>(results);
            assertEquals("duplicates found in results? " + results.toString(),
                         results.size(), resultSet.size());
        
            final Set<Class> resultClasses = new HashSet<>();
            for (IndexableField f : results) {
              resultClasses.add(f.getClass());
              
              if ( ! sf.hasDocValues() ) {
                assertFalse(f.toString(),
                            (f instanceof NumericDocValuesField) ||
                            (f instanceof SortedNumericDocValuesField));
              }
            }
        
            assertEquals("stored", sf.stored(), resultClasses.contains(StoredField.class));
            assertEquals("indexed", sf.indexed(), resultClasses.contains(pointType));
            if (sf.multiValued()) {
              assertEquals("docvalues", sf.hasDocValues(), resultClasses.contains(SortedNumericDocValuesField.class));
            } else {
              assertEquals("docvalues", sf.hasDocValues(), resultClasses.contains(NumericDocValuesField.class));
            }
        
            return results;
          }
        
        
        Show
        hossman Hoss Man added a comment - tomas: i haven't had a change to look at your latest patch, and i have to go offline for the rest of the day, but FWIW here's the whitebox test i was working on... public void testWhiteboxCreateFields() throws Exception { // TODO: add a "coverage" check that all points based dynamic fields in the schema are accounted for ? doWhiteboxCreateFields( "whitebox_p_i_dv" , IntPoint.class, 42, "42" ); doWhiteboxCreateFields( "whitebox_p_i_mv" , IntPoint.class, 42, "42" ); doWhiteboxCreateFields( "whitebox_p_i_ni" , IntPoint.class, 42, "42" ); // nocommit: TEST ALL THE FIELDS } /** nocommit: jdocs, @see callAndCheckCreateFields, verifies same result for all values (string or type specific) */ private void doWhiteboxCreateFields( final String fieldName, final Class pointType, final Object ... values) throws Exception { List<IndexableField> firstResult = null ; for ( Object value : values) { List<IndexableField> result = callAndCheckCreateFields(fieldName, pointType, value); assertNotNull(value + " => null " , result); if ( null == firstResult) { firstResult = result; } else { // nocommit: why doesn't this check pass? are some IndexableFields not implementing equals? // assertEquals(firstResult, result); } } } /** nocommit: jdocs, returns the result of createFields after doing some sanity checks */ private List<IndexableField> callAndCheckCreateFields( final String fieldName, final Class pointType, final Object value) throws Exception { final SchemaField sf = h.getCore().getLatestSchema().getField(fieldName); final List<IndexableField> results = sf.createFields(value); final Set<IndexableField> resultSet = new LinkedHashSet<>(results); assertEquals( "duplicates found in results? " + results.toString(), results.size(), resultSet.size()); final Set< Class > resultClasses = new HashSet<>(); for (IndexableField f : results) { resultClasses.add(f.getClass()); if ( ! sf.hasDocValues() ) { assertFalse(f.toString(), (f instanceof NumericDocValuesField) || (f instanceof SortedNumericDocValuesField)); } } assertEquals( "stored" , sf.stored(), resultClasses.contains(StoredField.class)); assertEquals( "indexed" , sf.indexed(), resultClasses.contains(pointType)); if (sf.multiValued()) { assertEquals( "docvalues" , sf.hasDocValues(), resultClasses.contains(SortedNumericDocValuesField.class)); } else { assertEquals( "docvalues" , sf.hasDocValues(), resultClasses.contains(NumericDocValuesField.class)); } return results; }
        Hide
        tomasflobbe Tomás Fernández Löbbe added a comment -

        Added testWhiteboxCreateFields to the patch. Testing all the point dynamic fields

        Show
        tomasflobbe Tomás Fernández Löbbe added a comment - Added testWhiteboxCreateFields to the patch. Testing all the point dynamic fields
        Hide
        hossman Hoss Man added a comment -

        Updated patch...

        • testPointFieldMultiValuedExactQuery now includes a "searchable" boolean and has callers that pass false
        • cleaned up nocommits and related dead (test) code for ideas i had that didn't pan out
        • cleaned up the test javadocs
        • spun off SOLR-10437 & SOLR-10438 as distinct issues and cited them as appropriate in comments/@AwaitsFix
        Show
        hossman Hoss Man added a comment - Updated patch... testPointFieldMultiValuedExactQuery now includes a "searchable" boolean and has callers that pass false cleaned up nocommits and related dead (test) code for ideas i had that didn't pan out cleaned up the test javadocs spun off SOLR-10437 & SOLR-10438 as distinct issues and cited them as appropriate in comments/@AwaitsFix
        Hide
        hossman Hoss Man added a comment -

        I'm running precommit & full test suite now ... i'll commit & backport this afternoon unless tomas beats me to it.

        Show
        hossman Hoss Man added a comment - I'm running precommit & full test suite now ... i'll commit & backport this afternoon unless tomas beats me to it.
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 6e5f6fab53b8c6f4acbebd51c346173829a3247a in lucene-solr's branch refs/heads/master from Chris Hostetter
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=6e5f6fa ]

        SOLR-10425: Fix indexed="false" on numeric PointFields

        Show
        jira-bot ASF subversion and git services added a comment - Commit 6e5f6fab53b8c6f4acbebd51c346173829a3247a in lucene-solr's branch refs/heads/master from Chris Hostetter [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=6e5f6fa ] SOLR-10425 : Fix indexed="false" on numeric PointFields
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit ac2c2adaeb98fb7cd5048be4d16785ed502ece14 in lucene-solr's branch refs/heads/branch_6x from Chris Hostetter
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ac2c2ad ]

        SOLR-10425: Fix indexed="false" on numeric PointFields

        (cherry picked from commit 6e5f6fab53b8c6f4acbebd51c346173829a3247a)

        Resolved Conflicts:
        1) 6x still has boost support in createFields
        2) 6x doesn't have the an iterator based API for DocValues (used in tests)

        solr/core/src/java/org/apache/solr/schema/DatePointField.java
        solr/core/src/java/org/apache/solr/schema/DoublePointField.java
        solr/core/src/java/org/apache/solr/schema/FloatPointField.java
        solr/core/src/java/org/apache/solr/schema/IntPointField.java
        solr/core/src/java/org/apache/solr/schema/LongPointField.java
        solr/core/src/java/org/apache/solr/schema/PointField.java

        Show
        jira-bot ASF subversion and git services added a comment - Commit ac2c2adaeb98fb7cd5048be4d16785ed502ece14 in lucene-solr's branch refs/heads/branch_6x from Chris Hostetter [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ac2c2ad ] SOLR-10425 : Fix indexed="false" on numeric PointFields (cherry picked from commit 6e5f6fab53b8c6f4acbebd51c346173829a3247a) Resolved Conflicts: 1) 6x still has boost support in createFields 2) 6x doesn't have the an iterator based API for DocValues (used in tests) solr/core/src/java/org/apache/solr/schema/DatePointField.java solr/core/src/java/org/apache/solr/schema/DoublePointField.java solr/core/src/java/org/apache/solr/schema/FloatPointField.java solr/core/src/java/org/apache/solr/schema/IntPointField.java solr/core/src/java/org/apache/solr/schema/LongPointField.java solr/core/src/java/org/apache/solr/schema/PointField.java
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit cf3fc6dc802c244513fb7a8056f477dbf3e731c3 in lucene-solr's branch refs/heads/branch_6x from Chris Hostetter
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=cf3fc6d ]

        SOLR-10425: Merge remote-tracking branch 'refs/remotes/origin/branch_6x' into branch_6x

        Show
        jira-bot ASF subversion and git services added a comment - Commit cf3fc6dc802c244513fb7a8056f477dbf3e731c3 in lucene-solr's branch refs/heads/branch_6x from Chris Hostetter [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=cf3fc6d ] SOLR-10425 : Merge remote-tracking branch 'refs/remotes/origin/branch_6x' into branch_6x
        Hide
        hossman Hoss Man added a comment -

        NOTE: the backport to 6x was messy because of other API changes, for thatreason i'm not going to push to branch_6_5 (for 6.5.1) until someone else looks over the 6x changes

        Tomás Fernández Löbbe: if you think the changes look ok, feel free to push the 6.5 backport – no need to wait on me.

        Show
        hossman Hoss Man added a comment - NOTE: the backport to 6x was messy because of other API changes, for thatreason i'm not going to push to branch_6_5 (for 6.5.1) until someone else looks over the 6x changes Tomás Fernández Löbbe : if you think the changes look ok, feel free to push the 6.5 backport – no need to wait on me.
        Hide
        tomasflobbe Tomás Fernández Löbbe added a comment -

        Changes look good to me. The conflicts were, #1 in createFIelds (just add the boost) and #2 only in the tests, right? I cherry-picked the 6.x commit locally and I'm running tests in branch_6_5. I'll push that once tests pass.

        Show
        tomasflobbe Tomás Fernández Löbbe added a comment - Changes look good to me. The conflicts were, #1 in createFIelds (just add the boost) and #2 only in the tests, right? I cherry-picked the 6.x commit locally and I'm running tests in branch_6_5. I'll push that once tests pass.
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit bfa2d837c4695b96096d03d4349e88042c862423 in lucene-solr's branch refs/heads/branch_6_5 from Chris Hostetter
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=bfa2d83 ]

        SOLR-10425: Fix indexed="false" on numeric PointFields

        (cherry picked from commit 6e5f6fab53b8c6f4acbebd51c346173829a3247a)

        Resolved Conflicts:
        1) 6x still has boost support in createFields
        2) 6x doesn't have the an iterator based API for DocValues (used in tests)

        solr/core/src/java/org/apache/solr/schema/DatePointField.java
        solr/core/src/java/org/apache/solr/schema/DoublePointField.java
        solr/core/src/java/org/apache/solr/schema/FloatPointField.java
        solr/core/src/java/org/apache/solr/schema/IntPointField.java
        solr/core/src/java/org/apache/solr/schema/LongPointField.java
        solr/core/src/java/org/apache/solr/schema/PointField.java

        Show
        jira-bot ASF subversion and git services added a comment - Commit bfa2d837c4695b96096d03d4349e88042c862423 in lucene-solr's branch refs/heads/branch_6_5 from Chris Hostetter [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=bfa2d83 ] SOLR-10425 : Fix indexed="false" on numeric PointFields (cherry picked from commit 6e5f6fab53b8c6f4acbebd51c346173829a3247a) Resolved Conflicts: 1) 6x still has boost support in createFields 2) 6x doesn't have the an iterator based API for DocValues (used in tests) solr/core/src/java/org/apache/solr/schema/DatePointField.java solr/core/src/java/org/apache/solr/schema/DoublePointField.java solr/core/src/java/org/apache/solr/schema/FloatPointField.java solr/core/src/java/org/apache/solr/schema/IntPointField.java solr/core/src/java/org/apache/solr/schema/LongPointField.java solr/core/src/java/org/apache/solr/schema/PointField.java

          People

          • Assignee:
            tomasflobbe Tomás Fernández Löbbe
            Reporter:
            hossman Hoss Man
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development