Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-8865

real-time get does not retrieve values from docValues

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.5.1, 5.6, 6.0
    • Component/s: None
    • Labels:
      None

      Description

      Uncovered during ad-hoc testing... the version field, which has stored=false docValues=true is not retrieved with realtime-get

      1. SOLR-8865.patch
        0.8 kB
        Ishan Chattopadhyaya
      2. SOLR-8865.patch
        0.8 kB
        Ishan Chattopadhyaya
      3. SOLR-8865.patch
        5 kB
        Ishan Chattopadhyaya
      4. SOLR-8865.patch
        7 kB
        Ishan Chattopadhyaya
      5. SOLR-8865.patch
        13 kB
        Yonik Seeley
      6. SOLR-8865.patch
        15 kB
        Yonik Seeley

        Issue Links

          Activity

          Hide
          ichattopadhyaya Ishan Chattopadhyaya added a comment -

          Interesting.. I thought we fixed this with SOLR-8276, but seems like not fixed?

          Show
          ichattopadhyaya Ishan Chattopadhyaya added a comment - Interesting.. I thought we fixed this with SOLR-8276 , but seems like not fixed?
          Hide
          yseeley@gmail.com Yonik Seeley added a comment -

          I just followed the first couple of steps here: http://yonik.com/solr-tutorial/
          Best guess is that not all paths were covered... I think the steps above will lead to a retrieval from the tlog and not the index.

          Show
          yseeley@gmail.com Yonik Seeley added a comment - I just followed the first couple of steps here: http://yonik.com/solr-tutorial/ Best guess is that not all paths were covered... I think the steps above will lead to a retrieval from the tlog and not the index.
          Hide
          ichattopadhyaya Ishan Chattopadhyaya added a comment -

          I think something like the attached patch (untested) might work?

          Show
          ichattopadhyaya Ishan Chattopadhyaya added a comment - I think something like the attached patch (untested) might work?
          Hide
          ichattopadhyaya Ishan Chattopadhyaya added a comment -

          I'm working on a test for this.

          Show
          ichattopadhyaya Ishan Chattopadhyaya added a comment - I'm working on a test for this.
          Hide
          yseeley@gmail.com Yonik Seeley added a comment -

          Ah cool, good thing you brought it up... I was just going to start looking into that myself!

          Show
          yseeley@gmail.com Yonik Seeley added a comment - Ah cool, good thing you brought it up... I was just going to start looking into that myself!
          Hide
          ichattopadhyaya Ishan Chattopadhyaya added a comment -

          Here's a patch that works. Made the TestRealTimeGet to randomly use non stored docValues based _version_ field. The full test suite passes.

          Yonik Seeley Can you please review? Is there something we can do better for avoiding the double addition of a dv field in the toSolrDoc method?

          Show
          ichattopadhyaya Ishan Chattopadhyaya added a comment - Here's a patch that works. Made the TestRealTimeGet to randomly use non stored docValues based _version_ field. The full test suite passes. Yonik Seeley Can you please review? Is there something we can do better for avoiding the double addition of a dv field in the toSolrDoc method?
          Hide
          ichattopadhyaya Ishan Chattopadhyaya added a comment -

          Added a test for a multivalued field, to ensure that the logic for avoiding double adding of dv fields doesn't affect multivalued fields.

          Show
          ichattopadhyaya Ishan Chattopadhyaya added a comment - Added a test for a multivalued field, to ensure that the logic for avoiding double adding of dv fields doesn't affect multivalued fields.
          Hide
          ichattopadhyaya Ishan Chattopadhyaya added a comment -

          I think this should be made a blocker for 6.0.

          Show
          ichattopadhyaya Ishan Chattopadhyaya added a comment - I think this should be made a blocker for 6.0.
          Hide
          yseeley@gmail.com Yonik Seeley added a comment -

          Sorry for not getting to this earlier, busy weekend and now I'm sick...

          Is there something we can do better for avoiding the double addition of a dv field in the toSolrDoc method?

          Why/how does the double addition happen?

          Show
          yseeley@gmail.com Yonik Seeley added a comment - Sorry for not getting to this earlier, busy weekend and now I'm sick... Is there something we can do better for avoiding the double addition of a dv field in the toSolrDoc method? Why/how does the double addition happen?
          Hide
          ichattopadhyaya Ishan Chattopadhyaya added a comment -

          Sorry for not getting to this earlier, busy weekend and now I'm sick...

          Ouch, get well soon Yonik!

          Why/how does the double addition happen?

          I remember this double addition was due to the toSolrDoc method that converts a tlog document (which has only one field) to a Lucene document and then back again to a Solr document. I'll debug this again and quote specific parts of the code and in-memory lucene and solr documents from a test run next.

          Show
          ichattopadhyaya Ishan Chattopadhyaya added a comment - Sorry for not getting to this earlier, busy weekend and now I'm sick... Ouch, get well soon Yonik! Why/how does the double addition happen? I remember this double addition was due to the toSolrDoc method that converts a tlog document (which has only one field) to a Lucene document and then back again to a Solr document. I'll debug this again and quote specific parts of the code and in-memory lucene and solr documents from a test run next.
          Hide
          yseeley@gmail.com Yonik Seeley added a comment -

          remember this double addition was due to the toSolrDoc method that converts a tlog document (which has only one field) to a Lucene document and then back again to a Solr document.

          Ah, I see... creating the lucene document creates both a docValue field and a stored field for the same field.
          But this would hold for multi-valued fields as well, meaning the part of this patch that checks if the value has already been added won't work for multi-valued fields?

          Seems like a better approach would be to fix the code that creates duplicates in the first place... so in this code:

              for (IndexableField f : doc.getFields()) {
                if (f.fieldType().stored()) {
                  out.add((IndexableField) f);
                } else {
                  SchemaField schemaField = schema.getFieldOrNull(f.name());
                  if (schemaField != null && schemaField.hasDocValues() && schemaField.useDocValuesAsStored()) {
                    out.add((IndexableField) f);
                  }
                }
              }
          

          Drive it off of our schema... if the field is both stored and has docValues, then we can simply avoid one of them.

          Show
          yseeley@gmail.com Yonik Seeley added a comment - remember this double addition was due to the toSolrDoc method that converts a tlog document (which has only one field) to a Lucene document and then back again to a Solr document. Ah, I see... creating the lucene document creates both a docValue field and a stored field for the same field. But this would hold for multi-valued fields as well, meaning the part of this patch that checks if the value has already been added won't work for multi-valued fields? Seems like a better approach would be to fix the code that creates duplicates in the first place... so in this code: for (IndexableField f : doc.getFields()) { if (f.fieldType().stored()) { out.add((IndexableField) f); } else { SchemaField schemaField = schema.getFieldOrNull(f.name()); if (schemaField != null && schemaField.hasDocValues() && schemaField.useDocValuesAsStored()) { out.add((IndexableField) f); } } } Drive it off of our schema... if the field is both stored and has docValues, then we can simply avoid one of them.
          Hide
          yseeley@gmail.com Yonik Seeley added a comment -

          Made the TestRealTimeGet to randomly use non stored docValues based version field.

          Rather than randomly using a different schema, how about changing/adding a test that just uses more fields?
          We need to test combinations of stored, docValues, multiValued, etc.

          Show
          yseeley@gmail.com Yonik Seeley added a comment - Made the TestRealTimeGet to randomly use non stored docValues based version field. Rather than randomly using a different schema, how about changing/adding a test that just uses more fields? We need to test combinations of stored, docValues, multiValued, etc.
          Hide
          yseeley@gmail.com Yonik Seeley added a comment -

          OK, I have some time to take a shot at this now...

          Show
          yseeley@gmail.com Yonik Seeley added a comment - OK, I have some time to take a shot at this now...
          Hide
          yseeley@gmail.com Yonik Seeley added a comment - - edited

          Checkpoint patch... I updated schema_latest to make a bunch of the docValues fields unstored and used that.

          Progress being made... currently an interesting fail in testRealTimeGet():

           expected ==={'doc':{'id':'1', a_f:-1.5, a_fd:-1.5, a_fdS:-1.5,  a_fs:[1.0,2.5],a_fds:[1.0,2.5]       }}
           response = {
            "doc":
            {
              "id":"1",
              "a_f":-1.5,
              "a_fd":-1.5,
              "a_fdS":-1.5,
              "a_fs":[1.0,
                2.5],
              "a_fds":[1.0,
                "ERROR:SCHEMA-INDEX-MISMATCH,stringValue=null",
                2.5,
                "ERROR:SCHEMA-INDEX-MISMATCH,stringValue=null"]}}
          

          This looks like a bug in TrieField.toObject() when using docValues

          Show
          yseeley@gmail.com Yonik Seeley added a comment - - edited Checkpoint patch... I updated schema_latest to make a bunch of the docValues fields unstored and used that. Progress being made... currently an interesting fail in testRealTimeGet(): expected ==={'doc':{'id':'1', a_f:-1.5, a_fd:-1.5, a_fdS:-1.5, a_fs:[1.0,2.5],a_fds:[1.0,2.5] }} response = { "doc" : { "id" : "1" , "a_f" :-1.5, "a_fd" :-1.5, "a_fdS" :-1.5, "a_fs" :[1.0, 2.5], "a_fds" :[1.0, "ERROR:SCHEMA-INDEX-MISMATCH,stringValue= null " , 2.5, "ERROR:SCHEMA-INDEX-MISMATCH,stringValue= null " ]}} This looks like a bug in TrieField.toObject() when using docValues
          Hide
          ichattopadhyaya Ishan Chattopadhyaya added a comment -

          Ouch, good catch. That looks bad. Linking SOLR-8886 to this issue.

          Yonik, hope you are feeling better. Sorry, I'm currently out on vacation for Holi (https://en.wikipedia.org/wiki/Holi) and shall be able to get to this on Thursday.

          Show
          ichattopadhyaya Ishan Chattopadhyaya added a comment - Ouch, good catch. That looks bad. Linking SOLR-8886 to this issue. Yonik, hope you are feeling better. Sorry, I'm currently out on vacation for Holi ( https://en.wikipedia.org/wiki/Holi ) and shall be able to get to this on Thursday.
          Hide
          yseeley@gmail.com Yonik Seeley added a comment -

          Yeah feeling fine now, and making progress on this. No worries.
          Enjoy your vacation, sounds fun!

          Show
          yseeley@gmail.com Yonik Seeley added a comment - Yeah feeling fine now, and making progress on this. No worries. Enjoy your vacation, sounds fun!
          Hide
          yseeley@gmail.com Yonik Seeley added a comment -

          Updated patch, hopefully this is it... starting to run full tests now.

          Show
          yseeley@gmail.com Yonik Seeley added a comment - Updated patch, hopefully this is it... starting to run full tests now.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 6f60ac21f303364eece9c9f2893278f7da31aef8 in lucene-solr's branch refs/heads/master from Yonik Seeley
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=6f60ac2 ]

          SOLR-8865: Real-time get sometimes fails to retrieve stored fields from docValues

          Show
          jira-bot ASF subversion and git services added a comment - Commit 6f60ac21f303364eece9c9f2893278f7da31aef8 in lucene-solr's branch refs/heads/master from Yonik Seeley [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=6f60ac2 ] SOLR-8865 : Real-time get sometimes fails to retrieve stored fields from docValues
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit ca374d947678a81df665095387dadc0551590729 in lucene-solr's branch refs/heads/branch_6x from Yonik Seeley
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ca374d9 ]

          SOLR-8865: Real-time get sometimes fails to retrieve stored fields from docValues

          Show
          jira-bot ASF subversion and git services added a comment - Commit ca374d947678a81df665095387dadc0551590729 in lucene-solr's branch refs/heads/branch_6x from Yonik Seeley [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ca374d9 ] SOLR-8865 : Real-time get sometimes fails to retrieve stored fields from docValues
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 762000f226ff015e51082adff887ee1a046dd9d6 in lucene-solr's branch refs/heads/branch_6_0 from Yonik Seeley
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=762000f ]

          SOLR-8865: Real-time get sometimes fails to retrieve stored fields from docValues

          Show
          jira-bot ASF subversion and git services added a comment - Commit 762000f226ff015e51082adff887ee1a046dd9d6 in lucene-solr's branch refs/heads/branch_6_0 from Yonik Seeley [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=762000f ] SOLR-8865 : Real-time get sometimes fails to retrieve stored fields from docValues
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit d255771565b5c241651ffb89bb02155278cf45fd in lucene-solr's branch refs/heads/branch_5x from Yonik Seeley
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=d255771 ]

          SOLR-8865: Real-time get sometimes fails to retrieve stored fields from docValues

          Show
          jira-bot ASF subversion and git services added a comment - Commit d255771565b5c241651ffb89bb02155278cf45fd in lucene-solr's branch refs/heads/branch_5x from Yonik Seeley [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=d255771 ] SOLR-8865 : Real-time get sometimes fails to retrieve stored fields from docValues
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 3619099eb58cd5691a05004d9ad004a36884c06d in lucene-solr's branch refs/heads/branch_5_5 from Yonik Seeley
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=3619099 ]

          SOLR-8865: Real-time get sometimes fails to retrieve stored fields from docValues

          Show
          jira-bot ASF subversion and git services added a comment - Commit 3619099eb58cd5691a05004d9ad004a36884c06d in lucene-solr's branch refs/heads/branch_5_5 from Yonik Seeley [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=3619099 ] SOLR-8865 : Real-time get sometimes fails to retrieve stored fields from docValues

            People

            • Assignee:
              yseeley@gmail.com Yonik Seeley
              Reporter:
              yseeley@gmail.com Yonik Seeley
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development