Solr
  1. Solr
  2. SOLR-1695

Missleading error message when adding docs with missing/multiple value(s) for uniqueKey field

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.5, 3.1, 4.0-ALPHA
    • Component/s: None
    • Labels:
      None

      Description

      Sometimes users don't seem to notice/understand the <uniqueKey/> declaration in the example schema, and the error message they get if their documents don't include that field is confusing...

      org.apache.solr.common.SolrException: Document [null] missing required field: id
      

      ...because they get an almost identical error even if they remove required=true from <field name="id" /> in their schema.xml file.

      We should improve the error message so it's clear when a Document is missing the "uniqueKeyField" (not just a "required" field) so they know the terminology to look for in diagnosing the problem.

      http://old.nabble.com/solr-1.4-csv-import-----Document-missing-required-field%3A-id-to26990048.html#a26990779

        Issue Links

          Activity

          Hide
          Hoss Man added a comment -

          Committed revision 911228.
          Committed revision 911232.

          I added an explicit checks for the number of uniqueKey values being != 1 early on in DocumentBuilder.toDocument. Prior to this, multiple values weren't checked for until the doc made it all the way to the UpdateHandler.

          Show
          Hoss Man added a comment - Committed revision 911228. Committed revision 911232. I added an explicit checks for the number of uniqueKey values being != 1 early on in DocumentBuilder.toDocument. Prior to this, multiple values weren't checked for until the doc made it all the way to the UpdateHandler.
          Hide
          Hoss Man added a comment -

          revising summary

          Show
          Hoss Man added a comment - revising summary
          Hide
          Yonik Seeley added a comment -

          reopening - this broke the build.

          Show
          Yonik Seeley added a comment - reopening - this broke the build.
          Hide
          Hoss Man added a comment -

          Doh!

          Note to self: don't just run the tests, remember to look at the results as well.

          The DocumentBuilderTest failures make sense: they use a schema with uniqueKey defined, but add docs w/o that field to test other behaviors of toDocument. They passed prior to this change because the only tested to toDocument method in isolation, andthe test for a missing uniqueKey was missing from that method. I think it's safe to consider these tests broken as written, since toDocument does do schema validation – it just wasn't doing the uniqueKey validation before. So i'll modify those tests to include a value for the uniqueKey field

          the ConvertedLegacyTest failure confuses me though ... it also adds docs w/o a uniqueKey field even though the schema requires one, but they do full adds so it's not obvious from the surface why it was ever passing before ... i want to think about that a little more before just "fixing' the test – it may be masking another bug.

          Show
          Hoss Man added a comment - Doh! Note to self: don't just run the tests, remember to look at the results as well. The DocumentBuilderTest failures make sense: they use a schema with uniqueKey defined, but add docs w/o that field to test other behaviors of toDocument. They passed prior to this change because the only tested to toDocument method in isolation, andthe test for a missing uniqueKey was missing from that method. I think it's safe to consider these tests broken as written, since toDocument does do schema validation – it just wasn't doing the uniqueKey validation before. So i'll modify those tests to include a value for the uniqueKey field the ConvertedLegacyTest failure confuses me though ... it also adds docs w/o a uniqueKey field even though the schema requires one, but they do full adds so it's not obvious from the surface why it was ever passing before ... i want to think about that a little more before just "fixing' the test – it may be masking another bug.
          Hide
          Yonik Seeley added a comment -

          the ConvertedLegacyTest failure confuses me though

          schema.xml does not require the "id" field, and the failing add explicitly says "allowDups=false" (legacy speak for overwrite=false)

          Show
          Yonik Seeley added a comment - the ConvertedLegacyTest failure confuses me though schema.xml does not require the "id" field, and the failing add explicitly says "allowDups=false" (legacy speak for overwrite=false)
          Hide
          Hoss Man added a comment -

          Hmmm.... ok so the reason the legacy test passed prior to this change is that DirectUpdateHandler2 (and DirectUpdateHandler from what i can tell) don't bother checking for a uniqueKey (or for multiple uniqueKeys) if allowDups="true" (which it is in the line of ConvertedLEgacyTest that's failing).

          So the question becomes: Is it a bug that DUH(2) allow docs w/o a uniqueKey field just because allowDups=true?

          If it's not a bug, then this entire patch should probably be rolled back – but personally It feels like it really is a bug: if a schema declares a uniqueKey field, then just because a particular add command says allowDups=true doesn't mean that docs w/o an id (or with multiple ids) should be allowed in to the index – those docs will need meaningful ids if/when a later commit does want to override them (consider the case of doing an initial build w/ allowDups=true for speed, and then incremental updates w/ allowDups=false ... the index needs to be internally consistent.

          Actually: I'm just going to roll this entire patch back either way – we can improve the error messages generated by DirectUpdateHandler2 and eliminate the redundant uniqueKey check in DocumentBuilder.toDocument. As a separate issue we can consider whether DUH2 is buggy.

          Show
          Hoss Man added a comment - Hmmm.... ok so the reason the legacy test passed prior to this change is that DirectUpdateHandler2 (and DirectUpdateHandler from what i can tell) don't bother checking for a uniqueKey (or for multiple uniqueKeys) if allowDups="true" (which it is in the line of ConvertedLEgacyTest that's failing). So the question becomes: Is it a bug that DUH(2) allow docs w/o a uniqueKey field just because allowDups=true? If it's not a bug, then this entire patch should probably be rolled back – but personally It feels like it really is a bug: if a schema declares a uniqueKey field, then just because a particular add command says allowDups=true doesn't mean that docs w/o an id (or with multiple ids) should be allowed in to the index – those docs will need meaningful ids if/when a later commit does want to override them (consider the case of doing an initial build w/ allowDups=true for speed, and then incremental updates w/ allowDups=false ... the index needs to be internally consistent. Actually: I'm just going to roll this entire patch back either way – we can improve the error messages generated by DirectUpdateHandler2 and eliminate the redundant uniqueKey check in DocumentBuilder.toDocument. As a separate issue we can consider whether DUH2 is buggy.
          Hide
          Hoss Man added a comment -

          schema.xml does not require the "id" field, and the failing add explicitly says "allowDups=false" (legacy speak for overwrite=false)

          ...it doesn't require "id" but it does declare "id" as the uniqueKey field ... even if it's allowing dups shouldn't it ensure that the docs has 1 and only one value for the uniqueKey field?

          Show
          Hoss Man added a comment - schema.xml does not require the "id" field, and the failing add explicitly says "allowDups=false" (legacy speak for overwrite=false) ...it doesn't require "id" but it does declare "id" as the uniqueKey field ... even if it's allowing dups shouldn't it ensure that the docs has 1 and only one value for the uniqueKey field?
          Hide
          Yonik Seeley added a comment -

          ...it doesn't require "id" but it does declare "id" as the uniqueKey field ... even if it's allowing dups shouldn't it ensure that the docs has 1 and only one value for the uniqueKey field?

          That depends... it makes sense for "normal" documents, but I've seen people that add a few auxillary documents to their index that used different fields, including the id field. But it's not like you gain greater power by allowing that - those usecases could be covered by forcing the user to come up with a uniqueKey value too... it would just be a little more work.

          Show
          Yonik Seeley added a comment - ...it doesn't require "id" but it does declare "id" as the uniqueKey field ... even if it's allowing dups shouldn't it ensure that the docs has 1 and only one value for the uniqueKey field? That depends... it makes sense for "normal" documents, but I've seen people that add a few auxillary documents to their index that used different fields, including the id field. But it's not like you gain greater power by allowing that - those usecases could be covered by forcing the user to come up with a uniqueKey value too... it would just be a little more work.
          Hide
          Hoss Man added a comment -

          Committed revision 911595.

          rolledback the changes to DocumentBuilder and improved the existing error messages in UpdateHandler instead.

          Show
          Hoss Man added a comment - Committed revision 911595. rolledback the changes to DocumentBuilder and improved the existing error messages in UpdateHandler instead.
          Hide
          Hoss Man added a comment -

          Correcting Fix Version based on CHANGES.txt, see this thread for more details...

          http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E

          Show
          Hoss Man added a comment - Correcting Fix Version based on CHANGES.txt, see this thread for more details... http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E
          Hide
          Grant Ingersoll added a comment -

          Bulk close for 3.1.0 release

          Show
          Grant Ingersoll added a comment - Bulk close for 3.1.0 release

            People

            • Assignee:
              Hoss Man
              Reporter:
              Hoss Man
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development