Solr
  1. Solr
  2. SOLR-1570

Complain loudly if uniqueKey field is definied but not stored=true,multiValued=false

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.5, 3.1, 4.0-ALPHA
    • Component/s: None
    • Labels:
      None

      Description

      When loading a new schema, Solr should log some "SEVERE" warnings if the schema uses a uniqueKey field, but that field/type don't match the expected needs of unieuqKey field for most functionality to work (stored=true, multiValued=false) ... that way people won't (have any reason to) be suprised when things break later)

      1. SOLR-1570.patch
        0.8 kB
        Shalin Shekhar Mangar

        Issue Links

          Activity

          Show
          Hoss Man added a comment - recent example of confusion... http://old.nabble.com/java.lang.NullPointerException-at-org.apache.solr.handler.component.QueryComponent.mergeIds%28QueryComponent.java%3A421%29-to26392614.html
          Hide
          Lance Norskog added a comment -

          Please do not require indexed=true. There are use cases where it is fine to have search terms that find a document, but searching by the ID is not needed. If there are a huge number of documents and only a few search terms per document, indexing the ID is a noticeable addition to the index file size.

          (These are situations where deduplication is not necessary.)

          Show
          Lance Norskog added a comment - Please do not require indexed=true. There are use cases where it is fine to have search terms that find a document, but searching by the ID is not needed. If there are a huge number of documents and only a few search terms per document, indexing the ID is a noticeable addition to the index file size. (These are situations where deduplication is not necessary.)
          Hide
          Hoss Man added a comment -

          Lance: i wasn't suggesting that we required unieuqKey to be indexed, and shalin's patch doesn't do this ... my concern is merely that we log a an error – if you know what you're doing and want to ignore that erorr then that's totally your choice.

          I'm curious however: the only purpose for having a uniqueKey field in solr is the deduping and delete by id ... if you don't make your uniqueKey field indexed, then neither of those things are possible, so why bother using the <uniqueKey> declaration at all? ... you can still have an 'id" field without using <uniqueKey>, so maybe we should require uniqueKey to be indexed?

          Shalin: your patch looks good to me, except that it doesn't warn about the indexed=false case.

          Show
          Hoss Man added a comment - Lance: i wasn't suggesting that we required unieuqKey to be indexed, and shalin's patch doesn't do this ... my concern is merely that we log a an error – if you know what you're doing and want to ignore that erorr then that's totally your choice. I'm curious however: the only purpose for having a uniqueKey field in solr is the deduping and delete by id ... if you don't make your uniqueKey field indexed, then neither of those things are possible, so why bother using the <uniqueKey> declaration at all? ... you can still have an 'id" field without using <uniqueKey>, so maybe we should require uniqueKey to be indexed? Shalin: your patch looks good to me, except that it doesn't warn about the indexed=false case.
          Hide
          Shalin Shekhar Mangar added a comment -

          Shalin: your patch looks good to me, except that it doesn't warn about the indexed=false case.

          The getIndexedField method already does this but it throws an exception instead of logging a warning. Should we change that and allow un-indexed uniqueKeys (at least highlighting can still work)?

          Show
          Shalin Shekhar Mangar added a comment - Shalin: your patch looks good to me, except that it doesn't warn about the indexed=false case. The getIndexedField method already does this but it throws an exception instead of logging a warning. Should we change that and allow un-indexed uniqueKeys (at least highlighting can still work)?
          Hide
          Hoss Man added a comment -

          The getIndexedField method already does this but it throws an exception instead of logging a warning.

          OH ... right, missed that.

          Should we change that and allow un-indexed uniqueKeys (at least highlighting can still work)?

          I don't understand – highlighting should work fine if you don't declare a uniqueKey (it just requires you to match up the highlighting sections by position instead of by key) so if we've already been requiring that uniqueKey be indexed i'm cool with leaving it that way.

          still interested in Lance's use cases though (ie: what advantage there is in declaring a field the unqiqueKey even if you aren't going to delete/update it that can't be gained just by having aa regular stored field named "id" or soemthing)

          Show
          Hoss Man added a comment - The getIndexedField method already does this but it throws an exception instead of logging a warning. OH ... right, missed that. Should we change that and allow un-indexed uniqueKeys (at least highlighting can still work)? I don't understand – highlighting should work fine if you don't declare a uniqueKey (it just requires you to match up the highlighting sections by position instead of by key) so if we've already been requiring that uniqueKey be indexed i'm cool with leaving it that way. still interested in Lance's use cases though (ie: what advantage there is in declaring a field the unqiqueKey even if you aren't going to delete/update it that can't be gained just by having aa regular stored field named "id" or soemthing)
          Hide
          Hoss Man added a comment -

          And FWIW: even if Lance has a really compelling reason, i think we should just commit this patch, resolve this issue, and open a new issue to make indexed=false ok since it's significantly different from this one

          Show
          Hoss Man added a comment - And FWIW: even if Lance has a really compelling reason, i think we should just commit this patch, resolve this issue, and open a new issue to make indexed=false ok since it's significantly different from this one
          Hide
          Lance Norskog added a comment -

          the only purpose for having a uniqueKey field in solr is the deduping and delete by id

          There's a third reason: integration with other data stores with a common id. The use case is to search by terms, get ids back, and do joins against another database. Some of these systems have very large data volumes and want to minimise indexing time and disk size.

          Show
          Lance Norskog added a comment - the only purpose for having a uniqueKey field in solr is the deduping and delete by id There's a third reason: integration with other data stores with a common id. The use case is to search by terms, get ids back, and do joins against another database. Some of these systems have very large data volumes and want to minimise indexing time and disk size.
          Hide
          Hoss Man added a comment -

          There's a third reason: integration with other data stores with a common id.

          I dont' understand ... what you're describing makes total sense for why you want to put those ids into some "myExternalId" field in solr (as a stored field), but do you have any reason to put "<uniqueKey>myExternalId</uniqueKey>" into your schema that doesn't already require "myExternalId" to be indexed ?

          you can put ids into solr w/o telling it their are uniqueKeys and still use them for external purposes ... my point was that the only reason to tell solr about them is for deduing and delete by Id – both of which require that it be indexed.

          Show
          Hoss Man added a comment - There's a third reason: integration with other data stores with a common id. I dont' understand ... what you're describing makes total sense for why you want to put those ids into some "myExternalId" field in solr (as a stored field), but do you have any reason to put "<uniqueKey>myExternalId</uniqueKey>" into your schema that doesn't already require "myExternalId" to be indexed ? you can put ids into solr w/o telling it their are uniqueKeys and still use them for external purposes ... my point was that the only reason to tell solr about them is for deduing and delete by Id – both of which require that it be indexed.
          Hide
          Shalin Shekhar Mangar added a comment -

          And FWIW: even if Lance has a really compelling reason, i think we should just commit this patch, resolve this issue, and open a new issue to make indexed=false ok since it's significantly different from this one

          OK, I'll commit this soon.

          Show
          Shalin Shekhar Mangar added a comment - And FWIW: even if Lance has a really compelling reason, i think we should just commit this patch, resolve this issue, and open a new issue to make indexed=false ok since it's significantly different from this one OK, I'll commit this soon.
          Hide
          Shalin Shekhar Mangar added a comment -

          Committed revision 884263.

          Thanks!

          Show
          Shalin Shekhar Mangar added a comment - Committed revision 884263. Thanks!
          Hide
          Hoss Man added a comment -

          Correcting Fix Version based on CHANGES.txt, see this thread for more details...

          http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E

          Show
          Hoss Man added a comment - Correcting Fix Version based on CHANGES.txt, see this thread for more details... http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E
          Hide
          Grant Ingersoll added a comment -

          Bulk close for 3.1.0 release

          Show
          Grant Ingersoll added a comment - Bulk close for 3.1.0 release

            People

            • Assignee:
              Shalin Shekhar Mangar
              Reporter:
              Hoss Man
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development