SOLR-2796

AddUpdateCommand.getIndexedId doesn't work with schema configured defaults/copyField - UUIDField/copyField can not be used as uniqueKey field

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 4.0-ALPHA
    • Fix Version/s: 4.0-ALPHA
    • Component/s: update
    • Labels:
      None

      Description

      In Solr 1.4 and at the HEAD of the 3x branch, UUIDField can be used as the uniqueKey field even if documents do not specify a value, by taking advantage of the default="NEW" feature of UUIDField.

      Similarly, a copyField can be used to populate the uniqueKey field with data from a field with another name. Multiple copyFields can even be used if there is no overlap (i.e. if you have two different types of documents with no overlap in their id space, you can copy from companyId->id and from productId->id and use "id" as your uniqueKey field in Solr).
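
      A minimal schema.xml sketch of the copyField arrangement described above, as it might have been written for 1.4/3x. Only the companyId, productId and id names come from the description; the "string" field type and the field attributes are assumptions for illustration.

      ```xml
      <!-- Assumed field type; any non-tokenized type suitable for a uniqueKey would do -->
      <fieldType name="string" class="solr.StrField" />

      <!-- Each document supplies exactly one of companyId or productId -->
      <field name="companyId" type="string" indexed="true" stored="true" />
      <field name="productId" type="string" indexed="true" stored="true" />
      <!-- Never sent by clients; populated only via the copyFields below -->
      <field name="id" type="string" indexed="true" stored="true" />

      <copyField source="companyId" dest="id" />
      <copyField source="productId" dest="id" />

      <uniqueKey>id</uniqueKey>
      ```

      Because no document carries both source fields, "id" ends up with a single value per document, which is what makes it usable as the uniqueKey on the older branches.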

      Neither of these approaches works in Solr trunk because of how AddUpdateCommand.getIndexedId is currently used by DirectUpdateHandler2 (see r1152500).

          Activity

          Hoss Man added a comment -

          Example config...

          <fieldType name="uuid" class="solr.UUIDField" indexed="true" />
          ...
          <field name="uuid" type="uuid" indexed="true" stored="true" required="true" default="NEW" /> 
          ...
          <uniqueKey>uuid</uniqueKey>
          

          Resulting error when posting example docs...

          SEVERE: org.apache.solr.common.SolrException: Document is missing mandatory uniqueKey field: uuid
          	at org.apache.solr.update.AddUpdateCommand.getIndexedId(AddUpdateCommand.java:80)
          	at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:151)
          	at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
          	at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:115)
          	at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:133)
          	at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:78)
          	at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
          	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
          	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1451)
          
          
          Hoss Man added a comment -

          Test-only patch demonstrating the problem: passes on 3x, fails on trunk.

          Mark Miller added a comment -

          We should fix this. Looks like it was introduced with the extra error checking in SOLR-2685.

          Hoss Man added a comment -

          As noted in SOLR-3349, the problem also exists if you use copyField to populate the uniqueKey.

          Hoss Man added a comment -

          Updating the description after looking into it a bit more.

          Even if we reverted some of the logic in AddUpdateCommand.getIndexedId to work the way DirectUpdateHandler.getIndexedId(Document) did in the 3x branch, this deferred creation of the uniqueKey field fundamentally can't work in SolrCloud: we have to be able to determine the value of the uniqueKey field well before any schema defaults/copyFields are applied, so that the distrib processor knows which shard to forward to.

          I think we should bite the bullet and say "Starting with Solr 4, schema defaults and copyFields can not be used to populate the uniqueKey field" (we can even enforce this when parsing the schema - error if the uniqueKey field has a declared default or is the dest of a copyField) and provide UpdateProcessor alternatives for the behaviors that were previously possible with schema options...

          • FieldCopyUpdateProcessor - SOLR-2599
          • UUIDFieldUpdateProcessor - generates a new UUID for a configured field name if it doesn't already have a value in it
          • TimestampUpdateProcessor - generates a new Date for a configured field name if it doesn't already have a value in it (it's unlikely anyone is using a DateField as their uniqueKey, but it's possible and fairly easy to offer this just in case)

          thoughts?

          Hoss Man added a comment -

          Linking dependent issues.

          In addition to these, we'll also need new error checking for the copyField/defaultValue cases that are not going to be supported, and a test that they work properly.

          Hoss Man added a comment -

          Committed revision 1345376. - trunk
          Committed revision 1345378. - 4x

          Committed checks for these situations in IndexSchema, along with explicit error messages. The commit also includes a CHANGES.txt upgrading note about using UUIDUpdateProcessorFactory to have uniqueKey values generated automatically. The note will need to be updated once a copy-field-esque update processor is available (tracked in SOLR-2599).
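
          For readers upgrading, a solrconfig.xml sketch of the UUIDUpdateProcessorFactory approach mentioned in the upgrading note might look like the following. The chain name "uuid" is a hypothetical choice, and the field name "uuid" simply mirrors the example config earlier in this issue; the chain still has to be selected for update requests (for instance by marking it as the default chain), and the schema field would no longer declare default="NEW".

          ```xml
          <!-- Hypothetical chain name; fills in the uniqueKey before the document reaches the index -->
          <updateRequestProcessorChain name="uuid">
            <!-- Adds a new UUID to the "uuid" field when the document has no value for it -->
            <processor class="solr.UUIDUpdateProcessorFactory">
              <str name="fieldName">uuid</str>
            </processor>
            <processor class="solr.LogUpdateProcessorFactory" />
            <processor class="solr.RunUpdateProcessorFactory" />
          </updateRequestProcessorChain>
          ```

          Because the value is assigned in the update processor chain rather than by a schema default, it exists by the time the uniqueKey is read, which is the ordering the earlier SolrCloud comment calls for.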


            People

            • Assignee:
              Hoss Man
              Reporter:
              Hoss Man
            • Votes:
              3
              Watchers:
              3
