Solr / SOLR-3831

atomic updates do not distribute correctly to other nodes

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 4.0-BETA
    • Fix Version/s: 4.0, 6.0
    • Component/s: SolrCloud
    • Labels:
      None
    • Environment:

      linux

      Description

      After setting up two independent Solr nodes using the SolrCloud tutorial, atomic updates to a field of type "payloads" give an error when updating the destination node.

      The error is:

      SEVERE: java.lang.NumberFormatException: For input string: "100}"

      The input sent to the first node is in the expected default format for a payload field (eg "foo|100") and that update succeeds. I've found that the update always works for the first node, but never the second.

      I've tested each server running independently and found that this update works as expected.

        Activity

        Jim Musil added a comment -

        Actually, it appears that atomic updates to any type of field do not distribute correctly to the other nodes.

        The root problem seems to be that when the subsequent node receives the update request, it does not apply any of the atomic update logic that's been added to DistributedUpdateProcessor.getUpdatedDocument(). Instead it tries to use the string representation of the Map object (e.g. "{set=1}") as the value.

        If I send the following JSON to update:

        [{"id":"1", "cat_s":{"set":"999"}}]

        the leader sets it correctly without deleting the other fields. Node 2, however, sets the value of cat_s to be "{set=999}".

        I've hacked a solution by forcing all update requests to use getUpdatedDocument(), but I'm not clear what other effects this may have.

        I'm not sure what the correct solution should be, but I'm willing to try to patch it.
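
        The symptom described above can be reproduced outside Solr. The following is a minimal sketch, not Solr code: asPlainValue and parsePayload are hypothetical helpers, and the payload parsing is an assumption about how a "term|payload" value might be split. It shows how an atomic-update map turns into "{set=999}" when treated as a plain value, and why a payload such as "foo|100" could then fail with the NumberFormatException from the description.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AtomicUpdateStringDemo {

    // Hypothetical helper: mimics a node that indexes the field value as-is,
    // so a Map of atomic-update operations is reduced to its toString().
    static String asPlainValue(Object fieldValue) {
        return String.valueOf(fieldValue);
    }

    // Hypothetical stand-in for a "term|payload" parser (assumption, not
    // Solr's actual payload code): everything after the last '|' is the payload.
    static float parsePayload(String token) {
        int bar = token.lastIndexOf('|');
        return Float.parseFloat(token.substring(bar + 1));
    }

    public static void main(String[] args) {
        Map<String, Object> op = new LinkedHashMap<>();
        op.put("set", "999");
        System.out.println(asPlainValue(op)); // prints {set=999}

        Map<String, Object> payloadOp = new LinkedHashMap<>();
        payloadOp.put("set", "foo|100");
        String value = asPlainValue(payloadOp); // "{set=foo|100}"
        try {
            parsePayload(value); // the payload part is now "100}"
        } catch (NumberFormatException e) {
            System.out.println("payload parse failed: " + e.getMessage());
        }
    }
}
```

        Under these assumptions, the trailing "}" in the reported error comes from the closing brace of the map's string form, which would explain why the update only fails on the node that receives the unprocessed operation map.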

        Yonik Seeley added a comment -

        Thanks for the report Jim, this looks serious. I've marked this as a blocker for 4.0.

        There are some very basic tests for cloud mode in BasicDistributedZkTest.doOptimisticLockingAndUpdating(), but it should have been enough to catch an issue like this.

        Yonik Seeley added a comment -

        "The root problem seems to be that when the subsequent node receives the update request, it does not apply any of the atomic update logic"

        We shouldn't need to at that point.
        The current logic is for the leader to retrieve the correct document, apply the updates, then index as a normal document as well as forward to all replicas.
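
        The "retrieve the correct document, apply the updates" step can be sketched as follows. This is a simplified illustration, not the code in getUpdatedDocument(): the class and method names are hypothetical, documents are modelled as plain Maps, and "inc" is assumed to work on whole numbers only.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AtomicMergeSketch {

    // Applies "set", "add" and "inc" operations from the update onto a copy
    // of the stored document and returns the resulting full document.
    static Map<String, Object> merge(Map<String, Object> stored, Map<String, Object> update) {
        Map<String, Object> full = new LinkedHashMap<>(stored);
        for (Map.Entry<String, Object> field : update.entrySet()) {
            String name = field.getKey();
            Object value = field.getValue();
            if (!(value instanceof Map)) {
                full.put(name, value); // plain value: overwrite (e.g. the id)
                continue;
            }
            for (Map.Entry<?, ?> op : ((Map<?, ?>) value).entrySet()) {
                String opName = String.valueOf(op.getKey());
                Object old = full.get(name);
                switch (opName) {
                    case "set":
                        full.put(name, op.getValue());
                        break;
                    case "add": {
                        List<Object> values = new ArrayList<>();
                        if (old instanceof Collection) {
                            values.addAll((Collection<?>) old);
                        } else if (old != null) {
                            values.add(old);
                        }
                        values.add(op.getValue());
                        full.put(name, values);
                        break;
                    }
                    case "inc": {
                        long base = (old == null) ? 0L : ((Number) old).longValue();
                        full.put(name, base + ((Number) op.getValue()).longValue());
                        break;
                    }
                    default:
                        throw new IllegalArgumentException("unknown operation: " + opName);
                }
            }
        }
        return full;
    }
}
```

        The result of merge is an ordinary document with no operation maps left in it, which is why a replica that receives it should not need any atomic update logic of its own.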

        Jim Musil added a comment -

        Ok, that makes sense. I don't think that's what the code is doing, however. The logic appears to be:

        1. clone the input doc
        2. apply atomic update logic to produce full doc
        3. add the full doc locally
        4. revert back to the original input doc
        5. distribute the command to other nodes

        A problem occurs deep within step 5 because there's no atomic update logic built into the distribAdd() chain for converting "add", "set", or "inc" into a proper Lucene Document.

        By simply commenting out this line (343 on trunk) in DistributedUpdateProcessor.java, the updates go through correctly.

        cmd.solrDoc = clonedDoc;
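
        The effect of that ordering can be shown with a small simulation. This is a sketch under stated assumptions rather than the DistributedUpdateProcessor code: AddCmd, applySet, buggyForward and fixedForward are hypothetical names, documents are plain Maps, only "set" is handled, and the "fixed" variant simply clones after the merge so that the full document is what gets forwarded.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DistribOrderSketch {

    // Hypothetical stand-in for an add command carrying a document.
    static final class AddCmd {
        Map<String, Object> solrDoc;
        AddCmd(Map<String, Object> solrDoc) { this.solrDoc = solrDoc; }
    }

    // Minimal merge that understands only {"set": value} operations.
    static Map<String, Object> applySet(Map<String, Object> stored, Map<String, Object> update) {
        Map<String, Object> full = new LinkedHashMap<>(stored);
        for (Map.Entry<String, Object> e : update.entrySet()) {
            Object v = e.getValue();
            if (v instanceof Map && ((Map<?, ?>) v).containsKey("set")) {
                full.put(e.getKey(), ((Map<?, ?>) v).get("set"));
            } else {
                full.put(e.getKey(), v);
            }
        }
        return full;
    }

    // The five steps listed above; returns the document sent to other nodes.
    static Map<String, Object> buggyForward(Map<String, Object> stored, AddCmd cmd) {
        Map<String, Object> clonedDoc = new LinkedHashMap<>(cmd.solrDoc); // 1. clone the input doc
        cmd.solrDoc = applySet(stored, cmd.solrDoc);                      // 2. produce the full doc
        // 3. the full doc in cmd.solrDoc would be added locally here
        cmd.solrDoc = clonedDoc;                                          // 4. revert to the input doc
        return cmd.solrDoc;                                               // 5. distribute raw operations
    }

    // Assumed corrected order: merge first, then clone the merged doc to forward.
    static Map<String, Object> fixedForward(Map<String, Object> stored, AddCmd cmd) {
        cmd.solrDoc = applySet(stored, cmd.solrDoc);
        Map<String, Object> clonedDoc = new LinkedHashMap<>(cmd.solrDoc);
        // the full doc in cmd.solrDoc would be added locally here
        return clonedDoc;
    }
}
```

        With a stored document containing cat_s and an update of {"set":"999"}, buggyForward hands the other nodes a document whose cat_s is still the operation map, while fixedForward hands them the merged value "999" together with the untouched fields.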

        Yonik Seeley added a comment -

        Thanks for investigating Jim - the "clone the input doc" definitely doesn't belong first!

        Jim Musil added a comment -

        No problem. I'm very eager to start using them. Let me know if there's something I can do to help.

        Mark Miller added a comment -

        Attached a patch that moves the clone. It also only clones if we are actually going to distribute the update.

        Mark Miller added a comment -

        Fix committed - thanks Jim!

        Commit Tag Bot added a comment -

        [branch_4x commit] Mark Robert Miller
        http://svn.apache.org/viewvc?view=revision&revision=1385138

        SOLR-3831: Atomic updates do not distribute correctly to other nodes.

        Markus Jelsma added a comment -

        Hi - I've seen some issues committed to the 4x branch but not to trunk, yet this and other issues are marked as resolved. Are they going to be committed to trunk? Thanks.

        Mark Miller added a comment -

        This should be in trunk.

        You may be confused by the above commit bot message?

        New tool I'm working on, and yesterday I accidentally triggered it for my name in the last 400 or so commits, but only for 4x - I stopped it before it did 5x.

        So if you know something is not on 5x that is on 4x, we need to fix it. But don't go by the commit bot messages for these past issues - hoping that's something you can count on in the future, but it's in development at the moment.

        Markus Jelsma added a comment -

        I was indeed confused. Thanks!

        Uwe Schindler added a comment -

        Closed after release.


          People

          • Assignee: Mark Miller
          • Reporter: Jim Musil
          • Votes: 0
          • Watchers: 5
