Solr
  1. Solr
  2. SOLR-6426

SolrZkClient clean can fail due to a race with children nodes.

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.10.4, 5.0, Trunk
    • Component/s: None
    • Labels:
      None

      Activity

      Hide
      Michael McCandless added a comment -

      Bulk close for 4.10.4 release

      Show
      Michael McCandless added a comment - Bulk close for 4.10.4 release
      Hide
      ASF subversion and git services added a comment -

      Commit 1662426 from shalin@apache.org in branch 'dev/branches/lucene_solr_4_10'
      [ https://svn.apache.org/r1662426 ]

      SOLR-6426: SolrZkClient clean can fail due to a race with children nodes.

      Show
      ASF subversion and git services added a comment - Commit 1662426 from shalin@apache.org in branch 'dev/branches/lucene_solr_4_10' [ https://svn.apache.org/r1662426 ] SOLR-6426 : SolrZkClient clean can fail due to a race with children nodes.
      Hide
      Shalin Shekhar Mangar added a comment -

      Reopening to backport to 4.10.4

      Show
      Shalin Shekhar Mangar added a comment - Reopening to backport to 4.10.4
      Hide
      Anshum Gupta added a comment -

      Bulk close after 5.0 release.

      Show
      Anshum Gupta added a comment - Bulk close after 5.0 release.
      Hide
      Jessica Cheng Mallet added a comment -

      The only thing that I'm worried with regarding to the racing clients is that before, this code will fail (which is what you're trying to fix here), but now there might be a risk of infinite recursion here causing stack overflow if it keeps coming back to this point and finding more children after it thinks it deleted all of them. In practice it probably won't happen, but it just feels a bit scary. Maybe that part can be made iterative instead (with a maximum bail-out number of tries)?

      Show
      Jessica Cheng Mallet added a comment - The only thing that I'm worried with regarding to the racing clients is that before, this code will fail (which is what you're trying to fix here), but now there might be a risk of infinite recursion here causing stack overflow if it keeps coming back to this point and finding more children after it thinks it deleted all of them. In practice it probably won't happen, but it just feels a bit scary. Maybe that part can be made iterative instead (with a maximum bail-out number of tries)?
      Hide
      Mark Miller added a comment -

      Also, keep in mind, this change does nothing about the risk of another client constantly adding nodes. That could race with this before and after. This is about trying to remove a node that has children - it looks like the node has no children, we try to remove it, boom it has a child. It doesn't affect a race off adding and removing children nodes. It just makes the method consistent in working how it was intended rather than this odd race fail you can get.

      That doesn't mean the clean method itself could not be re implemented or something, but that's another issue.

      Show
      Mark Miller added a comment - Also, keep in mind, this change does nothing about the risk of another client constantly adding nodes. That could race with this before and after. This is about trying to remove a node that has children - it looks like the node has no children, we try to remove it, boom it has a child. It doesn't affect a race off adding and removing children nodes. It just makes the method consistent in working how it was intended rather than this odd race fail you can get. That doesn't mean the clean method itself could not be re implemented or something, but that's another issue.
      Hide
      Mark Miller added a comment -

      This should not normally happen in working Solr code, say nothing about a race with another client constantly adding nodes.

      I saw it happen in a test case or something - the reason it happens is unrelated - as a general client method, I don't think the method itself should randomly bail part way through depending on arbitrary timing.

      Show
      Mark Miller added a comment - This should not normally happen in working Solr code, say nothing about a race with another client constantly adding nodes. I saw it happen in a test case or something - the reason it happens is unrelated - as a general client method, I don't think the method itself should randomly bail part way through depending on arbitrary timing.
      Hide
      Jessica Cheng Mallet added a comment -

      Hey Mark, just took a look at this patch and there is a risk of stack overflow if children nodes are actively being added. Would you please comment on where you saw the race happen that necessitated this change? Is it better to eliminate that risk instead?

      Show
      Jessica Cheng Mallet added a comment - Hey Mark, just took a look at this patch and there is a risk of stack overflow if children nodes are actively being added. Would you please comment on where you saw the race happen that necessitated this change? Is it better to eliminate that risk instead?
      Hide
      ASF subversion and git services added a comment -

      Commit 1620246 from Mark Miller in branch 'dev/branches/branch_4x'
      [ https://svn.apache.org/r1620246 ]

      SOLR-6426: SolrZkClient clean can fail due to a race with children nodes.

      Show
      ASF subversion and git services added a comment - Commit 1620246 from Mark Miller in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1620246 ] SOLR-6426 : SolrZkClient clean can fail due to a race with children nodes.
      Hide
      ASF subversion and git services added a comment -

      Commit 1620245 from Mark Miller in branch 'dev/trunk'
      [ https://svn.apache.org/r1620245 ]

      SOLR-6426: SolrZkClient clean can fail due to a race with children nodes.

      Show
      ASF subversion and git services added a comment - Commit 1620245 from Mark Miller in branch 'dev/trunk' [ https://svn.apache.org/r1620245 ] SOLR-6426 : SolrZkClient clean can fail due to a race with children nodes.

        People

        • Assignee:
          Mark Miller
          Reporter:
          Mark Miller
        • Votes:
          0 Vote for this issue
          Watchers:
          6 Start watching this issue

          Dates

          • Created:
            Updated:
            Resolved:

            Development