Solr
  1. Solr
  2. SOLR-4505

Deadlock around SolrCoreState update lock.

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.2, 5.0
    • Component/s: None
    • Labels:
      None

      Description

      Erick found a deadlock with his core stress tool - see http://markmail.org/message/aq5hghbqia2uimgl

      1. newstack.txt
        2.03 MB
        Erick Erickson
      2. newstack.txt
        2.74 MB
        Erick Erickson
      3. newstack.txt
        424 kB
        Erick Erickson
      4. SOLR-4505.patch
        3 kB
        Mark Miller
      5. SOLR-4505.patch
        8 kB
        Mark Miller
      6. SOLR-4505.patch
        8 kB
        Mark Miller
      7. SOLR-4505.patch
        7 kB
        Mark Miller

        Activity

        Hide
        Mark Miller added a comment - - edited

        The attached patch deals with the update lock the same as the commit lock (which we changed when a previous deadlock case was found).

        While waiting for the Writer to be unused, we yield the lock, and then reacquire it when the wait is done.

        Show
        Mark Miller added a comment - - edited The attached patch deals with the update lock the same as the commit lock (which we changed when a previous deadlock case was found). While waiting for the Writer to be unused, we yield the lock, and then reacquire it when the wait is done.
        Hide
        Erick Erickson added a comment -

        I'm testing this now, it's been running for 5 minutes so all is well <G>. Actually, it's been taking a couple of hours to hit the race condition. I'll run it today in the background, and overnight tonight. If all goes well, I'll check it in tomorrow morning unless someone thinks it's a bad idea. Thanks a million!

        I'll assign this JIRA to myself just for bookeeping since I've got the stress test tool set up. Take it back if you want of course.

        Show
        Erick Erickson added a comment - I'm testing this now, it's been running for 5 minutes so all is well <G>. Actually, it's been taking a couple of hours to hit the race condition. I'll run it today in the background, and overnight tonight. If all goes well, I'll check it in tomorrow morning unless someone thinks it's a bad idea. Thanks a million! I'll assign this JIRA to myself just for bookeeping since I've got the stress test tool set up. Take it back if you want of course.
        Hide
        Erick Erickson added a comment -

        No joy, here's the latest stack trace. Haven't looked at it yet, won't have time until tonight....

        Show
        Erick Erickson added a comment - No joy, here's the latest stack trace. Haven't looked at it yet, won't have time until tonight....
        Hide
        Mark Miller added a comment -

        Same type of issue, different method. Please try this patch.

        Show
        Mark Miller added a comment - Same type of issue, different method. Please try this patch.
        Hide
        Erick Erickson added a comment -

        Unfortunately, same thing. It's so similar that I had to double-check to see if I'd actually applied the latest patch....

        Show
        Erick Erickson added a comment - Unfortunately, same thing. It's so similar that I had to double-check to see if I'd actually applied the latest patch....
        Hide
        Mark Miller added a comment -

        The lines from your stack traces dont line up with the code I have in DefaultSolrCoreState, so it seems something must be different...

        Show
        Mark Miller added a comment - The lines from your stack traces dont line up with the code I have in DefaultSolrCoreState, so it seems something must be different...
        Hide
        Mark Miller added a comment -

        Hmm...or I'm just reading it wrong - it must be waiting on the sync of the getIndexWriter itself...let me think about it for a bit.

        Show
        Mark Miller added a comment - Hmm...or I'm just reading it wrong - it must be waiting on the sync of the getIndexWriter itself...let me think about it for a bit.
        Hide
        Mark Miller added a comment -

        We need an alternate approach it seems. Can you try this patch?

        Show
        Mark Miller added a comment - We need an alternate approach it seems. Can you try this patch?
        Hide
        Erick Erickson added a comment -

        As long as you're willing to do the heavy lifting, I'm willing to test <G>....

        Trying now.

        Show
        Erick Erickson added a comment - As long as you're willing to do the heavy lifting, I'm willing to test <G>.... Trying now.
        Hide
        Erick Erickson added a comment -

        Latest lockup stack....

        Show
        Erick Erickson added a comment - Latest lockup stack....
        Hide
        Mark Miller added a comment -

        Hmm...this time the lines are diff slightly off from my src - do you have any custom changes in those couple classes?

        Show
        Mark Miller added a comment - Hmm...this time the lines are diff slightly off from my src - do you have any custom changes in those couple classes?
        Hide
        Mark Miller added a comment -

        Do you have the latest of your test up, or can you put it up? Then I can try some various things more quickly perhaps.

        Show
        Mark Miller added a comment - Do you have the latest of your test up, or can you put it up? Then I can try some various things more quickly perhaps.
        Hide
        Erick Erickson added a comment -

        I'm more than happy to run the tests, no problem there. I did attach the latest patch that I'm running with to SOLR-4196 just now, along with the stress test code. Note that it took a bit over 2 hours to hit the condition running flat out on my machine....

        Show
        Erick Erickson added a comment - I'm more than happy to run the tests, no problem there. I did attach the latest patch that I'm running with to SOLR-4196 just now, along with the stress test code. Note that it took a bit over 2 hours to hit the condition running flat out on my machine....
        Hide
        Mark Miller added a comment -

        Yeah, I'm happy to have your help Once I get something that looks better I'll ask you to blast it too - but at this rate, I'd like to be able to try some things out - I've also noticed sometimes I different hardware can trigger these things faster. My suped up linux box could cause the previous deadlock you found pretty fast.

        Show
        Mark Miller added a comment - Yeah, I'm happy to have your help Once I get something that looks better I'll ask you to blast it too - but at this rate, I'd like to be able to try some things out - I've also noticed sometimes I different hardware can trigger these things faster. My suped up linux box could cause the previous deadlock you found pretty fast.
        Hide
        Erick Erickson added a comment -

        P.S. I'll see what I can do to get the stress-test program in as a unit test by Monday FWIW.

        Show
        Erick Erickson added a comment - P.S. I'll see what I can do to get the stress-test program in as a unit test by Monday FWIW.
        Hide
        Mark Miller added a comment -

        Sorry - I put up the wrong patch! It was the same as the last one. Right one coming.

        Show
        Mark Miller added a comment - Sorry - I put up the wrong patch! It was the same as the last one. Right one coming.
        Hide
        Mark Miller added a comment -

        Here it is - dropbox must have sync'd it slow on me - half the time i make the patch one machine and upload it from another.

        Show
        Mark Miller added a comment - Here it is - dropbox must have sync'd it slow on me - half the time i make the patch one machine and upload it from another.
        Hide
        Erick Erickson added a comment -

        Life is getting better. I ran for 3 hours with the latest patch and no problems. That's longer than I've ever been able to run my stress test program before. I'll give it a whirl running overnight, but I don't anticipate anything changing. If the stress test succeeds, I'll check this in. Trunk and 4x? Or Trunk and bake for a little the 4x?

        Thanks a million for your efforts here....

        Show
        Erick Erickson added a comment - Life is getting better. I ran for 3 hours with the latest patch and no problems. That's longer than I've ever been able to run my stress test program before. I'll give it a whirl running overnight, but I don't anticipate anything changing. If the stress test succeeds, I'll check this in. Trunk and 4x? Or Trunk and bake for a little the 4x? Thanks a million for your efforts here....
        Hide
        Mark Miller added a comment -

        Trunk and 4x?

        I'd go with both.

        Show
        Mark Miller added a comment - Trunk and 4x? I'd go with both.
        Hide
        Commit Tag Bot added a comment -

        [trunk commit] Mark Robert Miller
        http://svn.apache.org/viewvc?view=revision&revision=1451653

        SOLR-4505: Possible deadlock around SolrCoreState update lock.

        Show
        Commit Tag Bot added a comment - [trunk commit] Mark Robert Miller http://svn.apache.org/viewvc?view=revision&revision=1451653 SOLR-4505 : Possible deadlock around SolrCoreState update lock.
        Hide
        Commit Tag Bot added a comment -

        [branch_4x commit] Mark Robert Miller
        http://svn.apache.org/viewvc?view=revision&revision=1451654

        SOLR-4505: Possible deadlock around SolrCoreState update lock.

        Show
        Commit Tag Bot added a comment - [branch_4x commit] Mark Robert Miller http://svn.apache.org/viewvc?view=revision&revision=1451654 SOLR-4505 : Possible deadlock around SolrCoreState update lock.
        Hide
        Mark Miller added a comment -

        I've committed this - I'm juggling too many local issues and I want these things to bake as long as they possibly can before the 4.2 release.

        Show
        Mark Miller added a comment - I've committed this - I'm juggling too many local issues and I want these things to bake as long as they possibly can before the 4.2 release.
        Hide
        Commit Tag Bot added a comment -

        [branch_4x commit] Mark Robert Miller
        http://svn.apache.org/viewvc?view=revision&revision=1451657

        SOLR-4505: CHANGES entry

        Show
        Commit Tag Bot added a comment - [branch_4x commit] Mark Robert Miller http://svn.apache.org/viewvc?view=revision&revision=1451657 SOLR-4505 : CHANGES entry
        Hide
        Commit Tag Bot added a comment -

        [trunk commit] Mark Robert Miller
        http://svn.apache.org/viewvc?view=revision&revision=1451656

        SOLR-4505: CHANGES entry

        Show
        Commit Tag Bot added a comment - [trunk commit] Mark Robert Miller http://svn.apache.org/viewvc?view=revision&revision=1451656 SOLR-4505 : CHANGES entry
        Hide
        Erick Erickson added a comment -

        You beat me to to it. My stress test ran all night last night without any problems, so we're good here as far as I can see.

        Show
        Erick Erickson added a comment - You beat me to to it. My stress test ran all night last night without any problems, so we're good here as far as I can see.
        Hide
        Commit Tag Bot added a comment -

        [trunk commit] Erick Erickson
        http://svn.apache.org/viewvc?view=revision&revision=1452321

        SOLR-4505, another place where I missed closing the stream

        Show
        Commit Tag Bot added a comment - [trunk commit] Erick Erickson http://svn.apache.org/viewvc?view=revision&revision=1452321 SOLR-4505 , another place where I missed closing the stream
        Hide
        Erick Erickson added a comment -
        Show
        Erick Erickson added a comment - Tagged http://svn.apache.org/viewvc?view=revision&revision=1452321 to the wrong JIRA, should have been SOLR-4525
        Hide
        Uwe Schindler added a comment -

        Closed after release.

        Show
        Uwe Schindler added a comment - Closed after release.

          People

          • Assignee:
            Mark Miller
            Reporter:
            Mark Miller
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development