SOLR-6261

Run ZK watch event callbacks in parallel to the event thread

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 4.9
    • Fix Version/s: 4.10, 6.0
    • Component/s: SolrCloud
    • Labels: None

      Description

      Currently, checking for leadership (triggered by the leader's ephemeral node going away) happens on ZK's event thread. If there are many cores and all of them are due to take over leadership, they have to go through the two-way sync and leadership takeover serially.

      For tens of cores, this could mean 30-40s without leadership before the last core in the list even gets to start the leadership process. If the leadership process happens in a separate thread, the cores could all take over in parallel.
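
      As a minimal sketch of the idea (the class and method names below are illustrative, not Solr's actual code), the blocking leadership work moves off the ZooKeeper event thread so that other cores' callbacks are not queued up behind it:

          import org.apache.zookeeper.WatchedEvent;
          import org.apache.zookeeper.Watcher;

          class LeaderElectionWatcher implements Watcher {
            @Override
            public void process(WatchedEvent event) {
              // Doing the two-way sync here would stall every other watch callback;
              // handing it off lets all cores take over leadership in parallel.
              new Thread(() -> checkIfIAmLeader(event), "leader-check").start();
            }

            private void checkIfIAmLeader(WatchedEvent event) {
              // two-way sync with replicas and leadership takeover happen here
            }
          }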


          Activity

          ASF GitHub Bot added a comment -

          GitHub user andyetitmoves opened a pull request:

          https://github.com/apache/lucene-solr/pull/66

          Run checkIfIamLeader in a separate thread

          Initial patch for SOLR-6261 (https://issues.apache.org/jira/browse/SOLR-6261) to run `checkIfIAmLeader` in a separate thread; passes all tests.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/bloomberg/lucene-solr trunk-parallel-leader

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/lucene-solr/pull/66.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #66


          commit 6b0c98c6462a05c24dbf111450c14e53a447b6d3
          Author: Ramkumar Aiyengar <andyetitmoves@gmail.com>
          Date: 2014-07-20T19:08:58Z

          Run checkIfIamLeader in a separate thread


          Mark Miller added a comment -

          Hmm... I'm a little hesitant to fire up a new thread for each one rather than use the update executor or something. Seems like a good step forward, though.

          Ramkumar Aiyengar added a comment - - edited

          Yeah, I initially thought of pooling this as well, but the thread count here is really a function of the number of cores in the instance, and a lot of the threads Solr uses already scale with the number of cores anyway?

          I can still look into changing it..

          Mark Miller added a comment -

          I dunno - I struggled with it when I first saw it and quickly got lazy about it. In general I think it's best to use pools to spin up threads, but I have a hard time worrying about it too much in this case.

          Mark Miller added a comment -

          We should look across our process methods and make sure there are no other obvious spots where we are holding things up.

          Ramkumar Aiyengar added a comment -

          Alternative approach using an executor; just a sketch at this point (it still fails a few tests). It has an `instanceof` check which is a bit ugly, but any other mechanism to maintain existing behaviour where needed could be used; this was just the simplest.. Once we are settled on the approach, we can hunt down other stuff using the event thread..

          https://github.com/apache/lucene-solr/pull/66/files

          (would be nice if commits to a pull showed up here..)
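
          As a rough sketch of what such an executor-based wrapper could look like (the marker interface and field names here are assumptions, not necessarily what the pull request does):

            import java.util.concurrent.ExecutorService;
            import java.util.concurrent.Executors;
            import org.apache.zookeeper.WatchedEvent;
            import org.apache.zookeeper.Watcher;

            class WatcherDispatch {
              // marker for watchers that must keep running on the ZK event thread
              interface ProcessOnEventThread extends Watcher {}

              private final ExecutorService zkCallbackExecutor = Executors.newCachedThreadPool();

              Watcher wrap(final Watcher watcher) {
                // the instanceof check preserves existing behaviour where required
                if (watcher == null || watcher instanceof ProcessOnEventThread) {
                  return watcher;
                }
                return (WatchedEvent event) -> zkCallbackExecutor.submit(() -> watcher.process(event));
              }
            }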

          Mark Miller added a comment -

          Hmm...that is a very interesting approach. I'll have to spend some time thinking about this one.

          Mark Miller added a comment -

          I really kind of like this idea of just ensuring the zk process thread is humming along. The more I think about it, the more I like it.

          Ramkumar Aiyengar added a comment -

          Added tests for the leader failover case (the original symptoms) and the parallel watching functionality. Let me know if this approach works; if so, we have three transition options:

          • Always have `SolrZkClient` use the new way (probably not a great idea, esp. considering this is in SolrJ)
          • Have an option per `SolrZkClient`; this would force all or most uses within Solr onto the new approach, but allow external uses to continue as they are
          • Leave it the way it currently is, decided on a per-watch basis

          I am sort of wavering between the second and third options, opinions welcome..
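
          For concreteness, the second option might look roughly like this (a hypothetical constructor flag, not the actual SolrZkClient signature):

            import java.util.concurrent.ExecutorService;
            import java.util.concurrent.Executors;
            import org.apache.zookeeper.Watcher;

            class ZkClientSketch {
              private final ExecutorService executor = Executors.newCachedThreadPool();
              private final boolean parallelWatches;

              // callers inside Solr would pass true; external users of SolrJ would
              // keep the old single-threaded dispatch by leaving the flag false
              ZkClientSketch(boolean parallelWatches) {
                this.parallelWatches = parallelWatches;
              }

              Watcher maybeWrap(final Watcher w) {
                if (!parallelWatches || w == null) return w;  // existing behaviour
                return event -> executor.submit(() -> w.process(event));
              }
            }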

          Mark Miller added a comment -

          I actually kind of like option 1. What is your concern around it being in SolrJ? I think, at this point, it's pretty unlikely anyone is counting on the current behavior - it's generally probably a bug. We have also already treated a lot of this at the cloud level as subject to change, because a lot of it is so early. Depending on the impact, we need some flexibility to get things right.

          I guess I just don't see a lot of downside or negative impact if we choose 1.

          The upside of doing 1, IMO, is that it becomes a lot harder for other/future devs to screw up. The default makes it hard to get wrong.

          2 is not too bad, but it relies on future developers consistently choosing the right flag to pass so that our zk thread always gets to crank along.

          3 is the least preferable to me.

          Ramkumar Aiyengar added a comment - - edited

          I agree (1) is ideal, and I was just being paranoid since I am not that well-versed in how this class is used outside Solr. I am happy to defer to your judgement in this case..

          Mark Miller added a comment -

          I think it's worth considering for sure, but weighing both sides, I think enforcing it for all is probably just an overall beneficial change in this case. Getting out of the way of the notification thread without having to go out of your way is great.

          Ramkumar Aiyengar added a comment -

          Updated for option (1); tests are still running though..

          Ramkumar Aiyengar added a comment -

          Forgot to mention, tests pass. Let me know if the changes look good..

          Mark Miller added a comment -

          I played around with this a bit late yesterday and tests were passing for me as well. I'll review a bit closer and, barring anything coming up, I'll commit this soon. Great change, I think.

          Mark Miller added a comment -

          Can you create a final pull request with your latest, Ramkumar Aiyengar?

          Ramkumar Aiyengar added a comment -

          What I have there should be the latest, do you see anything missing?

          Mark Miller added a comment -

          Ah, the original pull request info in the first JIRA comment has links that are all kept up to date as you make changes.

          Did not realize. I'll commit this change shortly.

          Mark Miller added a comment -

          Do you think the default keepAliveTime of 60 seconds is the right choice?

          ASF subversion and git services added a comment -

          Commit 1614244 from Mark Miller in branch 'dev/trunk'
          [ https://svn.apache.org/r1614244 ]

          SOLR-6261: Run ZooKeeper watch event callbacks in parallel to the ZooKeeper event thread.

          Mark Miller added a comment -

          I'll backport to 4x tomorrow.

          Ramkumar Aiyengar added a comment -

          Ah, the original pull request info in the first JIRA comment has links that are all kept up to date as you make changes. Did not realize. I'll commit this change shortly.

          Yeah, it would be nice if commits to a pull request triggered a JIRA update. Worth an INFRA ticket? I could raise one, but wasn't sure if generally only committers raised them..

          Do you think the default keepAliveTime of 60 seconds is the right choice?

          It seemed sensible and I didn't have much of an opinion on a more specific value. We certainly do want to reap these threads because the changes don't happen all the time, and they would occur within 60s of one another in predictable cases I could think of (ephemeral node expiry, overseer queue changes due to instances changing state). Other cases like LIR and admin tasks done through Overseer are more ad hoc anyway and I doubt we can associate any meaningful parameters modelling their use..
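
          For reference, the 60-second figure above is the keepAliveTime of a cached-style pool; a pool with these reaping semantics looks roughly like the following sketch (parameters illustrative, effectively what Executors.newCachedThreadPool() builds):

            import java.util.concurrent.SynchronousQueue;
            import java.util.concurrent.ThreadPoolExecutor;
            import java.util.concurrent.TimeUnit;

            class ZkCallbackPool {
              static ThreadPoolExecutor newPool() {
                return new ThreadPoolExecutor(
                    0,                      // keep no threads around when idle...
                    Integer.MAX_VALUE,      // ...but grow as far as demand requires
                    60L, TimeUnit.SECONDS,  // idle callback threads are reaped after 60s
                    new SynchronousQueue<Runnable>());
              }
            }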

          ASF subversion and git services added a comment -

          Commit 1614352 from Mark Miller in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1614352 ]

          SOLR-6261: Run ZooKeeper watch event callbacks in parallel to the ZooKeeper event thread.

          Mark Miller added a comment -

          Thanks Ramkumar!

          Ramkumar Aiyengar added a comment -

          Mark, I realized I missed a case: the session watcher doesn't use this threadpool. In theory it should be fine as of now, since we already made it async with SOLR-5615, but it might be worth keeping it in sync and avoiding any future code in ConnectionManager falling into the same trap. I have a patch running through tests, which I can post when done; should I create a new issue or put the patch here?
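
          A sketch of the shape of such a fix (hypothetical names, not the committed patch): the connection watcher handed to the ZooKeeper client gets wrapped with the same executor-backed dispatch as every other watch.

            import java.io.IOException;
            import java.util.concurrent.ExecutorService;
            import java.util.concurrent.Executors;
            import org.apache.zookeeper.Watcher;
            import org.apache.zookeeper.ZooKeeper;

            class SessionWatcherSketch {
              private final ExecutorService executor = Executors.newCachedThreadPool();

              ZooKeeper connect(String zkServerAddress, int zkClientTimeout,
                                Watcher connectionManager) throws IOException {
                // previously the connection manager was handed to ZooKeeper unwrapped,
                // so session events still ran on the event thread
                return new ZooKeeper(zkServerAddress, zkClientTimeout,
                    wrapWatcher(connectionManager));
              }

              private Watcher wrapWatcher(final Watcher w) {
                // same executor-backed wrapping used for all other watches
                return event -> executor.submit(() -> w.process(event));
              }
            }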

          ASF GitHub Bot added a comment -

          GitHub user andyetitmoves opened a pull request:

          https://github.com/apache/lucene-solr/pull/78

          Wrap connectionManager in SolrZkClient constructor

          Patch for SOLR-6261 to run the session watcher async as well.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/bloomberg/lucene-solr trunk-parallel-session-watcher

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/lucene-solr/pull/78.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #78


          commit 392669e5bfcd653be4f51a502d22380f599eac8c
          Author: Ramkumar Aiyengar <andyetitmoves@gmail.com>
          Date: 2014-08-04T21:08:40Z

          Wrap connectionManager in SolrZkClient constructor


          Shalin Shekhar Mangar added a comment -

          Guys, I think there's something weird happening since this was committed. Many tests, such as MultiThreadedOCPTest and ShardSplitTest, have been failing with OutOfMemoryError while trying to create new watcher threads. A typical failure has the following in the logs:

            [junit4]   2> 1218223 T4785 oasc.DistributedQueue$LatchChildWatcher.process LatchChildWatcher fired on path: /overseer/collection-queue-work state: SyncConnected type NodeChildrenChanged
            [junit4]   2> 1218223 T4789 oasc.DistributedQueue$LatchChildWatcher.process LatchChildWatcher fired on path: /overseer/collection-queue-work state: SyncConnected type NodeChildrenChanged
            [junit4]   2> 1218223 T4791 oasc.DistributedQueue$LatchChildWatcher.process LatchChildWatcher fired on path: /overseer/collection-queue-work state: SyncConnected type NodeChildrenChanged
            [junit4]   2> 1218223 T4795 oasc.DistributedQueue$LatchChildWatcher.process LatchChildWatcher fired on path: /overseer/collection-queue-work state: SyncConnected type NodeChildrenChanged
            [junit4]   2> 1218223 T4797 oasc.DistributedQueue$LatchChildWatcher.process LatchChildWatcher fired on path: /overseer/collection-queue-work state: SyncConnected type NodeChildrenChanged
            [junit4]   2> 1218222 T4803 oasc.DistributedQueue$LatchChildWatcher.process LatchChildWatcher fired on path: /overseer/collection-queue-work state: SyncConnected type NodeChildrenChanged
            [junit4]   2> 1218222 T3305 oaz.ClientCnxn$EventThread.processEvent ERROR Error while calling watcher  java.lang.OutOfMemoryError: unable to create new native thread
            [junit4]   2> 	at java.lang.Thread.start0(Native Method)
            [junit4]   2> 	at java.lang.Thread.start(Thread.java:714)
            [junit4]   2> 	at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:950)
            [junit4]   2> 	at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1368)
            [junit4]   2> 	at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112)
            [junit4]   2> 	at org.apache.solr.common.cloud.SolrZkClient$3.process(SolrZkClient.java:201)
            [junit4]   2> 	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
            [junit4]   2> 	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)

          I see hundreds of LatchChildWatcher.process events and then the node goes out of memory.

          Here are some of the recent fails:
          http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/4233/
          https://builds.apache.org/job/Lucene-Solr-NightlyTests-4.x/592/
          https://builds.apache.org/job/Lucene-Solr-Tests-4.x-Java7/2048/

          Mark Miller added a comment - - edited

          Thanks Shalin, I'll take a look. Looks like the executor isn't working as efficiently as we need.

          Ramkumar Aiyengar added a comment - - edited

          This is possibly exposing a design issue with the OCP. A watch is created with each workQueue.peekTopN, even if there are work items present. It could then fail the exclusivity check, try again, find items, and create one more watch..

          ~/builds/lucene-solr/solr/core| less out.log | grep 'Exclusivity check failed' | wc -l
          6960
          

          Obviously with the new code you run into issues because that creates 1000s of threads, but it's a problem to begin with that the code is creating 1000s of watches..
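
          The leak pattern being described can be illustrated as follows (a simplified, self-contained rendering, not the actual DistributedQueue source):

            import java.util.List;
            import java.util.concurrent.CountDownLatch;
            import java.util.concurrent.TimeUnit;
            import org.apache.zookeeper.KeeperException;
            import org.apache.zookeeper.ZooKeeper;

            class WatchLeakSketch {
              // Every pass registers a brand-new child watch. Returning early (items
              // found) or timing out abandons that watch, and it cannot be unregistered,
              // so watches pile up when this is called from an event loop.
              static List<String> peekTopN(ZooKeeper zk, String path, long timeoutMs)
                  throws KeeperException, InterruptedException {
                while (true) {
                  CountDownLatch fired = new CountDownLatch(1);
                  List<String> children = zk.getChildren(path, event -> fired.countDown());
                  if (!children.isEmpty()) {
                    return children;  // the watch just set will still fire later
                  }
                  if (!fired.await(timeoutMs, TimeUnit.MILLISECONDS)) {
                    return children;  // timed out on an empty queue; that watch leaks too
                  }
                }
              }
            }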

          Ramkumar Aiyengar added a comment -

          Thinking this through further, the problem comes down to the fact that peekTopN leaks (potentially multiple) watches: it creates one with each getChildren call and throws it away when it has results or when a timeout is reached on an empty queue, and there is currently no way to remove watches from ZK (hopefully something to look forward to when 3.5 gets released and ZOOKEEPER-1829 gets in). Put this call in an event loop and you have a big issue.

          I guess we do need timeouts to check leadership every now and then, but peek should at that point have a way to see if it had created a watch before and wait on it instead of looking for children again.
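
          One possible shape of that fix (illustrative only): remember the watch set on a previous empty poll and wait on it again instead of registering a fresh one each time around.

            import java.util.Collections;
            import java.util.List;
            import java.util.concurrent.CountDownLatch;
            import java.util.concurrent.TimeUnit;
            import org.apache.zookeeper.KeeperException;
            import org.apache.zookeeper.ZooKeeper;

            class ReusedWatchSketch {
              private CountDownLatch pending;  // watch left over from a previous empty poll

              List<String> peek(ZooKeeper zk, String path, long timeoutMs)
                  throws KeeperException, InterruptedException {
                if (pending == null) {
                  CountDownLatch latch = new CountDownLatch(1);
                  List<String> children = zk.getChildren(path, event -> latch.countDown());
                  if (!children.isEmpty()) return children;
                  pending = latch;  // remember the watch instead of abandoning it
                }
                if (pending.await(timeoutMs, TimeUnit.MILLISECONDS)) {
                  pending = null;   // the watch fired, so it is safe to look again
                  return zk.getChildren(path, null);
                }
                return Collections.emptyList();  // timed out; no additional watch created
              }
            }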

          Mark Miller added a comment -

          Thanks for digging, Ram - are you going to work up a patch in the near term, or should I jump in?

          Ramkumar Aiyengar added a comment -

          I will open a separate Jira and attach a patch. I have one ready..

          Shalin Shekhar Mangar added a comment -

          Awesome, thanks for digging in!

          Ramkumar Aiyengar added a comment -

          Opened SOLR-6336.

          Mark Miller added a comment -

          The attached image compares thread behavior while running MultiThreadedOCPTest before the commit of this issue, with current trunk, and after SOLR-6336 is applied.

          Mark Miller added a comment -

          Thanks Shalin and Ramkumar!

          Ramkumar Aiyengar added a comment -

          Mark Miller, I had posted a patch for one missed case with the session watcher (just above Shalin's comment). Could you commit that as well, or should that be a separate issue?

          Mark Miller added a comment -

          Ah, I confused it with the second patch on the other issue when I glanced at it. Yeah, I can commit it here.

          Ramkumar Aiyengar added a comment - - edited

          Ah, I confused it with the second patch on the other issue when I glanced at it. Yeah, I can commit it here.

          Hey Mark Miller, just checking since I don't see an automated message here: was this committed? Thanks!

          Mark Miller added a comment -

          Hmm, my fault. I should have reopened the issue then. It sounds like we will have to make a new JIRA.

          Ramkumar Aiyengar added a comment -

          np, I have now opened SOLR-6570 for this; it covers pull #78 on GitHub.

          ASF GitHub Bot added a comment -

          Github user andyetitmoves closed the pull request at:

          https://github.com/apache/lucene-solr/pull/66

          ASF GitHub Bot added a comment -

          Github user andyetitmoves closed the pull request at:

          https://github.com/apache/lucene-solr/pull/78


  People

  • Assignee: Mark Miller
  • Reporter: Ramkumar Aiyengar
  • Votes: 0
  • Watchers: 6