Solr / SOLR-8744

Overseer operations need more fine grained mutual exclusion

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 5.4.1
    • Fix Version/s: 6.1
    • Component/s: SolrCloud

      Description

      SplitShard creates a mutex over the whole collection, but in practice this is a big scaling problem. Multiple split-shard operations could happen at the same time, as long as different shards are being split. In practice, those shards often reside on different machines, so there's no I/O bottleneck in those cases; only the mutex in the Overseer forces the operations to run serially.

      Given that a single split can take many minutes on a large collection, this is a bottleneck at scale.

      Here is the proposed new design:

      There are various collection operations performed at the Overseer. They may need exclusive access at various levels, so each operation must define the access level at which it requires exclusivity. The access level is an enum:

      CLUSTER(0)
      COLLECTION(1)
      SHARD(2)
      REPLICA(3)
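A minimal Java sketch of such an enum follows; the name LockLevel and the getChild helper are illustrative assumptions, not necessarily the committed API.

```java
// Hypothetical sketch of the access-level enum described above.
enum LockLevel {
    CLUSTER(0),
    COLLECTION(1),
    SHARD(2),
    REPLICA(3);

    final int height; // depth in the lock tree

    LockLevel(int height) {
        this.height = height;
    }

    // The level one step down the tree, or null at the leaf.
    LockLevel getChild() {
        return height == REPLICA.height ? null : values()[height + 1];
    }
}
```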

      The Overseer node maintains a tree of these locks. The lock tree would look as follows. The tree can be created lazily as and when tasks come up.

      Legend: 
      C1, C2 -> Collections
      S1, S2 -> Shards 
      R1,R2,R3,R4 -> Replicas
      
      
                       Cluster
                      /       \
                     /         \         
                    C1          C2
                   / \         /   \     
                  /   \       /     \      
                 S1   S2      S1     S2
              R1,R2   R3,R4  R1,R2   R3,R4
      

      When the Overseer receives a message, it tries to acquire the appropriate lock from the tree. For example, if an operation needs a lock at the Collection level and it needs to operate on collection C1, the node C1 and all child nodes of C1 must be free.
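The "node plus its whole subtree must be free" rule can be sketched as follows; LockNode, isFree, and tryLock are hypothetical names for illustration, not the patch's code.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the lock-tree rule described above: a node can be
// locked only if it and every descendant are currently free.
class LockNode {
    final Map<String, LockNode> children = new HashMap<>();
    boolean locked;

    // True if neither this node nor any descendant holds a lock.
    boolean isFree() {
        if (locked) return false;
        for (LockNode child : children.values()) {
            if (!child.isFree()) return false;
        }
        return true;
    }

    boolean tryLock() {
        if (!isFree()) return false;
        locked = true;
        return true;
    }

    void unlock() {
        locked = false;
    }
}
```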

      Lock acquiring logic

      Each operation starts from the root of the tree (Level 0 -> Cluster) and moves down according to the operation. After it reaches the right node, it checks that all of the children are free of locks. If it fails to acquire a lock, the operation remains in the work queue. A scheduler thread waits for notification from the current set of tasks: every task does a notify() on the monitor of the scheduler thread. The scheduler then starts from the head of the queue and checks each task to see whether it can acquire the right lock. If yes, the task is executed; if not, it is left in the work queue.
      When a new task arrives in the work queue, the scheduler thread wakes up and tries to schedule just that task.
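A rough sketch of that scheduler loop, not the Solr implementation: tasks that cannot acquire their lock stay in the work queue, every completing task notify()s the scheduler's monitor, and the scheduler rescans from the head of the queue. Names (Scheduler, Task, scanQueue) are illustrative.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

class Scheduler {
    interface Task {
        boolean tryAcquireLock(); // walk the lock tree; false on conflict
        void run();               // should call taskCompleted() when done
    }

    private final Deque<Task> workQueue = new ArrayDeque<>();
    private final Object monitor = new Object();

    void submit(Task t) {
        synchronized (monitor) {
            workQueue.addLast(t);
            monitor.notify(); // wake the scheduler to try just this task
        }
    }

    // Called by each task as it finishes and releases its locks.
    void taskCompleted() {
        synchronized (monitor) {
            monitor.notify();
        }
    }

    // One pass from the head: run whatever can lock, leave the rest queued.
    void scanQueue() {
        synchronized (monitor) {
            Iterator<Task> it = workQueue.iterator();
            while (it.hasNext()) {
                Task t = it.next();
                if (t.tryAcquireLock()) {
                    it.remove();
                    t.run(); // the real processor hands off to an executor
                }
            }
        }
    }

    // Scheduler thread body.
    void runLoop() throws InterruptedException {
        while (true) {
            scanQueue();
            synchronized (monitor) {
                monitor.wait(); // until a task arrives or one completes
            }
        }
    }
}
```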

        Attachments

      1. SmileyLockTree.java
        5 kB
        David Smiley
      2. SmileyLockTree.java
        4 kB
        David Smiley
      3. SOLR-8744.patch
        17 kB
        Scott Blum
      4. SOLR-8744.patch
        15 kB
        Noble Paul
      5. SOLR-8744.patch
        15 kB
        Scott Blum
      6. SOLR-8744.patch
        16 kB
        Noble Paul
      7. SOLR-8744.patch
        11 kB
        Noble Paul
      8. SOLR-8744.patch
        39 kB
        Noble Paul
      9. SOLR-8744.patch
        39 kB
        Noble Paul
      10. SOLR-8744.patch
        45 kB
        Noble Paul
      11. SOLR-8744.patch
        42 kB
        Noble Paul

        Issue Links

          Activity

          dragonsinth Scott Blum added a comment -

          Design looks good to me.

          One thing to talk through: this may not be a problem per se, but I could imagine starvation becoming an issue. If a task is trying to lock the cluster, it may never get to run if smaller tasks keep acquiring pieces of the lock tree. This may not be an issue in practice.

          noble.paul Noble Paul added a comment - - edited

          Good question. Starvation is indeed a problem, and we should make a best effort to run tasks on a first-come, first-served basis.

          We can address it by making the scheduling itself smarter as follows:

          As the scheduler thread tries to acquire locks, it could build a local tree of its own, with nodes marked 'busy' for previously unschedulable tasks.

          If task T1 needs a lock on collection C1 and is unable to acquire it, the local tree would look as follows:

          Cluster
            /
           /
          C1
          

          When task T2 is taken up, which requires a lock on Cluster -> C1 -> S1, it would not try to acquire a lock from the lock tree because the path Cluster -> C1 is already marked as 'busy' in the local tree. So T2 would remain in the work queue until T1 is completed.
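The local busy tree can be sketched with a flat set of path prefixes rather than an explicit tree, purely for brevity; BusyPaths and its methods are hypothetical names, and the "/"-separated path form is an assumption.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the starvation fix above: a task whose path passes under a
// node already claimed by an earlier stuck task is skipped, preserving
// first-come, first-served ordering.
class BusyPaths {
    private final Set<String> busy = new HashSet<>();

    // Called when a task fails to acquire its lock: remember its path.
    void markBusy(String path) {
        busy.add(path);
    }

    // Rebuilt on each scheduling pass.
    void clear() {
        busy.clear();
    }

    // A later task must not jump the queue if its exact node or any
    // ancestor on its path is already claimed by a waiting task.
    boolean isBlocked(String path) {
        String p = path;
        while (!p.isEmpty()) {
            if (busy.contains(p)) return true;
            int slash = p.lastIndexOf('/');
            p = slash <= 0 ? "" : p.substring(0, slash);
        }
        return busy.contains("/"); // a pending cluster-level lock
    }
}
```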

          dragonsinth Scott Blum added a comment -

          Good plan!

          noble.paul Noble Paul added a comment -

          Patch with tests. More testing is required, but please review the LockTree class to see the logic implemented.

          dsmiley David Smiley added a comment -

          I looked over the patch just a little bit, but the suggestion I'm about to give is mostly based on the issue description & comments. This is an interesting/fun programming problem. The main thing I don't like about the current design (and this is not a blocker, I'm not vetoing) is that we obtain locks recursively on every object at lower depths (ultimately down to replicas). Instead, I propose we consider an alternative design (to follow) in which we only obtain a small, fixed number of locks: one for each ancestor. This lowers the number of locks, and also allows the actual lock nodes to be much more dynamic, being GC'ed when the locks aren't in use.
          Proposal:

          • LockTree maintains a cache/map of string keys to weakly referenceable LockNodes. That's right; they'll get GC'ed when not in use. The root node will be strongly referenced apart from the map so it's always there. The key is just a string, with the expectation that we'll use some separator to denote hierarchy. When looking up a lock, we simply synchronize, as obtaining locks will be infrequent. When looking up a Lock, it will recursively look up its parent, creating it first if it's not there. Getting the parent key is a simple matter of stringKey.substring(0, stringKey.lastIndexOf('/')), assuming all keys start with a leading "/" (thus the root key is the empty string).
          • LockNode implements the JDK Lock interface, but only implementing lock() and unlock(). This makes the API easy for consumers – it's a known abstraction for code using this utility.
          • LockNode's fields consist of a parent LockNode and a JDK ReentrantReadWriteLock. The ReentrantReadWriteLock is instantiated with fairness=true.
          • LockNode.lock: call parent.readLock() (see below) then call readWriteLock.writeLock().lock().
          • LockNode.unlock: call readWriteLock.writeLock().unlock(); then parent.readUnlock(); (see below)
          • LockNode.readLock (private method): call readLock on parent first (this is recursive), and then after that call readWriteLock.readLock().lock(). Thus we lock from top-down (from root down) in effect.
          • LockNode.readUnlock (private method): call readWriteLock.readLock().unlock(), then recursively call parent.readUnlock().
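A compact sketch of the mechanism these bullets describe (the class name HierLock is hypothetical; the actual attachment is SmileyLockTree.java): write-locking a node first read-locks every ancestor top-down, so an ancestor's write lock cannot be taken while any descendant lock is outstanding.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

class HierLock {
    final HierLock parent; // null at the root
    final ReentrantReadWriteLock rwl = new ReentrantReadWriteLock(true); // fair

    HierLock(HierLock parent) {
        this.parent = parent;
    }

    // Read-lock this node and all of its ancestors, top-down.
    private void readLockChain() {
        if (parent != null) parent.readLockChain();
        rwl.readLock().lock();
    }

    private void readUnlockChain() {
        rwl.readLock().unlock();
        if (parent != null) parent.readUnlockChain();
    }

    void lock() {
        if (parent != null) parent.readLockChain(); // ancestors first
        rwl.writeLock().lock();                     // then self, exclusively
    }

    void unlock() {
        rwl.writeLock().unlock();
        if (parent != null) parent.readUnlockChain();
    }
}
```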

          What do you think?

          noble.paul Noble Paul added a comment -

          Thanks David Smiley for your comments

          which we only obtain a fixed small number of locks, which is for each parent.

          I don't quite follow that. Only one operation should be able to lock at a given level.

          lockTree maintains a cache/map of string keys to weakly reference-able LockNode.

          It's an optimisation that could be done. Here we are just talking about a few hundred (maybe a few thousand?) extremely lightweight objects.

          Getting the key is a simple matter of stringKey.substring(0, stringKey.lastIndexOf('/'))

          I didn't get that one

          LockNode implements the JDK Lock interface

          The Java Lock interface makes me implement too many things. In this design, we don't even expose a lock() method. The semantics here are much different; implementing/using the Java Lock interface sets the wrong expectations.

          LockTree is not a general-purpose utility. I wrote it as an independent class because it is complex enough that it should be tested independently.

          dsmiley David Smiley added a comment -

          The main purpose of my suggestion is simplicity: LockTree (public) and internally LockTree.Node (private) – that's it; no new abstractions, and it'd have very little code. Node would keep track of its parent only, not children. No Session or LockObject. Returning a JDK Lock helps in that regard (it is an existing abstraction, and if we wanted to it'd be easy to support all its methods) but it's not critical – my suggested LockTree.Node could be public and we don't implement Lock if you feel that's better. I disagree with you on the semantics being different vs. the same, but whatever. It just so happens that my suggestion might yield something general-purpose, but that's not my objective; it's simplicity. Locking on fewer objects is nice too, but perhaps it won't become a real issue with your proposal. Perhaps I'm not fully appreciating the task at hand – I admit I glossed over the starvation issue and the code here concerned with marking busy/not busy. I hoped the "fairness" boolean configuration to JDK's ReentrantReadWriteLock would address that, but maybe not?

          (me) Getting the key is a simple matter of stringKey.substring(0, stringKey.lastIndexOf('/'))

          Just a minor implementation detail to get a parent key instead of working with a List<String> – nevermind.

          dragonsinth Scott Blum added a comment -

          Digging into the LockTree now to understand it better. Some preliminary notes:

          • In OverseerTaskProcessor, handing off the lock object (up to and including the tpe.execute call) should probably move inside the try block; on successfully executing the task, null out the lock local, and then put an "if lock != null unlock" into a finally block.
          • Do we need some kind of complete reset in the event stuff really blows up? Should there be a session-level clear that just unlocks everything?
          • I think MigrateStateFormat could be collection level rather than cluster level?
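The first bullet's try/finally hand-off pattern can be sketched as follows. TreeLock, process, and the executor parameter are illustrative stand-ins, not Solr code; the real LockTree locks are marks rather than JDK thread-affine locks, which is why a plain callback interface is used here.

```java
import java.util.concurrent.Executor;

class LockHandoff {
    interface TreeLock {
        void unlock();
    }

    static void process(Executor tpe, TreeLock acquired, Runnable task) {
        TreeLock lock = acquired;
        try {
            tpe.execute(() -> {
                try {
                    task.run();
                } finally {
                    acquired.unlock(); // the runner thread releases it
                }
            });
            lock = null; // hand-off succeeded; don't release here
        } finally {
            if (lock != null) lock.unlock(); // execute() threw before hand-off
        }
    }
}
```

On the happy path the runner releases the lock; if the executor rejects the task before the hand-off, the finally block releases it exactly once.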
          noble.paul Noble Paul added a comment -

          In OverseerTaskProcessor, handing off the lock object (up to and including the tpe.execute call) should probably move inside the try block; on successfully executing the task, null out the lock local, and then put an "if lock != null unlock" into a finally block.

          The runner is always executed in a different thread, so the finally block will always evaluate lock != null to true. Actually, only in the case of an exception should OverseerTaskProcessor unlock it.

          Do we need some kind of complete reset in the event stuff really blows up? Should there be a session-level clear that just unlocks everything?

          Yes, we need it, but not at the session level. If I clear everything for each session, it defeats the purpose: a session is valid for only one batch, but the locks should survive across multiple batches.

          A better solution is to periodically check whether the running tasks are empty and, if so, just create a new LockTree. I was planning to do it anyway.

          I think MigrateStateFormat could be collection level rather than cluster level?

          I have set the lock levels rather pessimistically. We should re-evaluate them.

          noble.paul Noble Paul added a comment -

          Node would keep track of its parent only, not children.

          I do not know how to traverse down a tree without keeping track of the children.

          No Session or LockObject.

          The session is created for a purpose. It is to abstract out the starvation logic.

          I'm not sure I understand your design. Maybe an alternate patch with an implementation of LockTree would help

          dsmiley David Smiley added a comment -

          Here's SmileyLockTree.java (the name is just to avoid confusing it with your LockTree). I ditched the weak-values idea as I had doubts about the added complexity. I changed the impl a bit from what I had described to make it feel more elegant to me, and I implemented most of JDK Lock for fun. After a short discussion with Scott Blum on IRC about the context of this task, I believe there's no issue of starvation (for SmileyLockTree specifically, anyway) since the Overseer runs sequentially, so I didn't add the fairness=true option.

          noble.paul Noble Paul added a comment -

          Thanks David Smiley

          The implementation is wrong.

          It's not enough to ensure that your parent is not locked. It is also required that none of your children are locked either. For example, C1 cannot be locked if C1/S1/R1 or C1/S2/R1 is locked.

          The flat-map and substring-based approach is expensive in terms of resources: every substring() call creates a new String. It does not have to be that expensive when you can achieve the same with a map lookup. It may look simple, but I would say it's inefficient and lazy.

          Using the Java Lock is contrived because we don't need any of that functionality. We just need mark()/unmark() functionality. There are no real locks required here.

          noble.paul Noble Paul added a comment -

          The LockTree is reset every time runningTasks is empty, so there is no chance of a blow-up.

          dsmiley David Smiley added a comment -

          Thanks for the review, Noble Paul. I attached a new version that uses List<String> instead of '/'-separated strings. It's even fewer lines of code.

          It's not enough to ensure that your parent is not locked. It is also required that none of your children are locked too. For example, C1 cannot be locked if C1/S1/R1 or C1/S2/R1 is locked

          Did you see the javadocs? I could have added more, and I added a little more. Or maybe you are unfamiliar with how ReentrantReadWriteLock works? See, there are two locks at each node in the tree – a read lock and a write lock, courtesy of ReentrantReadWriteLock, which does most of the work here. Multiple read locks can be open for a ReentrantReadWriteLock, but the write lock is mutually exclusive to them all. The Lock that SmileyLockTree returns at a given "node" will first obtain a read lock at all ancestors (top down) and then obtain a write lock on the node for itself. If a Lock for C1/S1/R1 is obtained, it will be a LayeredLock with "thisLock" being the write lock for itself (R1) and "parentLock" being a chain of LayeredLocks of read locks to its ancestors. The origin of the chain (i.e., if you keep following LayeredLock parentLock instances) is the read lock on the root, which is a direct instance of ReentrantReadWriteLock.readLock, whereas all the rest are the anonymous inner class ReadWriteLock that returns a LayeredLock for its read & write sides. So when the caller calls lock(), the first concrete Lock impl to be locked will be the read lock on the root (empty string), then the read lock on C1, then the read lock on C1/S1, then the write lock on C1/S1/R1. For the duration of time this is locked, any siblings of R1 or of R1's ancestors may be locked, or they could have been locked before. What would block would be an ancestor – C1 in your example would block, because the write lock waits for read locks, and the read lock is held not only via C1/S1/R1 but also via C1/S2/R1 in your example.

          It's a little functional, which maybe you aren't used to, and it leverages useful abstractions and implementations from the JDK that maybe you are unfamiliar with, so perhaps that's why it's not clear how this works. Do you grok this, Scott Blum?
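A tiny demonstration of the ReentrantReadWriteLock behavior this explanation relies on: read locks coexist, but the write lock is excluded while any read lock is out, so a read lock held on behalf of a descendant (say C1/S1/R1) keeps C1 from being write-locked. RwlDemo and its method names are just for this illustration.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

class RwlDemo {
    // Read locks coexist: a second read lock succeeds while one is held.
    static boolean readersCoexist() {
        ReentrantReadWriteLock c1 = new ReentrantReadWriteLock();
        c1.readLock().lock();
        boolean ok = c1.readLock().tryLock();
        if (ok) c1.readLock().unlock();
        c1.readLock().unlock();
        return ok;
    }

    // Returns whether C1's write lock is obtainable while a descendant
    // still holds a read lock on it (it is not).
    static boolean writeLockableWhileSubtreeBusy() {
        ReentrantReadWriteLock c1 = new ReentrantReadWriteLock();
        c1.readLock().lock(); // held on behalf of C1/S1/R1
        boolean ok = c1.writeLock().tryLock();
        c1.readLock().unlock();
        return ok;
    }
}
```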

          dragonsinth Scott Blum added a comment -

          Have you run your impl against the test Noble Paul wrote? Curious if it passes.

          I think I grok it, but I found a few problems on inspection:

          • LayeredLock.tryLock() doesn't really implement markBusy() because it doesn't retain readLocks on the parent chain if you fail to lock the child. To implement markBusy you'd need to leave them locked (and have a way to later completely reset all the read locks).
          • I still think implementing the j.u.c.L interface is silly here, tryLock() is the only operation that makes sense in context.
          • You can move the synchronized(locks) from the private recursive method into the public one and acquire it once.
          • You can move the ArrayList copy from the private recursive method into the public one and only make a single copy that you later take sublists of.
          • Just as a note: you'd need to occasionally throw away the entire tree as it only grows and never shrinks.
          noble.paul Noble Paul added a comment -

          David, it works, but it is not yet complete. But I'm missing the point: what are we trying to solve? Is the current implementation buggy? We just need one correct implementation. I have no reason to reimplement everything unless there is something wrong with the current implementation. I'm mostly done with the current patch and I plan to commit it as soon as I clean up the cruft.

          dsmiley David Smiley added a comment -

          (Noble) David , It works. But it is not yet complete. But, I miss the point. What are we trying to solve? Is the current implementation buggy? We just need one correct implementation.

          I am merely offering an alternative implementation of part of the task that I feel is simpler / more elegant (as I stated). In part I'm doing this because it's fun/interesting. Other than the complexity of it and related code already in OverseerTaskProcessor, I'm not sure what bugs may or may not be in your patch or the existing code.

          (Scott) Have you run your impl against the test Noble Paul wrote? Curious if it passes.

          No; I'd like to do that. I suggest Noble Paul commit to a branch, push, and I commit on top (assuming tests pass). Of course it will be reverted if you all don't like it.

          (Scott) LayeredLock.tryLock() doesn't really implement markBusy() because it doesn't retain readLocks on the parent chain if you fail to lock the child. To implement markBusy you'd need to leave them locked (and have a way to later completely reset all the read locks).

          Thanks for clarifying on IRC why the Overseer would want to use tryLock() instead of lock() – something that confused me from your question/statement. I had thought OverseerTaskProcessor was going to spawn off a task/thread (more likely an Executor impl) that would simply call lock() in its own thread, waiting as long as needed to acquire the lock and do its task. I am quite ready to admit I don't know what's actually going on here ;-P which is apparently the case. Let me ask you this then... why doesn't it work in this manner? It sounds simple enough. I do see an executor (the tpe field), but there's so much surrounding code that it is confusing my ability to grok it, and thus why tryLock vs lock might be wanted. In such a design (in my simple conceptual model), fairness=true should be set on the ReentrantReadWriteLocks in SmileyLockTree to prevent starvation and get FIFO behavior on the write lock. If it's too complicated to explain, maybe let me have at it on a branch. And again, I may very well not appreciate (yet) why OverseerTaskProcessor with its proposed LockTree needs to be so complicated. So please don't take any of my questions or suggestions as a slight.
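As an aside on the fairness point: a fair ReentrantReadWriteLock does hand out the write lock in roughly arrival order, which is the anti-starvation behavior being suggested. A minimal sketch (FairLockDemo is an illustrative name, not part of any patch here):

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class FairLockDemo {
    public static void main(String[] args) {
        // fairness=true: waiting threads acquire in roughly FIFO order,
        // so a stream of readers cannot starve a queued writer forever.
        ReentrantReadWriteLock rw = new ReentrantReadWriteLock(true);

        rw.readLock().lock();                         // a reader is active
        System.out.println(rw.writeLock().tryLock()); // false: write lock unavailable

        rw.readLock().unlock();                       // reader done
        System.out.println(rw.writeLock().tryLock()); // true: writer acquires
    }
}
```

Note that even in fair mode, tryLock() itself barges if the lock happens to be free; only blocking acquisition honors the queue order.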

          dsmiley David Smiley added a comment -

          Maybe my ideas to simplify should really be its own issue because there's apparently more to it than the locking – it's interrelated with OverseerTaskProcessor itself. I think the SmileyLockTree.java could be part of a solution (assuming I'm not missing requirements overall by the OTP) but by itself with OTP's present design, it is perhaps not enough / not right.

          dragonsinth Scott Blum added a comment - - edited

          Noble Paul One more comment, instead of making LockTree.LockObject public, I think you could just have LockObject implement the Lock interface and return the interface. Then you don't have to do the "return lock == null ? null : lock::unlock" weirdness, and the FREELOCK sentinel can be a dummy interface impl instead of a concrete object.
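Scott's suggestion could be sketched roughly as follows. OverseerLock and FREE are hypothetical names for illustration, not the actual patch code; the point is that a sentinel implementing the same interface removes both the null check and the method-reference workaround:

```java
// Hypothetical sketch only: OverseerLock and FREE are illustrative names,
// not the classes in the committed SOLR-8744 patch.
interface OverseerLock {
    boolean tryLock();
    void unlock();

    // Sentinel for "no such entity, nothing to lock": always acquirable,
    // and unlock() is a no-op. Replaces both a concrete FREELOCK object
    // and the "lock == null ? null : lock::unlock" dance.
    OverseerLock FREE = new OverseerLock() {
        @Override public boolean tryLock() { return true; }
        @Override public void unlock() { }
    };
}

public class OverseerLockDemo {
    public static void main(String[] args) {
        OverseerLock lock = OverseerLock.FREE;
        // Callers use the sentinel exactly like a real lock:
        if (lock.tryLock()) {
            try {
                System.out.println("performing the collection operation");
            } finally {
                lock.unlock();
            }
        }
    }
}
```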

          dragonsinth Scott Blum added a comment -

          Or make "OverseerLock" a top-level interface type in the package.

          dragonsinth Scott Blum added a comment -

          Noble Paul One more big question for me: why does the LockTree need a clusterStateProvider? I don't follow the need to call findChildren in order to prepopulate locks at a given level. In many cases, you might need to lock something that doesn't really even exist. For example, create collection or create shard or create replica should acquire the lock for the thing they are about to try to create, even if the thing doesn't exist yet. This code:

                  if (child == null) {
                    LOG.info("locktree_Unable to get path for " + currentAction + " path " + path);
                    return FREELOCK;//no such entity . So no need to lock
                  }
          

          It seems to me that this basically isn't true. I don't think the LockTree needs knowledge of the actual elements in the real cluster state tree? I'm also not sure the Level actually matters beyond OCMH.lockTask, where we really just need to translate the lock level into the right-length path. Does that make any sense?
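The lazy, state-free tree being described might look roughly like this. LazyLockTree and its methods are illustrative names under stated assumptions, not the committed LockTree; it shows that nodes can be created from the requested path alone, with no cluster-state lookup:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch (not the committed LockTree): nodes are created lazily
// from the requested path alone, so no clusterStateProvider is needed.
// You can lock "collA/shard1" even before that shard exists in cluster state.
public class LazyLockTree {
    private static class Node {
        final Map<String, Node> children = new HashMap<>();
        boolean locked;
    }

    private final Node root = new Node();

    // Walk the path, creating nodes on demand; succeed only if no ancestor
    // holds a lock and no descendant of the target is locked.
    public synchronized boolean tryLock(String... path) {
        Node cur = root;
        for (String part : path) {
            if (cur.locked) return false;  // an ancestor is locked
            cur = cur.children.computeIfAbsent(part, k -> new Node());
        }
        if (cur.locked || anyDescendantLocked(cur)) return false;
        cur.locked = true;
        return true;
    }

    public synchronized void unlock(String... path) {
        Node cur = root;
        for (String part : path) cur = cur.children.get(part);
        cur.locked = false;
    }

    private boolean anyDescendantLocked(Node n) {
        for (Node c : n.children.values()) {
            if (c.locked || anyDescendantLocked(c)) return true;
        }
        return false;
    }
}
```

With this shape, "create shard" can lock the shard path before the shard exists, and the empty path acts as the cluster-level lock.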

          dragonsinth Scott Blum added a comment -

          Relatedly, lockTask should probably ensure that the appropriate number of path elements are non-null for the given lock level. E.g. a shard lock level should ensure that message.getStr(ZkStateReader.SHARD_ID_PROP) is non-null.
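A sketch of that validation idea, under the assumption that a level's ordinal equals the number of required path elements (names are hypothetical; the real lockTask reads properties off the message rather than taking a varargs path):

```java
// Sketch of the validation idea only; LockLevelCheck is an illustrative name.
public class LockLevelCheck {
    // Levels as defined in this issue's description: CLUSTER(0)..REPLICA(3).
    enum Level { CLUSTER, COLLECTION, SHARD, REPLICA }

    // path[0]=collection, path[1]=shard, path[2]=replica. A lock at a given
    // level requires that many leading non-null path elements, e.g. a SHARD
    // lock needs a non-null collection name and shard id.
    static void validate(Level level, String... path) {
        int required = level.ordinal();  // CLUSTER=0 ... REPLICA=3
        for (int i = 0; i < required; i++) {
            if (i >= path.length || path[i] == null) {
                throw new IllegalArgumentException(
                    level + " lock requires a non-null path element at index " + i);
            }
        }
    }

    public static void main(String[] args) {
        validate(Level.SHARD, "collA", "shard1");  // ok
        try {
            validate(Level.SHARD, "collA", null);  // missing shard id
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```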

          noble.paul Noble Paul added a comment -

          One more big question for me: why does the LockTree need a clusterStateProvider?

          Exactly. I could create a lock blindly and not bother about whether that entity actually exists.

          One more comment, instead of making LockTree.LockObject public

          True

          noble.paul Noble Paul added a comment -

          Implements your suggestions Scott Blum

          dragonsinth Scott Blum added a comment -

          Good stuff, new LockTree looking good.

          When does TaskBatch.batchId get assigned? It looks like it's always 0 right now.

          noble.paul Noble Paul added a comment -

          I missed those changes while merging between branches; added them back in this patch. I'll do another review to see what else was lost.

          dragonsinth Scott Blum added a comment -

          Hi, any update on this? It seemed like it was really close.

          noble.paul Noble Paul added a comment -

          I'm planning to commit this soon

          jira-bot ASF subversion and git services added a comment -

          Commit 459a9c77a6a9b807deb98c58225d4d0ec1f75bac in lucene-solr's branch refs/heads/master from Noble Paul
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=459a9c7 ]

          SOLR-8744: Overseer operations performed with fine grained mutual exclusion

          jira-bot ASF subversion and git services added a comment -

          Commit 734dcb99fcdd557ef046166193ae804824023db3 in lucene-solr's branch refs/heads/branch_6x from Noble Paul
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=734dcb9 ]

          SOLR-8744: Overseer operations performed with fine grained mutual exclusion

          noble.paul Noble Paul added a comment - - edited

          We have another problem because of tasks which cannot be run because they can't acquire a lock.

          Imagine we have around 100 items (the current limit) in the work-queue, and new tasks keep being added to it. The call to peekTopN() would return only the same set of 100 items; newly added items do not get fetched. This leads to 2 problems:

          1. New tasks which could be processed do not get a chance.
          2. ZK is unnecessarily hammered with requests fetching the same set of items again and again.
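The refetch problem can be modeled in a few lines. WorkQueueModel below is a toy stand-in (hypothetical names, not the ZK-backed work-queue) showing why passing an excluded-tasks set to peekTopN lets newly enqueued items through:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of the fix being described: keep blocked heads in a local map
// and exclude them on the next peek, so freshly enqueued tasks actually get
// fetched instead of re-reading the same stuck items on every iteration.
public class WorkQueueModel {
    private final List<String> queue = new ArrayList<>();           // stand-in for the ZK work-queue
    final Map<String, String> blockedTasks = new LinkedHashMap<>(); // heads we already hold locally

    void add(String id) { queue.add(id); }

    // peekTopN analogue: up to n ids from the head, skipping excluded ones.
    List<String> peekTopN(int n, Set<String> excluded) {
        List<String> out = new ArrayList<>();
        for (String id : queue) {
            if (out.size() >= n) break;
            if (!excluded.contains(id)) out.add(id);
        }
        return out;
    }
}
```

Without the excluded set, every peek would return the same blocked heads; with it, the peek window slides past them to the newer tasks.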
          noble.paul Noble Paul added a comment -

          This keeps a local copy of blocked tasks and uses that in the excluded-tasks check

          dragonsinth Scott Blum added a comment -

          Some quick feedback:

          1. I think we need a cap on the total size of blockedTasks; otherwise, in the face of a cluster-level task, you'll continually read everything in ZK into memory and grow blockedTasks without bound.

          2. "while (runningTasks.size() > MAX_PARALLEL_TASKS) { ...wait... }" should account for blockedTasks.size() as well, or else maybe there should be a separate sleep check on blockedTasks being too full.

          3. "workQueue.peekTopN(MAX_PARALLEL_TASKS" could also peek for a smaller number of tasks when blockedTasks is almost full. Something like:

          workQueue.peekTopN(Math.min(MAX_PARALLEL_TASKS - runningTasks.size(), MAX_BLOCKED_TASKS - blockedTasks.size()), ...

          4. excludedTasks could be a Sets.union() of the other two if you want to automatically retain things like toString() and size() instead of a Function.
          noble.paul Noble Paul added a comment -

          think we need a cap on the total size of blockedTasks; otherwise in the face of a cluster level task, you'll read everything in ZK into memory continually and grow blockedTasks without bound.

          It's a tricky situation. It is very cheap to hold a few thousand tasks in memory (or even tens of thousands), while we have a limited number of threads to process them. It still makes sense to read newer tasks, hoping they can be processed without getting blocked. I shall address these issues (as much as possible) and post a fresh patch.

          dragonsinth Scott Blum added a comment -

          Yeah 1000 seems totally reasonable. Beyond that, it seems kind of pointless; if you have 1000 tasks blocked, chances are that subsequent tasks aren't going to be able to run either.

          dragonsinth Scott Blum added a comment -

          btw: mind if I land SOLR-9191 first? I will potentially need to backport that change to a bunch of branches which don't have the LockTree stuff, so it'll be a little cleaner port if I land first.

          noble.paul Noble Paul added a comment -

          Sure Scott. Let's commit that first. Let's make these two a blocker for 6.1.0 release.

          noble.paul Noble Paul added a comment -

          Scott Blum This takes care of your concerns

          dragonsinth Scott Blum added a comment -

          Mostly LG. One completely minor comment:

          +              if (blockedTasks.size() < MAX_BLOCKED_TASKS) {
          +                blockedTasks.put(head.getId(), head);
          +              }
          

          Maybe unnecessary? It doesn't actually matter if blockedTasks.size() gets to 1100 items, the check at the top of the loop will keep it from growing endlessly. If you drop these tasks on the floor, you'll just end up re-fetching the bytes from ZK later.

          dragonsinth Scott Blum added a comment -

          BTW, I landed SOLR-9191 in master and 6x, so you should be good to go on that front.

          dragonsinth Scott Blum added a comment -

          I got one test failure patching this into master:

          MultiThreadedOCPTest fails reliably:

          java.lang.AssertionError
          at __randomizedtesting.SeedInfo.seed([7F3C69CF3DAFFDD5:F76856159353902D]:0)
          at org.junit.Assert.fail(Assert.java:92)
          at org.junit.Assert.assertTrue(Assert.java:43)
          at org.junit.Assert.assertTrue(Assert.java:54)
          at org.apache.solr.cloud.MultiThreadedOCPTest.testFillWorkQueue(MultiThreadedOCPTest.java:106)

          dragonsinth Scott Blum added a comment -

          Actually, I may have a fix. You need a Thread.sleep() in the final loop or you can burn through 500 iterations on a fast machine. Here's a patch.

          dragonsinth Scott Blum added a comment -

          One other comment:

                      // We are breaking out if we already have reached the no:of parallel tasks running
                      // By doing so we may end up discarding the old list of blocked tasks . But we have
                      // no means to know if they would still be blocked after some of the items ahead
                      // were cleared.
                      if (runningTasks.size() >= MAX_PARALLEL_TASKS) break;
          

          In this case, can't we just shove all the remaining tasks into blockedTasks? It's a slight redefinition to mean either "tasks that are blocked because of locks" or "tasks blocked because too many are running".

          githubbot ASF GitHub Bot added a comment -

          GitHub user dragonsinth opened a pull request:

          https://github.com/apache/lucene-solr/pull/42

          SOLR-8744 blockedTasks

          @noblepaul

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/fullstorydev/lucene-solr SOLR-8744

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/lucene-solr/pull/42.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #42


          commit 668d426633fe5551b24ab38036e14c15e7ed4cdf
          Author: Scott Blum <dragonsinth@apache.org>
          Date: 2016-06-09T21:28:07Z

          WIP: SOLR-8744 blockedTasks

          commit 4b2512abcc5e55c653c923eba9762988cf1faae8
          Author: Scott Blum <dragonsinth@apache.org>
          Date: 2016-06-09T22:11:26Z

          Simplifications


          dragonsinth Scott Blum added a comment -

          See second commit in https://github.com/apache/lucene-solr/pull/42
          githubbot ASF GitHub Bot added a comment -

          Github user dragonsinth commented on the issue:

          https://github.com/apache/lucene-solr/pull/42

          FYI: both of these SHAs passed the test suite

          noble.paul Noble Paul added a comment - - edited

          Yes, but I'm not sure if it's worth that optimization

          noble.paul Noble Paul added a comment -

          I will have to beast this test and see if I can reproduce this

          noble.paul Noble Paul added a comment -

          That is true; they may not necessarily be blocked. They are essentially items read from the work queue which we don't want to refetch.

          noble.paul Noble Paul added a comment -

          Incorporated the changes. Beasted the test; no failures.

          dragonsinth Scott Blum added a comment -

          LG. I only have one suggestion left, to formulate the "fetch" section like this:

                    ArrayList<QueueEvent> heads = new ArrayList<>(blockedTasks.size() + MAX_PARALLEL_TASKS);
                    heads.addAll(blockedTasks.values());
          
                    // If we have enough items in the blocked tasks already, it makes
                    // no sense to read more items from the work queue. it makes sense
                    // to clear out at least a few items in the queue before we read more items
                    if (heads.size() < MAX_BLOCKED_TASKS) {
                      //instead of reading MAX_PARALLEL_TASKS items always, we should only fetch as much as we can execute
                      int toFetch = Math.min(MAX_BLOCKED_TASKS - heads.size(), MAX_PARALLEL_TASKS - runningTasks.size());
                      List<QueueEvent> newTasks = workQueue.peekTopN(toFetch, excludedTasks, 2000L);
                      log.debug("Got {} tasks from work-queue : [{}]", newTasks.size(), newTasks.toString());
                      heads.addAll(newTasks);
                    } else {
                      Thread.sleep(1000);
                    }
          
                    if (isClosed) break;
          
                    if (heads.isEmpty()) {
                      continue;
                    }
          
                    blockedTasks.clear(); // clear it now; may get refilled below.
          

          This prevents two problems:

          1) The log.debug message "Got {} tasks from work-queue" won't keep reporting blockedTasks as if they were freshly fetched.
          2) When the blockedTask map gets completely full, the Thread.sleep() will prevent a free-spin (the only other real pause in the loop is peekTopN)

          dragonsinth Scott Blum added a comment -

          Also, add blockedTasks to printTrackingMaps()?

          dragonsinth Scott Blum added a comment -

          Attached a slightly modified patchfile with the changes I suggested.
          Otherwise LGTM

          dragonsinth Scott Blum added a comment -

          That said, if you'd feel more comfortable committing your patch as-is, I can toss my tweaks down after.

          jira-bot ASF subversion and git services added a comment -

          Commit 232b44e283dfd01f9ec01b4e68b09b3755a1b17a in lucene-solr's branch refs/heads/master from Noble Paul
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=232b44e ]

          SOLR-8744: Minimize the impact on ZK when there are a lot of blocked tasks

          jira-bot ASF subversion and git services added a comment -

          Commit ff6475c3a7229a27f7e97057fbbfadd2600032dd in lucene-solr's branch refs/heads/branch_6x from Noble Paul
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ff6475c ]

          SOLR-8744: Minimize the impact on ZK when there are a lot of blocked tasks

          jira-bot ASF subversion and git services added a comment -

          Commit 4726c5b2d2efa9ba160b608d46a977d0a6b83f94 in lucene-solr's branch refs/heads/branch_6_1 from Noble Paul
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=4726c5b ]

          SOLR-8744: Minimize the impact on ZK when there are a lot of blocked tasks

          githubbot ASF GitHub Bot added a comment -

          Github user dragonsinth closed the pull request at:

          https://github.com/apache/lucene-solr/pull/42


            People

            • Assignee: noble.paul Noble Paul
            • Reporter: dragonsinth Scott Blum
            • Votes: 0
            • Watchers: 8

            Dates

            • Created:
            • Updated:
            • Resolved:

            Development