Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-8722

Don't force a full ZkStateReader refresh on every Overseer operation

    Details

    • Flags:
      Patch

      Description

      We're doing an unnecessary ZkStateReader forced refresh on all Overseer operations. This isn't necessary because ZkStateReader keeps itself up to date.

      According to Shalin Shekhar Mangar's analysis, we just need to put a wait loop at the end of addReplica to observe the state change.

        Issue Links

          Activity

          Hide
          dragonsinth Scott Blum added a comment -

          Shalin Shekhar Mangar is this what you had in mind? We could probably optimize exactly what this code waits on, but I figured I'd just start by reusing the existing code. But looping 320 x 1 second seems a bit excessive for a new core creation.

          Also, what's the right way to handle things if the remote call fails? Will the code throw before it reaches the wait loop, or should I try to inspect the response object for errors before deciding to wait? I didn't see any special handling at other call sites.

          Show
          dragonsinth Scott Blum added a comment - Shalin Shekhar Mangar is this what you had in mind? We could probably optimize exactly what this code waits on, but I figured I'd just start by reusing the existing code. But looping 320 x 1 second seems a bit excessive for a new core creation. Also, what's the right way to handle things if the remote call fails? Will the code throw before it reaches the wait loop, or should I try to inspect the response object for errors before deciding to wait? I didn't see any special handling at other call sites.
          Hide
          shalinmangar Shalin Shekhar Mangar added a comment -

          Hi Scott, yes this should work fine. Sorry for being out of the loop re: SOLR-8696. I was travelling back to India and had a terrible jet lag.

          Also, what's the right way to handle things if the remote call fails? Will the code throw before it reaches the wait loop, or should I try to inspect the response object for errors before deciding to wait? I didn't see any special handling at other call sites.

          Hmm, good question. The processResponses method will throw an exception if abortOnError=true (which it is, in this case) but the 'async' handling does not throw any exceptions. So if the remote call fails and the user had specified the 'async' parameter, the wait loop will still be invoked.

          Show
          shalinmangar Shalin Shekhar Mangar added a comment - Hi Scott, yes this should work fine. Sorry for being out of the loop re: SOLR-8696 . I was travelling back to India and had a terrible jet lag. Also, what's the right way to handle things if the remote call fails? Will the code throw before it reaches the wait loop, or should I try to inspect the response object for errors before deciding to wait? I didn't see any special handling at other call sites. Hmm, good question. The processResponses method will throw an exception if abortOnError=true (which it is, in this case) but the 'async' handling does not throw any exceptions. So if the remote call fails and the user had specified the 'async' parameter, the wait loop will still be invoked.
          Hide
          dragonsinth Scott Blum added a comment -

          No problem! Welcome back, hope you had a good trip.

          What would you suggest in this case? Spinning for a very long time when the remote call might have failed quickly seems bad. What other handler methods do you think demonstrate the correct pattern? The code can be hard to follow. Sometime when you're on IRC I'd love to pick your brain about the code in OverseerCollectionMessageHandler & OverseerTaskProcessor.

          Show
          dragonsinth Scott Blum added a comment - No problem! Welcome back, hope you had a good trip. What would you suggest in this case? Spinning for a very long time when the remote call might have failed quickly seems bad. What other handler methods do you think demonstrate the correct pattern? The code can be hard to follow. Sometime when you're on IRC I'd love to pick your brain about the code in OverseerCollectionMessageHandler & OverseerTaskProcessor.
          Hide
          shalinmangar Shalin Shekhar Mangar added a comment -

          What would you suggest in this case? Spinning for a very long time when the remote call might have failed quickly seems bad.

          I guess the only way is to modify the processResponses method to throw an exception in case the async request failed just like how we do it today for non-async requests.

          What other handler methods do you think demonstrate the correct pattern?

          I don't think any of them do

          Show
          shalinmangar Shalin Shekhar Mangar added a comment - What would you suggest in this case? Spinning for a very long time when the remote call might have failed quickly seems bad. I guess the only way is to modify the processResponses method to throw an exception in case the async request failed just like how we do it today for non-async requests. What other handler methods do you think demonstrate the correct pattern? I don't think any of them do
          Hide
          dragonsinth Scott Blum added a comment -

          In that case, should we land this patch as-is and then come back with a more holistic answer for async failures?

          Show
          dragonsinth Scott Blum added a comment - In that case, should we land this patch as-is and then come back with a more holistic answer for async failures?
          Hide
          shalinmangar Shalin Shekhar Mangar added a comment -

          In that case, should we land this patch as-is and then come back with a more holistic answer for async failures?

          +1, I'll commit after running tests.

          Show
          shalinmangar Shalin Shekhar Mangar added a comment - In that case, should we land this patch as-is and then come back with a more holistic answer for async failures? +1, I'll commit after running tests.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 93133f54fd8cad47d7638c50a2360e3eb9daeb14 in lucene-solr's branch refs/heads/master from Shalin Shekhar Mangar
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=93133f5 ]

          SOLR-8722: Don't force a full ZkStateReader refresh on every Overseer operation

          Show
          jira-bot ASF subversion and git services added a comment - Commit 93133f54fd8cad47d7638c50a2360e3eb9daeb14 in lucene-solr's branch refs/heads/master from Shalin Shekhar Mangar [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=93133f5 ] SOLR-8722 : Don't force a full ZkStateReader refresh on every Overseer operation
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit e4712bb028849f9a9b202651728c1f5c0a224374 in lucene-solr's branch refs/heads/branch_6x from Shalin Shekhar Mangar
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e4712bb ]

          SOLR-8722: Don't force a full ZkStateReader refresh on every Overseer operation
          (cherry picked from commit 93133f5)

          Show
          jira-bot ASF subversion and git services added a comment - Commit e4712bb028849f9a9b202651728c1f5c0a224374 in lucene-solr's branch refs/heads/branch_6x from Shalin Shekhar Mangar [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e4712bb ] SOLR-8722 : Don't force a full ZkStateReader refresh on every Overseer operation (cherry picked from commit 93133f5)
          Hide
          shalinmangar Shalin Shekhar Mangar added a comment -

          Thanks Scott!

          Show
          shalinmangar Shalin Shekhar Mangar added a comment - Thanks Scott!
          Hide
          hossman Hoss Man added a comment -

          Manually correcting fixVersion per Step #S5 of LUCENE-7271

          Show
          hossman Hoss Man added a comment - Manually correcting fixVersion per Step #S5 of LUCENE-7271

            People

            • Assignee:
              shalinmangar Shalin Shekhar Mangar
              Reporter:
              dragonsinth Scott Blum
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development