Solr
  1. Solr
  2. SOLR-7636

CLUSTERSTATUS Api should not go to OCP to fetch data

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.3, 6.0
    • Component/s: None
    • Labels:
      None

      Description

      Currently it does multiple ZK operations which is not required. It should just read the status from ZK and return from the CollectionsHandler

        Issue Links

          Activity

          Hide
          ASF subversion and git services added a comment -

          Commit 1683514 from Noble Paul in branch 'dev/trunk'
          [ https://svn.apache.org/r1683514 ]

          SOLR-7636: CLUSTERSTATUS API is executed at CollectionsHandler

          Show
          ASF subversion and git services added a comment - Commit 1683514 from Noble Paul in branch 'dev/trunk' [ https://svn.apache.org/r1683514 ] SOLR-7636 : CLUSTERSTATUS API is executed at CollectionsHandler
          Hide
          ASF subversion and git services added a comment -

          Commit 1683515 from Noble Paul in branch 'dev/trunk'
          [ https://svn.apache.org/r1683515 ]

          SOLR-7636: set svn eol style

          Show
          ASF subversion and git services added a comment - Commit 1683515 from Noble Paul in branch 'dev/trunk' [ https://svn.apache.org/r1683515 ] SOLR-7636 : set svn eol style
          Hide
          ASF subversion and git services added a comment -

          Commit 1683519 from Noble Paul in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1683519 ]

          SOLR-7636: CLUSTERSTATUS API is executed at CollectionsHandler

          Show
          ASF subversion and git services added a comment - Commit 1683519 from Noble Paul in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1683519 ] SOLR-7636 : CLUSTERSTATUS API is executed at CollectionsHandler
          Hide
          Shalin Shekhar Mangar added a comment -

          I don't think this change is right. Earlier, cluster status was guaranteed to return the latest and most up-to-date cluster state but with this change, a node which is not connected to ZooKeeper can return stale data. This change should either be reverted or fixed.

          Show
          Shalin Shekhar Mangar added a comment - I don't think this change is right. Earlier, cluster status was guaranteed to return the latest and most up-to-date cluster state but with this change, a node which is not connected to ZooKeeper can return stale data. This change should either be reverted or fixed.
          Hide
          Noble Paul added a comment -

          In the previous model if the node cannot connect to zk how do you expect it to send a message to the OCP ?

          Show
          Noble Paul added a comment - In the previous model if the node cannot connect to zk how do you expect it to send a message to the OCP ?
          Hide
          Shalin Shekhar Mangar added a comment -

          In the previous model if the node cannot connect to zk how do you expect it to send a message to the OCP ?

          That's right. It does not go to OCP but errors out. This commit breaks that guarantee. After this change, when I issue a clusterstatus command I have no idea whether I am getting stale state or not.

          Show
          Shalin Shekhar Mangar added a comment - In the previous model if the node cannot connect to zk how do you expect it to send a message to the OCP ? That's right. It does not go to OCP but errors out. This commit breaks that guarantee. After this change, when I issue a clusterstatus command I have no idea whether I am getting stale state or not.
          Hide
          Noble Paul added a comment -

          The node should try to fetch the latest state and if it is not able to connect just fail

          Show
          Noble Paul added a comment - The node should try to fetch the latest state and if it is not able to connect just fail
          Hide
          ASF subversion and git services added a comment -

          Commit 1683558 from Noble Paul in branch 'dev/trunk'
          [ https://svn.apache.org/r1683558 ]

          SOLR-7636: Update from ZK before returning the status

          Show
          ASF subversion and git services added a comment - Commit 1683558 from Noble Paul in branch 'dev/trunk' [ https://svn.apache.org/r1683558 ] SOLR-7636 : Update from ZK before returning the status
          Hide
          ASF subversion and git services added a comment -

          Commit 1683560 from Noble Paul in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1683560 ]

          SOLR-7636: Update from ZK before returning the status

          Show
          ASF subversion and git services added a comment - Commit 1683560 from Noble Paul in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1683560 ] SOLR-7636 : Update from ZK before returning the status
          Hide
          Shalin Shekhar Mangar added a comment -

          Thanks, I think this is better.

          You added a todo item in OverseerCollectionProcessor#checkExclusivity. Please take care of it too.

          Show
          Shalin Shekhar Mangar added a comment - Thanks, I think this is better. You added a todo item in OverseerCollectionProcessor#checkExclusivity. Please take care of it too.
          Hide
          Noble Paul added a comment -

          That is for back compat. We should remove it in a later release

          Show
          Noble Paul added a comment - That is for back compat. We should remove it in a later release
          Hide
          Shalin Shekhar Mangar added a comment -

          Ah, okay. +1 to close this issue then.

          Show
          Shalin Shekhar Mangar added a comment - Ah, okay. +1 to close this issue then.
          Hide
          Varun Thacker added a comment -

          I'll mark SOLR-7464 as duplicate then.

          Shalin the concern that you raised about the node now returning stale results will also apply to other operations like LIST which are served from the CollectionsHandler as well?

          Show
          Varun Thacker added a comment - I'll mark SOLR-7464 as duplicate then. Shalin the concern that you raised about the node now returning stale results will also apply to other operations like LIST which are served from the CollectionsHandler as well?
          Hide
          Shalin Shekhar Mangar added a comment -

          Shalin the concern that you raised about the node now returning stale results will also apply to other operations like LIST which are served from the CollectionsHandler as well?

          Yes, for the sake of correctness, LIST should also hit ZK directly. It'd perhaps be overkill to update the entire cluster state though.

          Show
          Shalin Shekhar Mangar added a comment - Shalin the concern that you raised about the node now returning stale results will also apply to other operations like LIST which are served from the CollectionsHandler as well? Yes, for the sake of correctness, LIST should also hit ZK directly. It'd perhaps be overkill to update the entire cluster state though.
          Hide
          Varun Thacker added a comment -

          Yes, for the sake of correctness, LIST should also hit ZK directly. It'd perhaps be overkill to update the entire cluster state though.

          Right. Thought as well.

          If users call the cluster status api frequently wouldn't the benefit of moving this out of the OCP be lost? APIs like CLUSTERSTATUS/LIST are probably consumed by monitoring applications which might call it quite frequently.

          Show
          Varun Thacker added a comment - Yes, for the sake of correctness, LIST should also hit ZK directly. It'd perhaps be overkill to update the entire cluster state though. Right. Thought as well. If users call the cluster status api frequently wouldn't the benefit of moving this out of the OCP be lost? APIs like CLUSTERSTATUS/LIST are probably consumed by monitoring applications which might call it quite frequently.
          Hide
          Noble Paul added a comment -

          If users call the cluster status api frequently wouldn't the benefit of moving this out of the OCP be lost?

          Before this fix, the call path was as follows

          colectionshandler -> ZK Q -> OCP -> (ZK Q read + ZK Q DEL + ZK Q Write) -> (CollectionsHandler ZK READ+ DEL)

          Now it has become

          CollectionsHandler -> ZK READ

          So, this is definitely a win as of now.

          Show
          Noble Paul added a comment - If users call the cluster status api frequently wouldn't the benefit of moving this out of the OCP be lost? Before this fix, the call path was as follows colectionshandler -> ZK Q -> OCP -> (ZK Q read + ZK Q DEL + ZK Q Write) -> (CollectionsHandler ZK READ+ DEL) Now it has become CollectionsHandler -> ZK READ So, this is definitely a win as of now.
          Hide
          Noble Paul added a comment -

          other operations like LIST which are served from the CollectionsHandler as well?

          All READ operations must be served directly from CollectionsHandler

          Show
          Noble Paul added a comment - other operations like LIST which are served from the CollectionsHandler as well? All READ operations must be served directly from CollectionsHandler
          Hide
          Shalin Shekhar Mangar added a comment -

          Before this fix, the call path was as follows
          colectionshandler -> ZK Q -> OCP -> (ZK Q read + ZK Q DEL + ZK Q Write) -> (CollectionsHandler ZK READ+ DEL)
          Now it has become
          CollectionsHandler -> ZK READ
          So, this is definitely a win as of now.

          That is not true. Because we have to forcefully update the cluster state, we have CollectionsHandler -> ZK getChildren (/live_nodes) + ZK read (/clusterstate.json) + ZK getChildren + N * ZK exists (for getIndividualColls) + W * ZK exists + W * getData (for watched collections) + (N-W) * ZK exists + (N-W) * ZK getData (for non-watched collections).

          Show
          Shalin Shekhar Mangar added a comment - Before this fix, the call path was as follows colectionshandler -> ZK Q -> OCP -> (ZK Q read + ZK Q DEL + ZK Q Write) -> (CollectionsHandler ZK READ+ DEL) Now it has become CollectionsHandler -> ZK READ So, this is definitely a win as of now. That is not true. Because we have to forcefully update the cluster state, we have CollectionsHandler -> ZK getChildren (/live_nodes) + ZK read (/clusterstate.json) + ZK getChildren + N * ZK exists (for getIndividualColls) + W * ZK exists + W * getData (for watched collections) + (N-W) * ZK exists + (N-W) * ZK getData (for non-watched collections).
          Hide
          Noble Paul added a comment -

          Yeah , I have lumped them together as ZK read. We can optimize the update states

          Show
          Noble Paul added a comment - Yeah , I have lumped them together as ZK read. We can optimize the update states
          Hide
          Shalin Shekhar Mangar added a comment -

          Yes, some of the ZK exists + ZK read combinations can be simplified to just use ZK reads. I'll open an issue.

          Show
          Shalin Shekhar Mangar added a comment - Yes, some of the ZK exists + ZK read combinations can be simplified to just use ZK reads. I'll open an issue.
          Hide
          Shalin Shekhar Mangar added a comment -

          Bulk close for 5.3.0 release

          Show
          Shalin Shekhar Mangar added a comment - Bulk close for 5.3.0 release

            People

            • Assignee:
              Noble Paul
              Reporter:
              Noble Paul
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development