Uploaded image for project: 'Cassandra'
  1. Cassandra
  2. CASSANDRA-9526

Provide a JMX hook to monitor phi values in the FailureDetector

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 2.2.4, 3.0.0
    • Component/s: None
    • Labels:
    • Flags:
      Patch
    • Reproduced In:

      Description

      phi_convict_threshold can be tuned, but there's currently no way to monitor the phi values to see if you're getting close.

      The attached patch adds the ability to get these values via JMX.

      1. Monitor-Phi-JMX.patch
        6 kB
        Ron Kuris
      2. Phi-Log-Debug-When-Close.patch
        4 kB
        Ron Kuris
      3. Tiny-Race-Condition.patch
        2 kB
        Ron Kuris

        Issue Links

          Activity

          Hide
          slebresne Sylvain Lebresne added a comment -

          Committed followup (which lgtm).

          Show
          slebresne Sylvain Lebresne added a comment - Committed followup (which lgtm).
          Hide
          aweisberg Ariel Weisberg added a comment -

          Committer please do a quick code review of this fix.

          Show
          aweisberg Ariel Weisberg added a comment - Committer please do a quick code review of this fix.
          Hide
          aweisberg Ariel Weisberg added a comment -

          See previous post for patch info

          Show
          aweisberg Ariel Weisberg added a comment - See previous post for patch info
          Show
          aweisberg Ariel Weisberg added a comment - Updated pull request (branch name is wrong) utests dtests
          Hide
          aweisberg Ariel Weisberg added a comment -

          The code I added for formatting the information in nodetool has a minor bug found by Coverity where it uses '\n" instead of "%n" so you don't get the platform specific line ending.

          Will post an update.

          Show
          aweisberg Ariel Weisberg added a comment - The code I added for formatting the information in nodetool has a minor bug found by Coverity where it uses '\n" instead of "%n" so you don't get the platform specific line ending. Will post an update.
          Hide
          iamaleksey Aleksey Yeschenko added a comment -

          Committed as 3cd750012e0cf3b55442c036627fb032c29c16bc to 2.2 and merged upwards, thank you.

          Next time, though, would be good to have CI results for 3.0 (this time made an exception for 3.0, but wouldn't have done it for 2.1->2.2 merge).

          Show
          iamaleksey Aleksey Yeschenko added a comment - Committed as 3cd750012e0cf3b55442c036627fb032c29c16bc to 2.2 and merged upwards, thank you. Next time, though, would be good to have CI results for 3.0 (this time made an exception for 3.0, but wouldn't have done it for 2.1->2.2 merge).
          Hide
          aweisberg Ariel Weisberg added a comment -

          This is ready to commit. It's a good idea for the committer to give it one more pass.

          Show
          aweisberg Ariel Weisberg added a comment - This is ready to commit. It's a good idea for the committer to give it one more pass.
          Hide
          aweisberg Ariel Weisberg added a comment -

          Ron Kuris Can you review the changes I made and the test?

          Show
          aweisberg Ariel Weisberg added a comment - Ron Kuris Can you review the changes I made and the test?
          Show
          aweisberg Ariel Weisberg added a comment - Created a dtest https://github.com/riptano/cassandra-dtest/pull/603 Code utests dtests
          Hide
          aweisberg Ariel Weisberg added a comment -

          Also looks like I need to modify nodetool to allow reporting on these values. Now there are a few UI questions as to what this should look like, whether it should be output by the existing gossipinfo command or be a new one.

          It looks like gossipinfo already dumps quite a bit of information about version of various dimensions of gossip state for each endpoint. Phi values seem kind of orthogonal to gossip propagation. It makes me think nodetool would benefit from having a failure detector specific command.

          I am going to implement that just to get started on writing a test for this. It's not a big deal to refactor on review.

          Show
          aweisberg Ariel Weisberg added a comment - Also looks like I need to modify nodetool to allow reporting on these values. Now there are a few UI questions as to what this should look like, whether it should be output by the existing gossipinfo command or be a new one. It looks like gossipinfo already dumps quite a bit of information about version of various dimensions of gossip state for each endpoint. Phi values seem kind of orthogonal to gossip propagation. It makes me think nodetool would benefit from having a failure detector specific command. I am going to implement that just to get started on writing a test for this. It's not a big deal to refactor on review.
          Hide
          aweisberg Ariel Weisberg added a comment -

          Sorry the hold up has been that there is no test for the feature and I haven't had time to write one because I have been doing a lot of reviews for 3.0.

          Ron if you add a JMX test for this we can get it into 2.2.

          Show
          aweisberg Ariel Weisberg added a comment - Sorry the hold up has been that there is no test for the feature and I haven't had time to write one because I have been doing a lot of reviews for 3.0. Ron if you add a JMX test for this we can get it into 2.2.
          Hide
          iamaleksey Aleksey Yeschenko added a comment -

          Ron Kuris Yep. Technically, so long as there is +1 from Ariel, and the patch still applies, we are good to go.

          Show
          iamaleksey Aleksey Yeschenko added a comment - Ron Kuris Yep. Technically, so long as there is +1 from Ariel, and the patch still applies, we are good to go.
          Hide
          rkuris Ron Kuris added a comment -

          Aleksey Yeschenko Is this because 2.0.x is no longer receiving updates? Is there anything I can do to get this into 2.2.x?

          Show
          rkuris Ron Kuris added a comment - Aleksey Yeschenko Is this because 2.0.x is no longer receiving updates? Is there anything I can do to get this into 2.2.x?
          Hide
          aweisberg Ariel Weisberg added a comment -

          Rebased to 2.2 and squashed.

          utests and dtests

          The core submission hasn't changed. I corrected some style issues like newline after brace. The logging changes look like an improvement to me, but the failure detector is not an area of expertise for me yet. I also updated CHANGES.txt. It seems like you don't take away any existing information or make things chattier except in the case of interpret() which if trace is enabled will always log now.

          In the last case it logs two statements back to back. Maybe I should change that to be a single statement with both the phi value and mean?

          Show
          aweisberg Ariel Weisberg added a comment - Rebased to 2.2 and squashed. utests and dtests The core submission hasn't changed. I corrected some style issues like newline after brace. The logging changes look like an improvement to me, but the failure detector is not an area of expertise for me yet. I also updated CHANGES.txt. It seems like you don't take away any existing information or make things chattier except in the case of interpret() which if trace is enabled will always log now. In the last case it logs two statements back to back. Maybe I should change that to be a single statement with both the phi value and mean?
          Hide
          jbellis Jonathan Ellis added a comment -

          Reassigning review to Ariel Weisberg

          Show
          jbellis Jonathan Ellis added a comment - Reassigning review to Ariel Weisberg
          Hide
          mshuler Michael Shuler added a comment -

          I set this back to Patch Available - the reviewer is the person to validate patches and set Ready to Commit. Thanks!

          Show
          mshuler Michael Shuler added a comment - I set this back to Patch Available - the reviewer is the person to validate patches and set Ready to Commit. Thanks!
          Hide
          mshuler Michael Shuler added a comment -

          You've done the right thing to get this into Cassandra. Patches in JIRA are the "proper" method and your code reviewer will get to your additional fixes when possible. Thanks for your contribution!

          Show
          mshuler Michael Shuler added a comment - You've done the right thing to get this into Cassandra. Patches in JIRA are the "proper" method and your code reviewer will get to your additional fixes when possible. Thanks for your contribution!
          Hide
          rkuris Ron Kuris added a comment -

          Brandon Williams I haven't submitted anything before, so I'm wondering what I need to do to get this into Cassandra. Do you want a PR instead of the patches here?

          Show
          rkuris Ron Kuris added a comment - Brandon Williams I haven't submitted anything before, so I'm wondering what I need to do to get this into Cassandra. Do you want a PR instead of the patches here?
          Hide
          rkuris Ron Kuris added a comment -

          Fixed some minor problems found while running this code at high volume for a while, so uploaded a revised patchset. Also corrected header reorganization.

          Show
          rkuris Ron Kuris added a comment - Fixed some minor problems found while running this code at high volume for a while, so uploaded a revised patchset. Also corrected header reorganization.
          Hide
          rkuris Ron Kuris added a comment -
          Show
          rkuris Ron Kuris added a comment - Blog post with related info: http://hackers.lookout.com/2015/06/cassandra-phi/
          Hide
          rkuris Ron Kuris added a comment -

          Hi Brandon Williams,

          The rule I generally follow for checking logging levels is that if I actually compute anything instead of just passing a reference to stuff, then I check the logger level before making the call. This means I don't have to unnecessarily compute anything to make the call, and the call short-circuits right away and is nearly as cheap as checking outside the call. This pattern is documented in the slf4j docs here: http://www.slf4j.org/faq.html#logging_performance

          However, if you'd prefer consistency over this, I'd be glad to change it, just say the word.

          And, I'll fix the import reordering. Sometimes I hate IDEs.

          Show
          rkuris Ron Kuris added a comment - Hi Brandon Williams , The rule I generally follow for checking logging levels is that if I actually compute anything instead of just passing a reference to stuff, then I check the logger level before making the call. This means I don't have to unnecessarily compute anything to make the call, and the call short-circuits right away and is nearly as cheap as checking outside the call. This pattern is documented in the slf4j docs here: http://www.slf4j.org/faq.html#logging_performance However, if you'd prefer consistency over this, I'd be glad to change it, just say the word. And, I'll fix the import reordering. Sometimes I hate IDEs.
          Hide
          brandon.williams Brandon Williams added a comment -

          Thanks for the patch, Ron, this is good work.

          A couple of nits here:

          You disabled the logger.isTraceEnabled() in one call (which is fine) but then repeated the pattern you undid thereafter. I can be fine with either since they don't matter, but let's be consistent.

          Your IDE incorrectly reordered the imports.

          Show
          brandon.williams Brandon Williams added a comment - Thanks for the patch, Ron, this is good work. A couple of nits here: You disabled the logger.isTraceEnabled() in one call (which is fine) but then repeated the pattern you undid thereafter. I can be fine with either since they don't matter, but let's be consistent. Your IDE incorrectly reordered the imports.
          Hide
          rkuris Ron Kuris added a comment -

          There are three patches here. The main fix is in Monitor-Phi-JMX.patch. This fully resolves the reported issue.

          While inspecting this code, I noticed a small unlikely race condition. If two phi values come in at the same time as the first one for a host, one could be lost due to the way the values are being added to the Hashtable. The second patch resolves that window, by switching to a ConcurrentHashMap and using putIfAbsent to atomically check for a prior value.

          I doubt this could actually happen in the wild but it's still good defensive coding. Also, it removes Hashtable which is always synchronized.

          The third patch will start generating debug log messages when PHI starts getting close. It's a great way to see that phi_convict_threshold might be too low. It's not WARN or even INFO because this could generate a lot of logs, but arguably it could be. If someone has trouble with nodes going offline, they can turn up the debugging levels and see that phi_convict_threshold is the culprit.

          There is also some other code cleanup in the Phi-Log-Debug-When-Close patch.

          Show
          rkuris Ron Kuris added a comment - There are three patches here. The main fix is in Monitor-Phi-JMX.patch. This fully resolves the reported issue. While inspecting this code, I noticed a small unlikely race condition. If two phi values come in at the same time as the first one for a host, one could be lost due to the way the values are being added to the Hashtable. The second patch resolves that window, by switching to a ConcurrentHashMap and using putIfAbsent to atomically check for a prior value. I doubt this could actually happen in the wild but it's still good defensive coding. Also, it removes Hashtable which is always synchronized. The third patch will start generating debug log messages when PHI starts getting close. It's a great way to see that phi_convict_threshold might be too low. It's not WARN or even INFO because this could generate a lot of logs, but arguably it could be. If someone has trouble with nodes going offline, they can turn up the debugging levels and see that phi_convict_threshold is the culprit. There is also some other code cleanup in the Phi-Log-Debug-When-Close patch.

            People

            • Assignee:
              rkuris Ron Kuris
              Reporter:
              rkuris Ron Kuris
              Reviewer:
              Ariel Weisberg
            • Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development