Cassandra / CASSANDRA-4038

Investigate improving the dynamic snitch with reservoir sampling

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 1.2.0 beta 1
    • Component/s: Core
    • Labels:
      None

      Description

      Dsnitch's UPDATES_PER_INTERVAL and WINDOW_SIZE are chosen somewhat arbitrarily. A better fit may be something similar to Metrics' ExponentiallyDecayingSample, where more recent information is weighted more heavily than past information. Reservoir sampling would also be an efficient way to keep a statistically significant sample, rather than refusing updates after UPDATES_PER_INTERVAL and keeping only WINDOW_SIZE entries.
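
The idea in the description can be sketched roughly as follows. This is a minimal illustration of a time-biased reservoir using forward-decay priorities, not the attached patch and not the Metrics library's code. The class name, method names, and the parameter values used with it are hypothetical. It also omits the periodic rescaling a long-running sampler would need, since exp(alpha * age) eventually overflows a double.

```java
import java.util.Random;
import java.util.TreeMap;

/** Illustrative only: a bounded sample that favours recent values. */
final class DecayingReservoirSketch {
    // Samples ordered by priority; the lowest priority is evicted first.
    private final TreeMap<Double, Long> byPriority = new TreeMap<>();
    private final int capacity;
    private final double alpha;        // decay factor: larger means stronger bias to recent data
    private final long startSeconds;   // landmark time that weights are measured from
    private final Random random;

    DecayingReservoirSketch(int capacity, double alpha, long startSeconds, long seed) {
        this.capacity = capacity;
        this.alpha = alpha;
        this.startSeconds = startSeconds;
        this.random = new Random(seed);
    }

    /** Offers a value observed at nowSeconds; it may or may not be retained. */
    void update(long value, long nowSeconds) {
        // Weight grows exponentially with the time elapsed since the landmark.
        double weight = Math.exp(alpha * (nowSeconds - startSeconds));
        double u = 1.0 - random.nextDouble();   // uniform in (0, 1], avoids division by zero
        double priority = weight / u;

        if (byPriority.size() < capacity) {
            // Still filling up: keep everything (a colliding key would overwrite, which is very rare).
            byPriority.put(priority, value);
            return;
        }
        double lowest = byPriority.firstKey();
        // Full: replace the lowest-priority sample only if the newcomer outranks it.
        if (priority > lowest && byPriority.putIfAbsent(priority, value) == null) {
            byPriority.remove(lowest);
        }
    }

    int size() {
        return byPriority.size();
    }

    /** Mean of the retained samples, or 0 if the reservoir is empty. */
    double mean() {
        if (byPriority.isEmpty()) {
            return 0.0;
        }
        double sum = 0.0;
        for (long v : byPriority.values()) {
            sum += v;
        }
        return sum / byPriority.size();
    }
}
```

Unlike a fixed window that refuses updates past a cap, every update here is considered, the memory bound is the reservoir capacity, and older samples are displaced with increasing probability as time passes.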

      1. CASSANDRA-4038.patch
        13 kB
        Pavel Yaskevich

        Activity

        Pavel Yaskevich made changes -
        Status: Patch Available [ 10002 ] -> Resolved [ 5 ]
        Resolution: Fixed [ 1 ]
        Pavel Yaskevich added a comment -

        Committed.

        Brandon Williams added a comment -

        It doesn't, really. Instead of using a fixed sample size we use a statistically accurate continuous sample. The math using the value is the same.

        Jonathan Ellis added a comment -

        How does this affect the math in the original phi accrual failure detector? Is it worth getting Paul to look into that?

        Brandon Williams added a comment -

        That's a decent percentage increase, but still 0.001ms/request is pretty minuscule. LGTM, +1.

        Pavel Yaskevich added a comment -

        No, it's milliseconds: the old one runs in ~80 ms for 100,000 inserts and the new one in ~109 ms for the same number.
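
To put those totals in per-insert terms, here is a small worked calculation. It uses only the ~80 ms and ~109 ms figures quoted above for 100,000 inserts; the class and method names are hypothetical.

```java
/** Illustrative arithmetic on the totals quoted in the comment above. */
final class ReceiveTimingOverheadSketch {
    /** Extra cost per insert, in microseconds. */
    static double extraMicrosPerInsert(double oldTotalMs, double newTotalMs, int inserts) {
        return (newTotalMs - oldTotalMs) * 1000.0 / inserts;
    }

    /** Total cost per insert, in microseconds. */
    static double microsPerInsert(double totalMs, int inserts) {
        return totalMs * 1000.0 / inserts;
    }

    /** Relative slowdown as a percentage of the old total. */
    static double percentIncrease(double oldTotalMs, double newTotalMs) {
        return (newTotalMs - oldTotalMs) / oldTotalMs * 100.0;
    }

    public static void main(String[] args) {
        int inserts = 100_000;
        System.out.println("extra us/insert: " + extraMicrosPerInsert(80.0, 109.0, inserts));
        System.out.println("new us/insert:   " + microsPerInsert(109.0, inserts));
        System.out.println("percent slower:  " + percentIncrease(80.0, 109.0));
    }
}
```

The extra cost works out to about 0.29 microseconds per insert, and the new total to about 1.09 microseconds per insert, which is roughly 0.001 ms.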

        Brandon Williams added a comment -

        "Yes, I did a few profiling tests and I see ~30 ms degradation in receiveTiming"

        This is micros, right?

        Pavel Yaskevich added a comment -

        "Have you done any profiling to see if this actually is cheaper than the fixed window size? Specifically I'm worried about receiveTiming becoming more expensive."

        Yes, I did a few profiling tests and I see ~30 ms degradation in receiveTiming speed when inserting 100,000 latency records (I increased the UPDATES_PER_INTERVAL value to be fair to the test).

        Brandon Williams added a comment -

        I'm a bit concerned that shoehorning latency timings into a long from a double will always yield zero in a healthy gigabit network where the timings are generally fractional. But, there's a good chance in a situation with such similar values their weight is irrelevant after CASSANDRA-3722 anyway.

        Have you done any profiling to see if this actually is cheaper than the fixed window size? Specifically I'm worried about receiveTiming becoming more expensive.
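
The truncation concern above can be illustrated with a short sketch. The sample latencies and the helper names are hypothetical, and storing microseconds is shown only as one possible way to keep sub-millisecond resolution in a long, not as what the patch does.

```java
/** Illustrative only: what happens when fractional millisecond timings are stored in a long. */
final class LatencyTruncationSketch {
    /** A narrowing cast drops the fractional part, so any latency below 1 ms becomes 0. */
    static long asWholeMillis(double latencyMillis) {
        return (long) latencyMillis;
    }

    /** Hypothetical alternative: scale to microseconds before converting to a long. */
    static long asMicros(double latencyMillis) {
        return Math.round(latencyMillis * 1000.0);
    }

    public static void main(String[] args) {
        double[] samplesMillis = {0.21, 0.48, 0.93};   // made-up sub-millisecond latencies
        for (double ms : samplesMillis) {
            System.out.println(ms + " ms -> " + asWholeMillis(ms)
                    + " (long millis), " + asMicros(ms) + " (long micros)");
        }
    }
}
```

With whole milliseconds every sample collapses to the same value, so the sample can no longer distinguish between hosts that all respond in under a millisecond.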

        Pavel Yaskevich made changes -
        Status: Open [ 1 ] -> Patch Available [ 10002 ]
        Reviewer: brandon.williams
        Pavel Yaskevich made changes -
        Attachment: CASSANDRA-4038.patch [ 12538151 ]
        Pavel Yaskevich added a comment -

        I think it's worth pursuing, as it would remove the work we do now: restricting sampling to the window size and the number of updates in the interval, and calculating the age of each response arrival. It would also improve sampling by moving to an exponential decay function. There is already an implementation available under the Apache 2.0 License: https://github.com/codahale/metrics/blob/master/metrics-core/src/main/java/com/yammer/metrics/stats/ExponentiallyDecayingSample.java

        Jonathan Ellis made changes -
        Assignee: Brandon Williams [ brandon.williams ] -> Pavel Yaskevich [ xedin ]
        Jonathan Ellis added a comment -

        Pavel, take a look at this and see if it's worth pursuing.

        Brandon Williams created issue -

          People

          • Assignee:
            Pavel Yaskevich
            Reporter:
            Brandon Williams
            Reviewer:
            Brandon Williams
          • Votes:
            0
            Watchers:
            3
