CASSANDRA-19365

invalid EstimatedHistogramReservoirSnapshot::getValue values due to race condition in DecayingEstimatedHistogramReservoir


Details

    Description

      `DecayingEstimatedHistogramReservoir` has a race condition between `update` and `rescaleIfNeeded`.
      A sample recorded by `update` can land in a decaying bucket that `rescaleIfNeeded` has already rescaled while still carrying a non-scaled weight, because `decayLandmark` had not yet been updated when `update` computed that weight.
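
      The interleaving described above can be replayed deterministically in a single thread. The following is only a minimal sketch under stated assumptions, not the actual Cassandra code: the class `DecayRaceSketch`, the method `decayedCount`, the 60-second mean lifetime and the 1800-second gap are all hypothetical, and the reservoir is reduced to one bucket and one landmark.

```java
// Simplified, single-threaded model of the race (hypothetical names and constants).
final class DecayRaceSketch {
    // Assumed mean lifetime of the exponential decay, in seconds (illustrative).
    static final double MEAN_LIFETIME_S = 60.0;

    // Forward-decay weight of a sample taken at nowS relative to a landmark.
    static double forwardDecayWeight(long nowS, long landmarkS) {
        return Math.exp((nowS - landmarkS) / MEAN_LIFETIME_S);
    }

    // Returns how many samples the bucket appears to hold after one update
    // and one rescale. With racy == true the sample is added after the bucket
    // was rescaled, but with a weight computed against the old landmark.
    static double decayedCount(boolean racy) {
        long oldLandmark = 0;
        long now = 1800;          // assumed time at which a rescale is due
        double bucket = 0.0;

        // update(): the weight is computed from the landmark visible at this moment.
        double staleWeight = forwardDecayWeight(now, oldLandmark);

        if (!racy) {
            bucket += staleWeight;    // sample lands before the rescale
        }

        // rescaleIfNeeded(): scale the bucket down, then move the landmark.
        bucket /= forwardDecayWeight(now, oldLandmark);

        if (racy) {
            bucket += staleWeight;    // sample lands in the already scaled bucket
        }
        long newLandmark = now;

        // A snapshot normalises by the weight relative to the current landmark.
        return bucket / forwardDecayWeight(now, newLandmark);
    }

    public static void main(String[] args) {
        System.out.println("ordered: " + decayedCount(false));
        System.out.println("racy:    " + decayedCount(true));
    }
}
```

      In this model the ordered case counts the sample as one, while the racy case counts it as roughly e^30 samples (on the order of 10^13), which is what "overweight" means here.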
       
      The observed consequence was a flood of speculative retries across the cluster: overweight samples happened to land in low-percentile buckets, which kept the reported p99 below the true p50 for a long time.
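
      To make the percentile effect concrete, here is a rough sketch of how a single overweight low bucket can pull p99 below the true p50. It is an illustration only: `PercentileSkewSketch`, the bucket bounds, the counts and the overweight value of 1e13 are all made up, and the percentile lookup is a simplified cumulative scan rather than the real `getValue` implementation.

```java
// Simplified bucketed percentile lookup (hypothetical names and data).
final class PercentileSkewSketch {
    // Returns the upper bound of the first bucket at which the cumulative
    // count reaches q * total.
    static long percentile(long[] upperBounds, double[] counts, double q) {
        double total = 0;
        for (double c : counts) {
            total += c;
        }
        double threshold = q * total;
        double seen = 0;
        for (int i = 0; i < counts.length; i++) {
            seen += counts[i];
            if (seen >= threshold) {
                return upperBounds[i];
            }
        }
        return upperBounds[upperBounds.length - 1];
    }

    public static void main(String[] args) {
        long[] bounds = {1, 10, 100, 1000};          // illustrative latency bounds
        double[] healthy = {10, 50, 30, 10};         // illustrative decayed counts
        double[] skewed = {10 + 1e13, 50, 30, 10};   // one overweight sample in the lowest bucket

        System.out.println("true p50:   " + percentile(bounds, healthy, 0.50));
        System.out.println("true p99:   " + percentile(bounds, healthy, 0.99));
        System.out.println("skewed p99: " + percentile(bounds, skewed, 0.99));
    }
}
```

      With these numbers the healthy histogram reports p50 = 10 and p99 = 1000, while the skewed one reports p99 = 1. Any retry threshold derived from that p99 would then sit below the latency of a typical request.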

      Please note that despite the manifestation being similar to CASSANDRA-19330, these are two distinct bugs in their own right.

      This bug affects versions 4.0 and later.
      On 3.11 there is locking in `DecayingEstimatedHistogramReservoir` (DEHR). I did not check earlier versions.

      Attachments

        1. ci_summary.html, 38 kB, Caleb Rackliffe
        2. result_details.tar.gz, 2.76 MB, Caleb Rackliffe

            People

              mmuzaf Maxim Muzafarov
              jakubzytka Jakub Zytka
              Jakub Zytka, Maxim Muzafarov
              Caleb Rackliffe
              Votes: 0
              Watchers: 7

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 4h 20m