CASSANDRA-20059 (Apache Cassandra)

TCM's Retry.Deadline#retryIndefinitely is dangerous if used with RemoteProcessor as the deadline does not impact message retries

Details

    Description

      public static Deadline retryIndefinitely(long timeoutNanos, Meter retryMeter)
      {
          return new Deadline(Clock.Global.nanoTime() + timeoutNanos,
                              new Retry.Jitter(Integer.MAX_VALUE, DEFAULT_BACKOFF_MS, new Random(), retryMeter))
          {
              // Always false: callers that gate retries on reachedMax() never stop,
              // even once the deadline computed above has passed.
              @Override
              public boolean reachedMax()
              {
                  return false;
              }

              // Returns the full timeout on every call, not the time left until the deadline.
              @Override
              public long remainingNanos()
              {
                  return timeoutNanos;
              }

              public String toString()
              {
                  return String.format("RetryIndefinitely{tries=%d}", currentTries());
              }
          };
      }
      

      Sample usage pattern (this example is from Accord, but the same pattern exists in RemoteProcessor.commit):

      Promise<LogState> request = new AsyncPromise<>();
      List<InetAddressAndPort> candidates = new ArrayList<>(log.metadata().fullCMSMembers());
      sendWithCallbackAsync(request,
                            Verb.TCM_RECONSTRUCT_EPOCH_REQ,
                            new ReconstructLogState(lowEpoch, highEpoch, includeSnapshot),
                            new CandidateIterator(candidates),
                            retryPolicy);
      return request.get(retryPolicy.remainingNanos(), TimeUnit.NANOSECONDS);
      

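      The issue is that the networking retry logic has no way of knowing that the caller gave up waiting on the request, so it keeps retrying until it succeeds. This happens because reachedMax() is used to decide whether it is safe to send again, and retryIndefinitely overrides it to always return false, even after the deadline has passed. Likewise, remainingNanos() always returns the full timeoutNanos rather than the time actually left before the deadline.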
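
      The failure mode described above can be reproduced outside Cassandra with a small, self-contained sketch. Everything in it is hypothetical and only stands in for the real classes: RetryPolicy, IndefiniteRetry, DeadlineAwareRetry, and attemptsUntilStop are illustrative names, not part of TCM's Retry API, and the "fix" shown is one possible direction rather than the change that was actually committed. A fake clock is used so the run is deterministic.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

public class DeadlineRetrySketch
{
    /** Hypothetical stand-in for the two policy methods discussed in this ticket. */
    public interface RetryPolicy
    {
        boolean reachedMax();
        long remainingNanos();
    }

    /** Mirrors the reported behaviour: never exhausted, remaining time never shrinks. */
    public static final class IndefiniteRetry implements RetryPolicy
    {
        private final long timeoutNanos;

        public IndefiniteRetry(long timeoutNanos) { this.timeoutNanos = timeoutNanos; }

        @Override public boolean reachedMax() { return false; }
        @Override public long remainingNanos() { return timeoutNanos; }
    }

    /** Assumed alternative: unlimited tries, but both methods honour the deadline. */
    public static final class DeadlineAwareRetry implements RetryPolicy
    {
        private final LongSupplier clock;
        private final long deadlineNanos;

        public DeadlineAwareRetry(LongSupplier clock, long timeoutNanos)
        {
            this.clock = clock;
            this.deadlineNanos = clock.getAsLong() + timeoutNanos;
        }

        @Override public boolean reachedMax() { return remainingNanos() <= 0; }

        @Override public long remainingNanos()
        {
            // Subtract first, then clamp, so the comparison stays valid for nanoTime-style clocks.
            return Math.max(0, deadlineNanos - clock.getAsLong());
        }
    }

    /**
     * Simulates a sender whose every attempt fails and costs attemptNanos of (fake) time.
     * Returns how many attempts were made before the policy said stop, or cap if it never did.
     */
    public static int attemptsUntilStop(RetryPolicy policy, long[] fakeClock, long attemptNanos, int cap)
    {
        int attempts = 0;
        while (attempts < cap && !policy.reachedMax())
        {
            fakeClock[0] += attemptNanos;
            attempts++;
        }
        return attempts;
    }

    public static void main(String[] args)
    {
        long timeout = TimeUnit.SECONDS.toNanos(1);
        long attempt = TimeUnit.MILLISECONDS.toNanos(300);
        long[] clock = { 0 };

        int indefinite = attemptsUntilStop(new IndefiniteRetry(timeout), clock, attempt, 1000);

        clock[0] = 0; // reset the fake clock before the second run
        int deadlineAware = attemptsUntilStop(new DeadlineAwareRetry(() -> clock[0], timeout), clock, attempt, 1000);

        System.out.println("indefinite policy attempts (capped): " + indefinite);
        System.out.println("deadline-aware policy attempts: " + deadlineAware);
    }
}
```

      With a 1s timeout and 300ms per failed attempt, the indefinite policy only stops because the simulation caps it at 1000 attempts, while the deadline-aware policy stops after 4 attempts, at the first check after the deadline has passed. In the real code the caller of request.get(...) has already timed out by then, which is why a policy that never reports exhaustion is unsafe to hand to a retrying sender.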

      Attachments

        1. ci_summary-trunk-3fa63cf81ce03bfa45c2b312c1c2846a1d84eee5.html
          34 kB
          David Capwell
        2. result_details-trunk-3fa63cf81ce03bfa45c2b312c1c2846a1d84eee5.tar.gz
          3.43 MB
          David Capwell
        3. ci_summary.html
          65 kB
          David Capwell
        4. result_details.tar.gz
          2.92 MB
          David Capwell

          People

            dcapwell David Capwell
            dcapwell David Capwell
            David Capwell
            Alex Petrov, Sam Tunnicliffe
            Votes: 0
            Watchers: 3

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 10m