CASSANDRA-5830

Paxos loops endlessly due to faulty condition check

Details

    Description

      The loop condition in the following code segment (StorageProxy.java:361) causes the issue. start is the Paxos start time and is always less than the current value of System.nanoTime(), so start - System.nanoTime() is always negative and therefore always less than the timeout; the loop never times out:

      StorageProxy.java
      private static UUID beginAndRepairPaxos(long start, ByteBuffer key, CFMetaData metadata, List<InetAddress> liveEndpoints, int requiredParticipants, ConsistencyLevel consistencyForPaxos)
          throws WriteTimeoutException
          {
              long timeout = TimeUnit.MILLISECONDS.toNanos(DatabaseDescriptor.getCasContentionTimeout());
      
              PrepareCallback summary = null;
              while (start - System.nanoTime() < timeout)
              {
                  long ballotMillis = summary == null
                                    ? System.currentTimeMillis()
                                    : Math.max(System.currentTimeMillis(), 1 + UUIDGen.unixTimestamp(summary.inProgressCommit.ballot));
                  UUID ballot = UUIDGen.getTimeUUID(ballotMillis);
      

      Paxos gets stuck here when PREPARE returns 'true' but with an inProgressCommit. StorageProxy.java:beginAndRepairPaxos() then tries to issue a PROPOSE and COMMIT for the inProgressCommit. If it then repeatedly receives 'false' as the PREPARE_RESPONSE, it loops endlessly until a PREPARE_RESPONSE is 'true', because the timeout check above never ends the loop.
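
      The sign problem in the loop condition can be reproduced in isolation. The sketch below is not the Cassandra code: the class PaxosTimeoutCheck, its method names, the simulated clock, and the 1000 ms / 300 ms values are all hypothetical and chosen only for illustration. It compares the condition as written (start - now) with the elapsed-time form (now - start), which is one plausible correction.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical stand-alone illustration; none of these names exist in Cassandra.
public class PaxosTimeoutCheck
{
    // Condition as written in the report: start - now is never positive,
    // so it is always below a positive timeout.
    static boolean buggyWithinTimeout(long startNanos, long nowNanos, long timeoutNanos)
    {
        return startNanos - nowNanos < timeoutNanos;
    }

    // Elapsed time measured as now - start, compared against the timeout.
    static boolean fixedWithinTimeout(long startNanos, long nowNanos, long timeoutNanos)
    {
        return nowNanos - startNanos < timeoutNanos;
    }

    // Counts loop iterations with a simulated clock that advances stepNanos per attempt.
    // maxIterations is a safety cap so the buggy variant terminates in this demo.
    static int attempts(boolean useFixed, long timeoutNanos, long stepNanos, int maxIterations)
    {
        long start = 0L;
        long now = start;
        int n = 0;
        while (n < maxIterations
               && (useFixed ? fixedWithinTimeout(start, now, timeoutNanos)
                            : buggyWithinTimeout(start, now, timeoutNanos)))
        {
            n++;              // stands in for one failed PREPARE round
            now += stepNanos; // simulated time spent in that round
        }
        return n;
    }

    public static void main(String[] args)
    {
        long timeout = TimeUnit.MILLISECONDS.toNanos(1000); // assumed contention timeout
        long step = TimeUnit.MILLISECONDS.toNanos(300);     // assumed cost of one round
        System.out.println(attempts(false, timeout, step, 100)); // prints 100 (only the cap stops it)
        System.out.println(attempts(true, timeout, step, 100));  // prints 4 (stops once 1200 ms >= 1000 ms)
    }
}
```

      With the condition as reported, only the artificial iteration cap ends the loop; with the elapsed-time form, the loop exits as soon as the simulated time passes the timeout.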

          People

            Soumava Ghosh
            Jonathan Ellis