Cassandra / CASSANDRA-6123

Break timestamp ties consistently for a given user request

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Fix Version/s: 4.x
    • Component/s: None
    • Labels: None

      Description

      The basic goal of this issue is to fix the following problem: if 2 different clients "simultaneously" issue the following 2 updates:

      INSERT INTO foo(k, v1, v2) VALUES (0, 1, -1); // client1
      INSERT INTO foo(k, v1, v2) VALUES (0, -1, 1); // client2
      

      then, if both updates get the same timestamp, we currently do not guarantee that the sum of v1 and v2 will be 0 at the end (in this case it won't be).
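
To make the failure concrete, here is a minimal sketch of per-cell reconciliation. It is a toy model, not Cassandra's actual code: the tuple layout and the function names `reconcile_cell` and `reconcile_row` are hypothetical, and the tie rule (on equal timestamps the lower value is discarded) is taken from the description of the current logic given in the comments on this ticket.

```python
# Toy model only: a cell is a (value, timestamp) pair. The names below are
# hypothetical and do not correspond to Cassandra classes.

def reconcile_cell(a, b):
    """Return the winning cell out of two versions of the same column."""
    if a[1] != b[1]:
        return a if a[1] > b[1] else b
    # Same timestamp: the version with the lower value is discarded.
    return a if a[0] >= b[0] else b

def reconcile_row(row_a, row_b):
    """Merge two versions of a row independently, column by column."""
    return {col: reconcile_cell(row_a[col], row_b[col]) for col in row_a}

ts = 1_000_000  # both writes happen to receive the same timestamp
client1 = {"v1": (1, ts), "v2": (-1, ts)}
client2 = {"v1": (-1, ts), "v2": (1, ts)}

merged = reconcile_row(client1, client2)
total = merged["v1"][0] + merged["v2"][0]
print(merged, total)
```

Because each column is resolved on its own, the merged row takes v1 from one write and v2 from the other, so neither client's row survives intact.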

      The idea to solve this is to make sure 2 updates never get the same "timestamp" by making the timestamp the sum of the current time (we can relatively easily make sure that no 2 updates coordinated by the same node get the same current time) and a small ID unique to each server node. We can generate this small unique server ID thanks to CAS (see CASSANDRA-6108).
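
One way to picture this scheme is the sketch below. It rests on several assumptions that are not settled in this ticket: the server ID is taken to be 16 bits wide, the "sum" is implemented by scaling the time by 2^16 and adding the ID, and the class name `TimestampGenerator` and its `clock` parameter are hypothetical. How the ID itself gets assigned (CAS, per CASSANDRA-6108) is out of scope here.

```python
import time

SERVER_ID_BITS = 16  # assumption: width of the per-node ID

class TimestampGenerator:
    """Hypothetical per-node generator of cluster-unique timestamps."""

    def __init__(self, server_id, clock=None):
        if not 0 <= server_id < (1 << SERVER_ID_BITS):
            raise ValueError("server_id does not fit in SERVER_ID_BITS")
        self.server_id = server_id
        # Default clock: wall-clock time in microseconds.
        self.clock = clock or (lambda: time.time_ns() // 1000)
        self.last = -1

    def next_timestamp(self):
        now = self.clock()
        # Never reuse a clock value on this node, even if the clock stalls.
        self.last = max(now, self.last + 1)
        # Scaled time plus server ID: unique as long as server IDs are unique.
        return (self.last << SERVER_ID_BITS) | self.server_id

# Two nodes reading the same clock value still produce distinct timestamps.
frozen = lambda: 42
node_a = TimestampGenerator(1, clock=frozen)
node_b = TimestampGenerator(2, clock=frozen)
ta1, ta2, tb1 = node_a.next_timestamp(), node_a.next_timestamp(), node_b.next_timestamp()
print(ta1, ta2, tb1)
```

Note that scaling the time this way changes the magnitude of the stored timestamp, which is a design choice of this sketch rather than something the description specifies.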

      Note that this solution is only for server-side generated timestamps. Client-provided timestamps will still be allowed, but in that case it will be the job of the clients to synchronize so that they do not generate 2 identical timestamps, if they care about this behavior.

      Note: see CASSANDRA-6106 for some related discussion on this issue.

          Activity

          ramonza Ramon Nogueira added a comment -

          Should we create another ticket for row-level isolation with client-side timestamps? From a user point of view, the fact that timestamps are even involved in establishing row-level isolation seems like an implementation detail at best.

          jjordan Jeremiah Jordan added a comment -

          This is fine as long as we keep in mind that installs with more than 1000 servers won't be able to have unique numbers.

          jbellis Jonathan Ellis added a comment -

          Where does that limitation come from? We were looking at a 16-bit server ID.

          ramonza Ramon Nogueira added a comment -

          What about increasing the size of the timestamp? 112 bits should be enough to include the MAC address without affecting the interpretation of the existing timestamp. That way we could avoid CAS and also support client-generated timestamps by just appending a unique 48-bit value on the server side. This would be transparent to the client so they would just get row-level isolation without having to know that timestamps are involved at all.
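
The arithmetic behind this proposal (64-bit timestamp + 48-bit MAC address = 112 bits) can be sketched as follows. This is only an illustration under assumptions: the helper names `extend_timestamp` and `original_timestamp` are hypothetical, and the MAC address is simply placed in the low 48 bits so that the existing timestamp keeps its ordering.

```python
MAC_BITS = 48  # a MAC address is 48 bits wide

def extend_timestamp(timestamp, mac):
    """Append a 48-bit node identifier below the existing timestamp."""
    if not 0 <= mac < (1 << MAC_BITS):
        raise ValueError("mac must fit in 48 bits")
    return (timestamp << MAC_BITS) | mac

def original_timestamp(extended):
    """Recover the timestamp exactly as the client or server supplied it."""
    return extended >> MAC_BITS

ts = 1_380_000_000_000_000           # example timestamp in microseconds
a = extend_timestamp(ts, 0x001122334455)
b = extend_timestamp(ts, 0x66778899AABB)
print(a != b, original_timestamp(a) == ts)
```

Two writes with the same timestamp coordinated by different machines now compare as different, while any write with a later timestamp still wins regardless of the appended value.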

          xcbsmith Christopher Smith added a comment -

          I mentioned this in the other bug, but repeating it here as it is relevant to having a "definitive" solution.

          Rather than adding a server ID to the timestamp (and a static one at that), why not instead change how Cassandra resolves field updates with the same timestamp? The current logic is to discard the version with the lowest value, which seems broken and not useful. If instead the rule were that the version from the "lowest server" is discarded, we could ensure that competing writes to different servers are always resolved with row-level isolation (eventual consistency always resolves to all of the fields from one of the writes overwriting the other updates).

          Defining which server is "lower" could be done with IDs, but since it really doesn't matter which server is deemed lower so long as it is consistently deemed lower, I think just using metadata that already uniquely identifies a server is sufficient. Why not simply resolve based on org.apache.cassandra.db.SystemKeyspace.getLocalHostId()?

          Advantages to this approach:

          1) No extra space need be consumed for each field value.
          2) No change in the SSTable file format.
          3) Ensures row-level isolation regardless of whether you are using client or server side timestamps.
          4) The resolution of timestamps stops being much of a concern.
          5) Scales to an effectively infinite number of nodes. (If we have UUID collisions we have so many more problems in Cassandra than timestamp ties.)
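
A minimal sketch of this resolution rule, as a toy model rather than Cassandra code: it assumes the ID of the coordinating host is available whenever two versions of a cell are compared (how it would be obtained is not specified here), and the tuple layout and the names `reconcile_cell` and `reconcile_row` are hypothetical.

```python
import uuid

def reconcile_cell(a, b):
    """Cells are (value, timestamp, host_id); ties go to the higher host ID."""
    key_a = (a[1], a[2].int)
    key_b = (b[1], b[2].int)
    return a if key_a >= key_b else b

def reconcile_row(row_a, row_b):
    """Merge two versions of a row column by column."""
    return {col: reconcile_cell(row_a[col], row_b[col]) for col in row_a}

ts = 1_000_000                      # identical timestamps on both writes
host1 = uuid.UUID(int=1)            # placeholder host IDs for the example
host2 = uuid.UUID(int=2)
write1 = {"v1": (1, ts, host1), "v2": (-1, ts, host1)}
write2 = {"v1": (-1, ts, host2), "v2": (1, ts, host2)}

merged = reconcile_row(write1, write2)
total = merged["v1"][0] + merged["v2"][0]
print(merged["v1"][0], merged["v2"][0], total)
```

Since the tie-break no longer depends on the cell values, every column is taken from the same write and the row stays internally consistent.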

          thatran Thanh added a comment -

          To be clear, this jira does not apply to Paxos-generated timestamps (for LWT writes), does it?

          iamaleksey Aleksey Yeschenko added a comment -

          No, but CASSANDRA-7801 does.

          michaelsembwever mck added a comment -

          Bumping to fix version 4.x, as 3.11.0 is a bug-fix only release.
            ref https://s.apache.org/EHBy


            People

            • Assignee: Unassigned
            • Reporter: slebresne Sylvain Lebresne
            • Votes: 5
            • Watchers: 34