Uploaded image for project: 'Cassandra'
  1. Cassandra
  2. CASSANDRA-2105

Fix the read race condition in CFStore for counters

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 0.8 beta 1
    • Component/s: None
    • Labels:

      Description

      There is a (known) race condition during counter read. Indeed, for standard
      column family there is a small time during which a memtable is both active and
      pending flush and similarly a small time during which a 'memtable' is both
      pending flush and an active sstable. For counters that would imply sometime
      reconciling twice during a read the same counterColumn and thus over-counting.

      Current code changes this slightly by trading the possibility to count twice a
      given counterColumn by the possibility to miss a counterColumn. Thus it trades
      over-counts for under-counts.

      But this is no fix and there is no hope to offer clients any kind of guarantee
      on reads unless we fix this.

      1. 2115_option1_withLock.patch
        11 kB
        Sylvain Lebresne
      2. 2115_option2_nolock.patch
        16 kB
        Sylvain Lebresne

        Activity

        Hide
        slebresne Sylvain Lebresne added a comment -

        I plan to reuse the fix I had made for #1546 that uses a ReadWriteLock to fix
        this (unless I find a better idea in the meantime). Unless proven otherwise I
        don't think this will have a huge impact on counter read performance, but if
        someone finds a better idea, I'm listening.

        Show
        slebresne Sylvain Lebresne added a comment - I plan to reuse the fix I had made for #1546 that uses a ReadWriteLock to fix this (unless I find a better idea in the meantime). Unless proven otherwise I don't think this will have a huge impact on counter read performance, but if someone finds a better idea, I'm listening.
        Hide
        slebresne Sylvain Lebresne added a comment -

        Attached not 1 but 2 options for this patch. I'm not sure with which version to go so I'm asking for opinions.

        Version 1 is the one extracted from #1546. It uses a ReadWriteLock to protect from the race condition.

        Version 2 don't use a lock. So less chances of lock contention which is always good. Only problem is, it still suffers in theory of a race condition. But I think this race condition is borderline impossible.
        Basically, given a memtable m being flushed, let's call s(m) the sstable initially produced by its flushing and let's denote by s'(m) any sstable resulting of the compaction of s(m). The race is if a read thread sees m when grabbing the references to the memtable being flushed and sees s'(m) (not s(m), that is the initial race condition and this is not impossible at all) when grabing the reference to the sstables.
        If it's unclear, the code has a comment explaining this that may be more clear.

        So not sure which version to go with. I may slightly lean towards Version 1 because I usually side with correction before anything else, but since this is in a critical path it feels slightly wasteful to use a lock for this given how remote the race condition of version 2 seems.

        Show
        slebresne Sylvain Lebresne added a comment - Attached not 1 but 2 options for this patch. I'm not sure with which version to go so I'm asking for opinions. Version 1 is the one extracted from #1546. It uses a ReadWriteLock to protect from the race condition. Version 2 don't use a lock. So less chances of lock contention which is always good. Only problem is, it still suffers in theory of a race condition. But I think this race condition is borderline impossible. Basically, given a memtable m being flushed, let's call s(m) the sstable initially produced by its flushing and let's denote by s'(m) any sstable resulting of the compaction of s(m). The race is if a read thread sees m when grabbing the references to the memtable being flushed and sees s'(m) (not s(m), that is the initial race condition and this is not impossible at all) when grabing the reference to the sstables. If it's unclear, the code has a comment explaining this that may be more clear. So not sure which version to go with. I may slightly lean towards Version 1 because I usually side with correction before anything else, but since this is in a critical path it feels slightly wasteful to use a lock for this given how remote the race condition of version 2 seems.
        Hide
        slebresne Sylvain Lebresne added a comment -

        I've opened CASSANDRA-2284 that provides what I think is a better solution than the one I have attached previously to this problem (I've opened it separately because it's a more generic solution, not just a counter related fix).

        Show
        slebresne Sylvain Lebresne added a comment - I've opened CASSANDRA-2284 that provides what I think is a better solution than the one I have attached previously to this problem (I've opened it separately because it's a more generic solution, not just a counter related fix).
        Hide
        hudson Hudson added a comment -

        Integrated in Cassandra #810 (See https://hudson.apache.org/hudson/job/Cassandra/810/)
        Atomically switch cfstore memtables and sstables
        patch by slebresne; reviewed by jbellis for CASSANDRA-2284 (and CASSANDRA-2105)

        Show
        hudson Hudson added a comment - Integrated in Cassandra #810 (See https://hudson.apache.org/hudson/job/Cassandra/810/ ) Atomically switch cfstore memtables and sstables patch by slebresne; reviewed by jbellis for CASSANDRA-2284 (and CASSANDRA-2105 )
        Hide
        slebresne Sylvain Lebresne added a comment -

        Fixed by CASSANDRA-2284

        Show
        slebresne Sylvain Lebresne added a comment - Fixed by CASSANDRA-2284

          People

          • Assignee:
            slebresne Sylvain Lebresne
            Reporter:
            slebresne Sylvain Lebresne
          • Votes:
            1 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated:
              Original Estimate - 4h
              4h
              Remaining:
              Remaining Estimate - 4h
              4h
              Logged:
              Time Spent - Not Specified
              Not Specified

                Development