CASSANDRA-15432

The "read defragmentation" optimization does not work


Details

    Description

      The so-called "read defragmentation" that was added way back with CASSANDRA-2503 actually does not work, and never has. That is, the defragmentation writes do happen, but they only add load on the nodes without helping anything, and are thus a clear negative.

      The "read defragmentation" (which only impacts so-called "names queries") kicks in when a read hits "too many" sstables (> 4 by default), and when it does, it writes down the result of that read. The assumption is that the next read for that data would only read the newly written data, which, if no longer in the memtable, would at least be in a single sstable, thus speeding up that next read.

      Unfortunately, this is not how it works. When we defrag and write the result of our original read, we do so with the timestamp of the data read (as we should, changing the timestamp would be plain wrong). And as a result, following reads will read that data first, but will have no way to tell that no more sstables should be read. Technically, the reduceFilter call will not return null because the currentMaxTs will be higher than at least some of the data in the result, and this holds until we've read from as many sstables as in the original read.
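
      To make the failure mode concrete, here is a toy model of the timestamp-based early exit. It is a sketch under simplifying assumptions, not Cassandra's code: the SSTable, ReadResult, read, canStop and defrag names are all hypothetical, each sstable holds a single requested column, and an sstable's max timestamp is taken to be the timestamp of that column. The stop rule mirrors the idea described above: a read may skip the remaining sstables only if every requested column already has a timestamp strictly newer than the next sstable's max timestamp.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: every name here is hypothetical, none is a Cassandra class.
final class DefragSketch {

    /** An sstable reduced to: column name -> write timestamp, plus its max timestamp. */
    static final class SSTable {
        final Map<String, Long> cells;
        final long maxTimestamp;

        SSTable(Map<String, Long> cells, long maxTimestamp) {
            this.cells = cells;
            this.maxTimestamp = maxTimestamp;
        }
    }

    /** Merged cells of a names query and the number of sstables it touched. */
    static final class ReadResult {
        final Map<String, Long> cells;
        final int sstablesRead;

        ReadResult(Map<String, Long> cells, int sstablesRead) {
            this.cells = cells;
            this.sstablesRead = sstablesRead;
        }
    }

    /** Reads sstables from newest to oldest max timestamp, stopping as early as the rule allows. */
    static ReadResult read(List<SSTable> sstables, Set<String> requested) {
        List<SSTable> ordered = new ArrayList<>(sstables);
        ordered.sort((a, b) -> Long.compare(b.maxTimestamp, a.maxTimestamp));

        Map<String, Long> result = new HashMap<>();
        int touched = 0;
        for (SSTable sstable : ordered) {
            if (canStop(result, requested, sstable.maxTimestamp)) break;
            touched++;
            for (String column : requested) {
                Long ts = sstable.cells.get(column);
                if (ts != null) result.merge(column, ts, Math::max);
            }
        }
        return new ReadResult(result, touched);
    }

    /** True only if every requested column is present and strictly newer than the next sstable. */
    static boolean canStop(Map<String, Long> result, Set<String> requested, long nextMaxTimestamp) {
        for (String column : requested) {
            Long ts = result.get(column);
            if (ts == null || ts <= nextMaxTimestamp) return false;
        }
        return true;
    }

    /** Writes the read result back as a new sstable, keeping the original timestamps. */
    static SSTable defrag(ReadResult r) {
        long max = Long.MIN_VALUE;
        for (long ts : r.cells.values()) max = Math.max(max, ts);
        return new SSTable(new HashMap<>(r.cells), max);
    }

    /** Returns {sstables read before defrag, sstables read after defrag}. */
    static int[] sstablesReadBeforeAndAfterDefrag() {
        Set<String> requested = Set.of("c1", "c2", "c3", "c4", "c5");
        List<SSTable> sstables = new ArrayList<>();
        for (int i = 1; i <= 5; i++) {
            long ts = 10L * i;
            sstables.add(new SSTable(Map.of("c" + i, ts), ts));
        }
        ReadResult first = read(sstables, requested);
        sstables.add(defrag(first)); // the "defragmentation" write
        ReadResult second = read(sstables, requested);
        return new int[] { first.sstablesRead, second.sstablesRead };
    }
}
```

      In this setup the second read touches no fewer sstables than the first: the defragmented copy carries the old timestamps, so the oldest cell in the result is never strictly newer than the remaining sstables and the early exit never fires.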

      I see no easy way to fix this. It might be possible to make it work with additional per-sstable metadata, but nothing sufficiently simple and cheap to be worth it comes to mind. And I thus suggest simply removing that code.

      For the record, I'll note that there is actually a second problem with that code: currently, we "defrag" a read even if we didn't get data for everything that the query requests. This is also "wrong" even if we ignore the first issue: a following read that reads the defragmented data would also have no way to know not to read more sstables to try to get the missing parts. This problem would be fixable, but is obviously overshadowed by the previous one anyway.
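
      For illustration only, the fix for this second problem could look like the guard below: defragment only when the result covers every requested column. This is a hypothetical sketch (DefragGuardSketch, shouldDefrag and DEFAULT_THRESHOLD are invented names, with the threshold taken from the "> 4 by default" figure above), not the existing code or a proposed patch, and it does nothing about the timestamp problem.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical guard, not the actual Cassandra code or patch.
final class DefragGuardSketch {

    // Assumed from the description: defrag kicks in when more than 4 sstables are read.
    static final int DEFAULT_THRESHOLD = 4;

    /** Defrag only if the read was "too fragmented" AND returned every requested column. */
    static boolean shouldDefrag(int sstablesRead, Set<String> requested, Map<String, Long> result) {
        return sstablesRead > DEFAULT_THRESHOLD && result.keySet().containsAll(requested);
    }
}
```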

      Anyway, as mentioned, I suggest just removing the "optimization" (which, again, never optimized anything) altogether, and I'm happy to provide the simple patch.

      The only question might be: in which versions? This impacts all versions, but it isn't a correctness bug either, "just" a performance one. So do we want 4.0 only, or is there appetite for earlier versions?

          People

            slebresne Sylvain Lebresne
            slebresne Sylvain Lebresne
            Sylvain Lebresne
            Aleksey Yeschenko