Details
-
Improvement
-
Status: Resolved
-
Normal
-
Resolution: Fixed
-
None
-
None
Description
The current storage engine (which for this ticket I'll loosely define as "the code implementing the read/write path") is suffering from old age. One of the main problems is that the only structure it deals with is the cell, which completely ignores the higher-level CQL structure that groups cells into (CQL) rows.
This leads to many inefficiencies, like the fact that during a read we have to group cells multiple times (to count on the replica, then to count on the coordinator, then to produce the CQL resultset) because we forget about the grouping right away each time (so lots of useless cell name comparisons in particular). But beyond the inefficiencies, having to manually recreate the CQL structure every time we need it for something is hindering new features and makes the code more complex than it should be.
Said storage engine also has tons of technical debt. To pick an example, the fact that during range queries we update SliceQueryFilter.count is pretty hacky and error-prone. Another is the overly complex lengths AbstractQueryPager has to go to simply to "remove the last query result".
So I want to bite the bullet and modernize this storage engine. I propose to do 2 main things:
- Make the storage engine more aware of the CQL structure. In practice, instead of having partitions be a simple iterable map of cells, it should be an iterable list of rows (each being itself composed of per-column cells, though obviously not exactly the same kind of cell we have today).
- Make the engine more iterative. What I mean here is that in the read path, we end up reading all cells into memory (we put them in a ColumnFamily object), but there is really no reason to. If instead we were working with iterators all the way through, we could get to a point where we're basically transferring data from disk to the network, and we should be able to reduce GC substantially.
Please note that such a refactor should provide some performance improvements right off the bat, but that is not its primary goal. Its primary goal is to simplify the storage engine and add abstractions that are better suited to further optimizations.
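To make the two proposals above a bit more concrete, here is a minimal Java sketch (Java 16+ for records). It is purely illustrative: `Cell`, `Row`, `groupIntoRows` and `limit` are hypothetical names invented for this example, not existing or planned Cassandra classes, and clusterings are simplified to plain strings. It assumes the incoming cells are already sorted by clustering, groups them into rows exactly once, and applies a row limit lazily so that only one row is held in memory at a time.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class RowIteratorSketch {
    // Hypothetical flat cell: 'clustering' identifies the CQL row it belongs to.
    record Cell(String clustering, String column, String value) {}

    // Hypothetical row: all cells sharing one clustering, grouped once.
    record Row(String clustering, List<Cell> cells) {}

    // Lazily groups a clustering-sorted stream of cells into rows.
    // Only the row currently being built is held in memory.
    static Iterator<Row> groupIntoRows(Iterator<Cell> cells) {
        return new Iterator<Row>() {
            // First cell of the next row, or null once the input is exhausted.
            private Cell pending = cells.hasNext() ? cells.next() : null;

            @Override public boolean hasNext() { return pending != null; }

            @Override public Row next() {
                if (pending == null) throw new NoSuchElementException();
                String clustering = pending.clustering();
                List<Cell> rowCells = new ArrayList<>();
                while (pending != null && pending.clustering().equals(clustering)) {
                    rowCells.add(pending);
                    pending = cells.hasNext() ? cells.next() : null;
                }
                return new Row(clustering, rowCells);
            }
        };
    }

    // Counts rows (not cells) without materializing the whole partition.
    static Iterator<Row> limit(Iterator<Row> rows, int max) {
        return new Iterator<Row>() {
            private int returned = 0;

            @Override public boolean hasNext() { return returned < max && rows.hasNext(); }

            @Override public Row next() {
                if (!hasNext()) throw new NoSuchElementException();
                returned++;
                return rows.next();
            }
        };
    }

    public static void main(String[] args) {
        List<Cell> cells = List.of(
            new Cell("k1", "a", "1"),
            new Cell("k1", "b", "2"),
            new Cell("k2", "a", "3"),
            new Cell("k3", "a", "4"));
        Iterator<Row> rows = limit(groupIntoRows(cells.iterator()), 2);
        while (rows.hasNext()) {
            Row row = rows.next();
            System.out.println(row.clustering() + " -> " + row.cells().size() + " cell(s)");
        }
    }
}
```

Because the limit operates on rows that are already grouped, later stages never need to compare cell names again to rediscover row boundaries, which is the kind of repeated work described above.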
Attachments
Issue Links
- blocks
-
CASSANDRA-7396 Allow selecting Map values and Set elements
- Resolved
-
CASSANDRA-8385 Clean up generics in uses of AbstractType
- Open
-
CASSANDRA-6237 Allow range deletions in CQL
- Resolved
-
CASSANDRA-8180 Optimize disk seek using min/max column name meta data when the LIMIT clause is used
- Resolved
-
CASSANDRA-8424 Collection filtering not working when using PK
- Resolved
- breaks
-
CASSANDRA-9763 PartitionUpdate must sort() before returning rowCount()
- Resolved
-
CASSANDRA-11026 OOM due to HeapByteBuffer instances
- Resolved
- incorporates
-
CASSANDRA-9888 BTreeBackedRow and ComplexColumnData
- Resolved
-
CASSANDRA-9974 Improve debuggability
- Resolved
-
CASSANDRA-9701 Enforce simple << complex sort order more strictly and efficiently
- Resolved
- is blocked by
-
CASSANDRA-8609 Remove depency of hadoop to internals (Cell/CellName)
- Resolved
-
CASSANDRA-8946 Make SSTableScanner always respect its bound
- Resolved
- is depended upon by
-
CASSANDRA-8440 Refactor StorageProxy
- Open
-
CASSANDRA-9471 Columns should be backed by a BTree, not an array
- Resolved
-
CASSANDRA-9472 Reintroduce off heap memtables
- Resolved
-
CASSANDRA-8809 Remove 'throws CassandraException'
- Resolved
-
CASSANDRA-9473 Introduce BTreeSet with support for in-place reversal
- Resolved
- is duplicated by
-
CASSANDRA-8339 Reading columns marked as type different than default validation class from CQL causes errors
- Resolved
-
CASSANDRA-8477 CMS GC can not recycle objects
- Resolved
-
CASSANDRA-4987 Support more queries when ALLOW FILTERING is used.
- Resolved
-
CASSANDRA-2986 Fix short reads in range (and index?) scans
- Resolved
-
CASSANDRA-3024 sstable and message varint encoding
- Resolved
-
CASSANDRA-6063 Rename internal classes and interfaces to represent the modern Cassandra terminology
- Resolved
- is required by
-
CASSANDRA-6412 Custom creation and merge functions for user-defined column types
- Open
- relates to
-
CASSANDRA-4175 Reduce memory, disk space, and cpu usage with a column name/id map
- Resolved
- supercedes
-
CASSANDRA-6915 Show storage rows in cqlsh
- Resolved
-
CASSANDRA-5966 Average name query performance much worse for wide rows
- Resolved