CASSANDRA-8099: Refactor and modernize the storage engine

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 3.0 alpha 1
    • Component/s: None
    • Labels: None

      Description

      The current storage engine (which for this ticket I'll loosely define as "the code implementing the read/write path") is suffering from old age. One of the main problems is that the only structure it deals with is the cell, which completely ignores the higher-level CQL structure that groups cells into (CQL) rows.

      This leads to many inefficiencies, like the fact that during a read we have to group cells multiple times (to count on the replica, then to count on the coordinator, then to produce the CQL result set) because we forget about the grouping right away each time (so lots of useless cell name comparisons in particular). But beyond inefficiencies, having to manually recreate the CQL structure every time we need it for something is hindering new features and makes the code more complex than it should be.

      Said storage engine also has tons of technical debt. To pick an example, the fact that during range queries we update SliceQueryFilter.count is pretty hacky and error prone. Or the overly complex lengths AbstractQueryPager has to go to simply to "remove the last query result".

      So I want to bite the bullet and modernize this storage engine. I propose to do 2 main things:

      1. Make the storage engine more aware of the CQL structure. In practice, instead of having partitions be a simple iterable map of cells, a partition should be an iterable list of rows (each row being itself composed of per-column cells, though obviously not exactly the same kind of cell we have today).
      2. Make the engine more iterative. What I mean here is that in the read path we end up reading all cells into memory (we put them in a ColumnFamily object), but there is really no reason to. If instead we worked with iterators all the way through, we could get to a point where we're basically transferring data from disk to the network, and we should be able to reduce GC substantially (a rough sketch follows this list).
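
      To illustrate point 2 only: below is a hedged, hypothetical sketch of what an iterator-based read path could look like. The interface names and methods are invented for illustration and do not necessarily match the classes in the patch.

          import java.util.Iterator;

          // Hypothetical sketch only: names are illustrative, not the actual classes in the patch.
          // A partition is consumed as a stream of CQL rows rather than as one big map of cells.
          interface Cell {}                                  // placeholder for a per-column cell
          interface Row
          {
              Iterator<Cell> cells();                        // the per-column cells of one CQL row
          }

          interface RowIterator extends Iterator<Row>, AutoCloseable
          {
              @Override
              void close();                                  // releases sstable/memtable resources, no checked exception
          }

          // A read result is a stream of partitions: data can flow from disk to the
          // network without materializing a whole ColumnFamily object in memory.
          interface PartitionIterator extends Iterator<RowIterator>, AutoCloseable
          {
              @Override
              void close();
          }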

      Please note that such a refactor should provide some performance improvements right off the bat, but that's not its primary goal either. Its primary goal is to simplify the storage engine and add abstractions that are better suited to further optimizations.

      Attachments: 8099-nit (8 kB) - Benedict

          Activity

          slebresne Sylvain Lebresne added a comment -

          For information, and to hopefully clarify a bit more what I'd like to do here, I've pushed my WIP branch for this here. It's nowhere near complete, nothing is definitive and it's not even close to compiling. But it should hopefully have enough meat already to give a sense of the direction this is taking.

          jbellis Jonathan Ellis added a comment -

          /cc Jason Brown Pavel Yaskevich Aleksey Yeschenko Marcus Eriksson sankalp kohli T Jake Luciani

          xedin Pavel Yaskevich added a comment -


          No-no-no, I think we should go all the way and re-write everything; that gives us all the benefits in a single shot, plus who cares about supporting all the features anyway, right?

          Seriously tho, this sounds like a good first step.

          jasobrown Jason Brown added a comment -

          I'm +1 on the idea. I poked through the code quickly, and it seemed to be headed in the right direction - although I'd have to read more carefully/think more wrt some of my earlier thoughts about (pluggable) storage engines. Also, I see that 'Column' has made a comeback.

          iamaleksey Aleksey Yeschenko added a comment -

          As I've said before, yeah, we should do this. Preferably spreading the whole thing over the 3.x series though, limiting the amount of per-release changes, wherever it makes sense.

          jbellis Jonathan Ellis added a comment -

          How is this looking two weeks later? Any potential blockers come up?

          slebresne Sylvain Lebresne added a comment -

          How is this looking two weeks later?

          Prettier but not really near completion I'm afraid.

          Any potential blockers come up?

          Not really blockers, but I'll admit that this is bigger/longer than I thought.

          slebresne Sylvain Lebresne added a comment -

          As an update on this, I've (force) pushed to the branch.

          This is still not done, but it compiles, runs and works for basic queries. That said, at least reverse, distinct and 2ndary index queries (and a bunch of other smaller details) are known not to work (but that should change relatively soon). Also, I'll still need to work on thrift support and general backward compatibility for rolling upgrades. Nonetheless, the general design is here and shouldn't change drastically. It's also barely tested at this point, and I need to add more comments.

          This is a huge patch. And while this doesn't change how Cassandra works in any fundamental way, this can't really be called a small incremental change. I'm the first to admit that those are not good things in theory, but to be honest, as this tries to clean up the abstractions that are at the core of the storage engine, I don't see how this could be done a whole lot more incrementally. I am convinced that this will make implementing tons of outstanding features a lot easier and is worth it just for that. Anyway, that's my (lame) excuse for this. Also, it's in a single big commit because I had a lot of back and forth while implementing, so that my wip commits were more misleading than anything. At a later point, if that's deemed better for review, I could try to split it into smaller commits of somewhat-related code (though I'm not entirely sure it will help tremendously).

          benedict Benedict added a comment -

          A small thing I noticed, something that's bugged me before and that this nicely helps tidy up (and should be cleaned up as part of this, IMO): all of the sub-comparators in ClusteringComparator can be removed, and the class itself can simply implement Comparator<Clusterable>, since all of the things that can be compared now implement Clusterable (perhaps one sub-comparator could be kept if directly comparing ClusteringPrefix is very common in high-traffic code paths, but it's probably not necessary).

          .rowComparator can be removed already and .atomComparator can be removed by only making MergeIterator accept <? super In> instead of <In>.

          clusterableComparator would then be subsumed by the enclosing class; everywhere it's referred to you can simply delete ".clusteringComparator" and it will work.

          I'm sure it's out of date now, but I've posted the diff anyway. The summary is:

           src/java/org/apache/cassandra/db/ClusteringComparator.java            | 33 ++++++---------------------------
           src/java/org/apache/cassandra/db/atoms/AtomIterators.java             |  2 +-
           src/java/org/apache/cassandra/db/filters/NamesPartitionFilter.java    |  2 +-
           src/java/org/apache/cassandra/db/partitions/AtomicBTreePartition.java |  4 ++--
           src/java/org/apache/cassandra/utils/MergeIterator.java                | 10 +++++-----
           5 files changed, 15 insertions(+), 36 deletions(-)
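
          For illustration only, here is a hedged sketch of the shape this suggestion describes; the field and type names are simplified stand-ins rather than the actual code in the branch.

              import java.util.Comparator;
              import java.util.Iterator;
              import java.util.List;

              // Sketch only: the comparator compares anything Clusterable directly, so the
              // separate rowComparator/atomComparator/clusterableComparator fields go away.
              interface Clusterable { /* exposes its clustering prefix */ }

              class ClusteringComparator implements Comparator<Clusterable>
              {
                  @Override
                  public int compare(Clusterable a, Clusterable b)
                  {
                      return 0; // compare the clustering prefixes component by component (elided)
                  }
              }

              // With a bounded wildcard, a single Comparator<Clusterable> serves any element
              // type that implements Clusterable, e.g. a merge iterator over rows or atoms:
              class MergeIterator<In extends Clusterable>
              {
                  MergeIterator(List<Iterator<In>> sources, Comparator<? super In> comparator) { /* merge logic elided */ }
              }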
          
          benedict Benedict added a comment -

          Also, I'm sure you're on top of this already, but this could do with a lot more commenting. Little things like what is meant by a "complex" field: with a bit of digging you can determine this is related to collections, but you have to jump to declarations quite a few times before establishing this. Simple descriptions of classes like "ComplexCellBlock", and of what a class is used for, as with "CellWriter", would help.

          It would really help to reduce the mystery that has surrounded some of the internal functionality. Right now I don't think anybody is fully conversant with even the current (post CASSANDRA-5417) functionality besides yourself. I know this rewrite will help a lot, but I doubt it will fix the problem entirely. There's a lot of legacy cruft to deal with still, that makes things complicated, and this rewrite will put almost everyone back to square one.

          slebresne Sylvain Lebresne added a comment -

          all of the sub-comparators in ClusteringComparator can be removed, and the class itself can simply implement Comparator<Clusterable>

          I'll incorporate that, thanks. I wanted to do it but I still had some crappy dependencies in the initial versions and didn't look back at it after that was cleaned up.

          this could do with a lot more commenting.

          I couldn't agree more and plan on addressing that rather soon. My bad excuse for not doing it right away is that I had a lot of back and forth on the exact abstractions to make everything fit, and didn't want to lose time commenting stuff that might go away the next day. But I have a "go over it all and comment the shit out of it" task on my todo list.

          benedict Benedict added a comment -

          I think it's a shame this patch wasn't attempted at least a little more incrementally. It looks to me that changing the serialization formats, memtables and iterator implementations could have been done in different patches at least, and it might have made review safer and easier, and given us more time to digest the changes. I understand that this might have introduced some extra development burden, although that might have been reaped back in many fewer days spent rebasing. Having an initial patch to change the abstractions I think would have helped to reduce the burden on the rest of the project, and perhaps helped parallelize the work as well.

          I'm a little concerned that the result is that we won't give each of the pretty major decisions that have been made the time they need to be assessed properly, especially now we're ramping up for release (and hence low on time). I'm not necessarily suggesting we split it, as I can imagine that would be soul crushingly unpleasant for Sylvain Lebresne and introduce a delay, but I am generally ill at ease with the scope of the changes and our ability to vet them. I'm also worried I'm finding myself saying "too close to release to question this decision" - which seems a problematic mode to be merging any patch under.

          slebresne Sylvain Lebresne added a comment -

          I think it's a shame this patch wasn't attempted at least a little more incrementally.

          I certainly understand the criticism, this is definitely not as incremental as it should be. My lame "defence" is that since this structurally changes the main abstraction used by the storage engine, it quickly trickles down to everything else, so that I just wasn't sure how to attack this more incrementally in practice. For the serialization formats, I could indeed have stuck to serializing to the old format, but given the mismatch between the old format and the new abstractions, it was actually simpler to just write in a meaningful format right away (it allowed me to get something working faster). And since the new serialization format details are fairly well encapsulated (mostly in AtomSerializer.java), I'll admit it didn't feel like a huge deal overall. But in any case, I probably haven't tried hard enough and/or I'm not smart enough to have figured out how to make that happen more incrementally, and for that, I apologize.

          I'm also worried I'm finding myself saying "too close to release to question this decision"

          I agree that not questioning a decision that you think is worth questioning should be avoided, but I also don't think that this needs to be the case. If you think a decision makes things worse than they are in current trunk, then by all means, let's bring it up. If there are enough such concerns voiced to make us think this patch won't be a net improvement over the status quo, and there is no time to address those concerns, then I'll be the first to suggest that, as sad as that would make me, we should consider pushing it after 3.0 (but I do have the weakness of thinking that the patch is a net improvement).

          Now, I don't pretend that every choice made here is absolutely optimal (I'm afraid I'm not that smart), so there will be things that can be improved (and maybe some will require subsequent changes). But as long as something doesn't make things worse than they currently are, I'd suggest it's probably ok to just create tickets for those improvements. After all, this isn't meant at all to be the definitive version of the Cassandra code, it just aims to be cleaner ground to improve upon than we currently have.

          Don't get me wrong, I'm not trying to say that such a big patch is ideal, it's not. I just didn't figure out how to do better.

          slebresne Sylvain Lebresne added a comment -

          I've just rebased and (force) pushed the current version of the patch to the usual branch. It still doesn't handle thrift and misses backward compatibility code for the internal messages (and I'll start working on those), but it's basically complete otherwise. In particular, it passes all the CQL tests (unit and dtests) we have. It also seems to be passing other dtests (that don't use thrift), but I'll admit I haven't had the patience to run them all locally and jenkins seems to be in a bad mood recently, so a couple might require attention, but that's likely minor. Also, I haven't taken the time to upgrade most of our unit tests and this will be done next with the help of some others, but hopefully the CQL tests and dtests exercise the changed code enough that there shouldn't be major surprises.

          Overall the missing parts are sufficiently isolated that I think initial review can be started. I've actually written here some kind of overview/guide for the sake of making diving into the patch easier. I'll be happy to update it if there is something missing that would help.

          benedict Benedict added a comment -

          Like I said, it's a shame; I lament not having longer to criticise the less optimal decisions. That's not at all to suggest they will cumulatively sabotage this patch to worse than the status quo. But the bar for improvement is much higher once a round of changes goes in (not least because of the effort of maintaining compatibility each time, but also because it has to be justified afresh, and be worth the risk, argumentation, redevelopment, etc.), and so we will find ourselves settling more readily than had we considered our options more carefully up front, especially when there are so many aspects to discuss. I don't think there is much to be done about it now, though, given the time constraints, and we will simply have to do our best.

          Anyway, I'll try to properly digest the patch over the next week or so, so I can give some actual concrete feedback. On the whole I do think it is a huge step forward (well, perhaps not the naming). I just wish we weren't rushing this part after waiting so long for it, and that we had at least discussed some of the more concrete aspects of the design in advance.

          The concern I have about the scope being too large to vet effectively is somewhat uncorrelated, but I don't have a good answer for that either. My experience is that review's capacity for finding problems doesn't scale linearly with the scope and complexity of a patch, and I don't think we've ever had a patch as large as this (it's basically a whole version jump on its own). Of course, if you're planning to break 3.0 just to make me feel better about breaking 2.1, I'm cool with that.

          snazy Robert Stupp added a comment -

          As you say in the doc, the naming of atom is really bad and should be changed IMO. Some proposals:

          • Cluster - it is basically that - but cluster is an occupied term
          • Line (slightly similar to row)
          • Assembly
          • or maybe just RawRow

          NamesPartitionFilter - not sure whether names is a good word here. Propose ClusteringPartitionFilter or ClusteredPartitionFilter

          Good idea to make CachePartition an interface!

          BTW: interesting to see that the term Doppelgänger is known in English

          jbellis Jonathan Ellis added a comment -

          I'd like Aleksey to be "chief reviewer" with input from (at least) Benedict and Benjamin.

          tjake T Jake Luciani added a comment -

          For the purposes of bug fixing and testing, we "reviewers/helpers" should submit pull requests to Sylvain's github branch for inclusion, with the final merged version of his branch being committed.

          blerer Benjamin Lerer added a comment -

          As the patch is relatively large, I have chosen to split my review of the CQL layer into chunks and give my comments for each chunk as soon as I have finished reviewing it. I think it will make things more manageable for Sylvain and me.

          For the first chunk I focused on the restrictions:

          • I am not a big fan of big class hierarchies, but I wonder if it would not be better to have two sub-classes of PrimaryKeyRestrictionSet, one for the partition key and one for the clustering columns, rather than having a boolean variable.
          • In PrimaryKeyRestrictionSet the method addColumnFilterTo can be simplified based on the fact that we know if the restrictions are on the partition key components or on the clustering key columns.
          • The AbstractPrimaryKeyRestrictions.toByteBuffers method can be moved down as it is only used in PrimaryKeyRestrictionSet
          • In MultiColumnRestriction the method isPartitionKey() is not used (in case you have forgotten: MultiColumnRestriction only applies to clustering key columns).
          • I understand why you renamed ?Restriction.Slice to ?Restriction.SliceRestriction, but now the class names look a bit inconsistent. Maybe we should rename the other classes too.
          • In ColumnFilter the add(Expression expression) method is not used.
          • In Operator the reverse method is not needed anymore and can be removed.
          • In StatementRestrictions I do not understand the use of useFiltering. My understanding was that we should return an error message specifying that ALLOW FILTERING is required and that this problem should have been handled by checkNeedsFiltering in SelectStatement. Could you explain?
          • In StatementRestrictions the nonPKRestrictedColumns method looks wrong to me as it can return some primary key columns.
          slebresne Sylvain Lebresne added a comment -

          Thanks for some initial review. I've pushed a commit for most of the remarks. I answer the rest below:

          I wonder if it will not be better to have two sub-classes for PrimaryKeyRestrictionSet one for the partition key and one for the clustering columns rather than having a boolean variable

          I don't disagree, but I think we should generally clean up the handling of partition keys so it doesn't "pretend" to be clustering columns (which will imply separating into 2 classes). And so I'd like to do that properly as a followup, since it's not essential to this ticket (and I'm sure we can agree it's big enough as is).

          In StatementRestrictions I do not understand the use of useFiltering. My understanding was that we should return an error message specifying that ALLOW FILTERING is required and that this problem should have been handled by checkNeedsFiltering in SelectStatement.

          If you have a restriction on an indexed column but that restriction is not "queriable" (not an equality), we actually always reject the query (as in, even with ALLOW FILTERING) with an error message that says we can't find a usable 2ndary index. I'm not saying this is a good thing, it's really just historical and we should fix it, but this ticket is arguably not the right place for this (CASSANDRA-4987 would typically be a better place for that).

          We also don't even call needsFiltering if the query is not a range one (because we don't support ALLOW FILTERING there yet, which is CASSANDRA-6377), but we should still reject the queries described above (restriction on an indexed column but not one usable by the index) for single partition queries.

          Another way to put it is that the added validation is just the validation that is done on trunk in SecondaryIndexManager.validateIndexSearchersForQuery (and so was not handled by checkNeedsFiltering) which I moved in StatementRestrictions because that was convenient for the patch. TL;DR, we should clean all this in follow-ups, but I'd rather keep it simple for this ticket.

          snazy Robert Stupp added a comment -

          Just found some naming issues and some nits.

          Altogether I have to say that CASSANDRA-8099 is a great step forward! It simplifies a lot of areas in the code, explains a lot of things in javadoc and makes it a lot easier to follow the travelled code path using (mostly) appropriate nomenclature.

          NITs

          • org.apache.cassandra.db.ReadQuery can be an interface (only abstract methods) and is mentioned as an interface in the javadoc
          • org.apache.cassandra.config.CFMetaData#columnMetadata can be final
          • org.apache.cassandra.config.CFMetaData#getDefaultIndexName, #isNameValid, #isIndexNameValid use a non-compiled regexp (a sketch of the precompiled alternative follows this list)
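
          To illustrate the non-compiled regexp nit, here is a hedged sketch of the precompiled alternative; it is not the actual CFMetaData code, and the pattern string is made up.

              import java.util.regex.Pattern;

              class NameValidation
              {
                  // Compile the pattern once instead of recompiling it on every call
                  // (which is what a per-call String.matches(...) or Pattern.matches(...) does).
                  // The pattern below is illustrative only.
                  private static final Pattern VALID_NAME = Pattern.compile("\\w{1,48}");

                  static boolean isNameValid(String name)
                  {
                      return name != null && VALID_NAME.matcher(name).matches();
                  }
              }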

          I didn’t create a pull-req since all findings above are just nits and could only make final rebasing harder.

          Nomenclature

          The name ColumnFilter is a bit misleading. From the first impression I thought it's a filter that filters CQL columns - but it's used to do a 2i lookup.

          Can you rename NamesPartitionFilter to something with clustering key in it? I know that the term name is used elsewhere for clustering key.

          CBuilder/MultiCBuilder could be more expressive as ClusteringBuilder/MultipleClusteringsBuilder

          Misc

          At first I ran into a situation where cluster_name and host_id were null in system.local, but I had no luck reproducing it (I'm sure I did an ant realclean jar and rm -rf data/* before). So just take this as a note - not something worth discussing.

          I did a quick&dirty prototype of CASSANDRA-7396 based on 8099 and it looks much easier (without the slicing stuff).

          blerer Benjamin Lerer added a comment -

          Here is my second chunk of feedback on the CQL layer:

          • In Selection the method containsPartitionKeyColumns is not used.
          • The class ReadQuery does not contain any implementation, so it should probably be an interface.
          • PartitionFilter, NamesPartitionFilter and SlicePartitionFilter contain some commented-out code that should probably be removed.
          • In NamesPartitionFilter and SlicePartitionFilter the toCQLString methods contain some duplicate code for the ORDER BY that can be extracted and put in AbstractPartitionFilter.
          • In SelectStatement:
            • in the gatherQueriedColumns method the PartitionColumns.Builder.addAll method could be used to simplify the code
            • in the process(DataIterator) method the options variable could be inlined
            • the method getSliceCommands should return a ReadQuery as it will help to simplify the getQuery method
            • nowInSec should be generated in getQuery instead of being provided as argument
            • the logger variable is not used
            • I wonder if it would not be cleaner to have an empty ReadQuery than to deal with null
          • In BatchStatement conditionColumns does not seem to be read
          • In QueryPager:
            • the pager(ReadQuery, ConsistencyLevel, ClientState) method is not used
            • there is some commented code with a TODO which provides no information
            • instead of using instanceof, the code of pager(ReadQuery, ConsistencyLevel, ClientState, boolean, PagingState) should probably use polymorphism (see the sketch after this list)
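
          To illustrate the instanceof-vs-polymorphism point, here is a hedged sketch with invented type names; it is not the actual QueryPager code.

              // Sketch only: hypothetical names standing in for the real read commands and pagers.
              interface QueryPager {}
              class PagingState {}

              interface ReadQuery
              {
                  // Each concrete query type builds its own pager, so callers never
                  // need an instanceof chain to pick the right pager implementation.
                  QueryPager getPager(PagingState state);
              }

              class SinglePartitionReadQuery implements ReadQuery
              {
                  public QueryPager getPager(PagingState state) { return new QueryPager() {}; /* single-partition pager */ }
              }

              class PartitionRangeReadQuery implements ReadQuery
              {
                  public QueryPager getPager(PagingState state) { return new QueryPager() {}; /* range pager */ }
              }
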
          benedict Benedict added a comment -

          So I'm still trying to digest the entirety of the patch (amongst other things), and have been reticent to give feedback in advance of this. Since being fully conversant is likely a way off, I figure I should highlight the biggest (labour-wise) concern I have sooner rather than later.

          Flyweights. They complicate the code, introducing a lot of extra implementations of classes. This is both a cognitive burden and an issue for the optimizer. As an example, Clustering looks like it could likely get away with just a single implementation, or perhaps two, as opposed to the current eight. Since these classes are accessed everywhere, and often, having efficient method despatch is important. But also classes like Sorting would be unnecessary, and we could depend on Java sorting, and things like RowDataBlock would not need such complexity for overloading of behaviour to support both rows and collections of rows.

          The upside of the flyweights AFAICT is very slim. There is only a tiny amount of temporary heap space saved, since the vast majority of the data (the ByteBuffer objects) is still floating around for every value. I would be surprised if we measurably reduced GC burden, and in fact, since their lifetimes are often longer, they may be promoted with higher likelihood. Of course, I may be missing some major beneficial case that is important, but either way I figure we should start a discussion, sooner rather than later, on whether or not it is worth retaining them in this patch. It is possible that the abstraction may be useful in future, but I don't think it is right now, and I would prefer to introduce it if and when we're confident it will be of use, and to benchmark it independently of these other complex changes.
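
          As a hedged illustration of the trade-off being discussed (invented names, not the patch's actual classes): a flyweight-style reusable accessor versus a plain immutable value class.

              import java.nio.ByteBuffer;

              // Flyweight style: one mutable "view" is repositioned over backing storage,
              // saving some short-lived allocations but multiplying implementations and
              // making call sites harder to reason about.
              class ClusteringView
              {
                  private ByteBuffer[] values;
                  private int offset;
                  private int size;

                  ClusteringView setTo(ByteBuffer[] values, int offset, int size)
                  {
                      this.values = values; this.offset = offset; this.size = size;
                      return this;
                  }

                  ByteBuffer get(int i) { return values[offset + i]; }
                  int size() { return size; }
              }

              // Plain immutable value: one small allocation per clustering, but a single,
              // simple implementation with straightforward method dispatch.
              final class ImmutableClustering
              {
                  private final ByteBuffer[] values;
                  ImmutableClustering(ByteBuffer[] values) { this.values = values.clone(); }
                  ByteBuffer get(int i) { return values[i]; }
                  int size() { return values.length; }
              }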

          slebresne Sylvain Lebresne added a comment -

          Flyweights.

          I kind of agree. I went with them because there had been a desire to use them for a while on CASSANDRA-5019 and it felt like using them from day one might not be that hard. And I kind of went with it because I wanted to finish the APIs and the code that uses them before thinking too much about the more internal implementations.

          But you're right, it does make some things more complex than they have to be, and the gains are unclear at best (as in, it might hurt more than it helps). Hopefully though, the APIs don't really depend on them and so changing this shouldn't be too hard. I want to finish the last outstanding parts: thrift (which is coming along) and reading old formats; but then I'll have a look at removing the flyweights, at least the main ones, and simplifying accordingly.

          benedict Benedict added a comment -

          FTR, I plan to leave review of the Partition hierarchy (in particular the memtable stuff) until after that refactor, but am attempting to tackle the rest of the code in the meantime.

          As I do this, I will arrange pull requests (as I have done just now). I will only post updates here if the pull request is in any way substantial, so that they can be discussed on the permanent record. I think this will be more efficient than my posting suggestions/criticisms, since you only have so much time on your hands, and this permits a degree of parallelization that may help us reach a better end state in the time available.

          tjake T Jake Luciani added a comment -

          After some time spent with the new API, my biggest concern is that using OpOrder in iterators is unsafe. To address this we talked about adding code analysis to find improper use of auto-closeable, but ideally we should replace this with something safer, like ref counting the op order so it will get cleaned up, worst case, by the leak detector.

          benedict Benedict added a comment -

          my biggest concern is using OpOrder in iterators is unsafe

          I agree with this being dangerous, but I would prefer not to rush into using Ref; it isn't intended for use on a critical path. We at least need to address CASSANDRA-9379, but even then I would prefer to only use a Ref in situations where the iterator will go out of scope, i.e. where we cannot use it in a try/finally block. In this case, we can just explicitly wrap it in a Ref object, and always access it via the Ref, so it's clear we're being safe.

          We should perhaps try and rustle up some static analysis to tell us where we do not use a try/finally, and fail the compile if we don't use a Ref there. If this is too difficult then I am with T Jake Luciani and would prefer to err on the side of caution, by introducing CASSANDRA-9379 and always wrapping the OpOrder in a Ref.
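
          A hedged sketch of the kind of usage being discussed; the iterator and group types are simplified stand-ins invented for illustration, not the actual Cassandra classes.

              import java.util.Iterator;
              import java.util.NoSuchElementException;

              // Sketch only: an iterator owning a resource (standing in for an OpOrder.Group),
              // used in a try-with-resources block so the resource is always released.
              interface OpGroup extends AutoCloseable { @Override void close(); }

              class GuardedRowIterator implements Iterator<Object>, AutoCloseable
              {
                  private final OpGroup group;
                  GuardedRowIterator(OpGroup group) { this.group = group; }

                  public boolean hasNext() { return false; }                       // iteration elided
                  public Object next() { throw new NoSuchElementException(); }

                  @Override
                  public void close() { group.close(); }                           // guarantees the group is released
              }

              class Usage
              {
                  static void consume(OpGroup group)
                  {
                      // If the iterator must escape this scope instead, wrapping it in a
                      // ref-counted holder (the Ref idea above) would be the safer option.
                      try (GuardedRowIterator iter = new GuardedRowIterator(group))
                      {
                          while (iter.hasNext())
                              iter.next();
                      }
                  }
              }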

          benedict Benedict added a comment -

          It looks like this is easier than expected:

          Resource not managed via try-with-resource (1.7 or higher)
          When enabled, the compiler will issue an error or a warning if a local variable holds a value of type 'java.lang.AutoCloseable', and if the method 'close()' is explicitly invoked on that resource, but the resource is not managed by a try-with-resources block.
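
          For illustration, a minimal example of the pattern this compiler setting would flag, and the form it accepts; the file name is made up.

              import java.io.FileInputStream;
              import java.io.IOException;

              class Example
              {
                  // Flagged: close() is invoked explicitly, but the resource is not managed
                  // by a try-with-resources block, so an exception between open and close
                  // would leak it.
                  static void flagged() throws IOException
                  {
                      FileInputStream in = new FileInputStream("data.db");   // made-up file name
                      in.read();
                      in.close();
                  }

                  // Accepted: the same resource managed by try-with-resources.
                  static void accepted() throws IOException
                  {
                      try (FileInputStream in = new FileInputStream("data.db"))
                      {
                          in.read();
                      }
                  }
              }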

          tjake T Jake Luciani added a comment - edited

          Hmm, is this eclipse only?

          benedict Benedict added a comment -

          Yeah. It would mean setting up ecj as another target, unfortunately, but that is probably worth the effort.

          tjake T Jake Luciani added a comment -

          I see. I think Robert Stupp would know how to do this from CASSANDRA-8241?

          snazy Robert Stupp added a comment -

          Are you thinking of using ecj in build.xml or in IDEA?

          benedict Benedict added a comment -

          I've filed CASSANDRA-9431 to continue this discussion and avoid polluting this ticket.

          blambov Branimir Lambov added a comment -

          Issues I am seeing in the changes in compaction (org.apache.cassandra.db.compaction package), in order of significance:

          • Side effects of iteration:
            • CompactionIterable 125: I doubt index update belongs here, as a side effect of iteration. Ideally the index should be collected, not updated. Presumably mergedCell is already being added by the writer; perhaps cleaning other entries there is a better way to go.
            • CompactionIterable 237: Another side effect of iteration that should preferably be handled by the writer.
            • Validation compaction now uses CompactionIterable and thus has side effects (index & cache removal).
          • Iterator closing:
            • CompactionManager 900: Several issues:
              • partition can be used after it is closed.
              • partition is not always closed: this closes it, but 895 and 870 don't.
              • Why is the iterator closed here rather than in the scope that opened it?
              • Please use explicit close of partition. This is an abuse of the try syntax (not obvious behavior, hidden from view).
            • CompactionManager 1052, 1160: Partition is not closed.
          • Stuff that could be confusing:
            • In guide_8099, description of RangeTombstoneMarkers: add that there is never content between two corresponding tombstone markers on any iterator. PurgingPartitionIterator for one relies on this.
            • CompactionIterable 83: Are we interested in highest-index nowInSec() only? Not max/min? Could you add an explanation in comment if that's the case?
            • CompactionIterable 47: AtomicInteger is unnecessary; remove it, as it could be understood to imply thread-safety where there is none.
            • [Abstract]CompactionIterable are misnamed. They are now iterators.
          • Nits:
            • CompactionIterable 48: format is never used.
            • CompactionIterable 100: column is not used.
            • CompactionManager 785: Revert Collections.singleton(sstable) to sstableSet.
            • CompactionManager 943: Revert removal of break.
            • Imports could be cleaned.

          I haven't examined correctness/equivalence of (Partition|Atom)Iterators.merge() to older code yet.

          Looking at 8099 overall, I don't know if that's a discussion you have had before and I haven't participated in, but I have strong objections to using iterators as the representations of tables and partitions in the abstraction. This appears to be carried over from the old code, but they now gain additional content and behaviour which make them even less suited to the role. These objects should be Iterable instead. Having that would give clear separation between the iteration process and the entity-level data and behaviour which is currently a cause of multiple problems, including the top two categories above.
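          As a rough illustration of what I mean, here is a minimal sketch of an Iterable-shaped partition (the interface and type parameters are hypothetical, not actual 8099 classes): entity-level data lives on the entity, while each iterator() call creates fresh, independent iteration state.

          import java.util.Iterator;

          // Hypothetical shape only: K = partition key type, D = deletion info, R = row type.
          interface IterablePartition<K, D, R> extends Iterable<R>
          {
              K partitionKey();           // entity-level data, available without iterating
              D partitionLevelDeletion(); // entity-level data, available without iterating

              @Override
              Iterator<R> iterator();     // per-call iteration state, usable more than once
          }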

          I also think OpOrder.Group.close() does not belong in an iterator close. To minimize the potential for trouble, grouping/locking should be explicit in the user of the iterator/entity. For the instances where this is not feasible, perhaps the initiating methods could be called "openX" or "startX" vs "X" (e.g. openSearch vs search).

          Hide
          benedict Benedict added a comment -

          I also think OpOrder.Group.close() does not belong in an iterator close.

          This issue is actually much more problematic than I had realised. There are at least two places in the code already where we explicitly hold onto the OpOrder across operations of indeterminate length (disk IO, or peer query responses). During 2i rebuild we (I am told) also hold onto it for the entire duration of a single partition. During a normal read request, if we time out, as far as I can tell we don't even close the Iterator (so we already have a serious bug).

          OpOrder is explicitly not designed for any of these scenarios. Even without the bug, this can cause the entire cluster to lock up for a period because one node is down (and hasn't yet been marked such), or for a node to lock itself up because of either low disk throughput, or one of a rash of bugs we have had recently with tombstone bookkeeping causing heavy CPU consumption, for instance.

          As such I am now totally -1 on leaving OpOrder inside the iterator. Before 3.0 we need to ensure that we eagerly copy any contents we require from the memtable.

          Hide
          benedict Benedict added a comment -

          It occurs to me that another option, for the read path at least, would be to:

          1. In the case of a single data request (and it being local): immediately transform to the resultset, and store the digest for corroboration;
          2. In the case of a read-repair (or other multiple data requests), delay performing the local read operation until the remote replies have already arrived. This may marginally increase latency, but only on an uncommon codepath.
          Hide
          blerer Benjamin Lerer added a comment -

          Reviewed paging (except for the secondary index). It looks good to me.

          Hide
          beobal Sam Tunnicliffe added a comment -

          Just for the record, there is a bug with paging on secondary index queries, for which I have a patch. I'll make sure the relevant tests are all passing and submit a PR before the end of the week.

          Hide
          iamaleksey Aleksey Yeschenko added a comment -

          Higher level API for Thrift, coordination, counters, read commands is a go from me for the purposes of committing this to trunk (OpOrder issues, already mentioned elsewhere, aside).

          Will open separate tickets for minor issues, and for the pony lambda/streams proposal.

          Otherwise consider it a high-level +1-ish, and I suggest committing this to trunk as soon as backward compatibility code is done (without closing the ticket itself just yet).

          Hide
          iamaleksey Aleksey Yeschenko added a comment -

          Haven't looked deeply at schema/metadata changes, however, for CASSANDRA-6717 reasons.

          Hide
          benedict Benedict added a comment -

          I've pushed a small semantic-changing suggestion for serialization and merging of RTs here

          I'm happy to split this (and further changes) out into a separate ticket, but while this does cross the threshold for discussion/mention, it's actually a pretty small/contained change. Basically, on a RT boundary, instead of issuing a close and open marker, we just issue the new open marker - both during merge and serialization. On read, encountering an open marker when we already have one open for that iterator is treated as a close/open pair. This both reduces storage on disk, especially for large records (where RT markers are both more frequent and, obviously, larger), and gets rid of the UnfilteredRowIterators.MergedUnfiltered ugliness.

          Hide
          slebresne Sylvain Lebresne added a comment -

          Some update on this. I've pushed a rebased (and squashed because that made it a lot easier to rebase) version in my 8099 branch.

          It's still missing wire backward compatibility (Tyler Hobbs is finishing this so this should be ready hopefully soon). Regarding tests:

          • unit tests are almost green: mostly some failures remain in the Hadoop tests. I could actually use the experience of someone who knows these tests and the code involved, as it's not immediately clear to me what they are even doing.
          • dtests still have a fair amount of failures, but I've only started looking at them recently and the number is going down quickly.

          OpOrder

          I think the main problem was that a local read (done through SP.LocalReadRunnable) was potentially keeping a group "open" while waiting on other nodes. I also realized this path meant local reads (the actual read of sstables) were done outside of the StorageProxy methods, and so 1) not on the thread they were supposed to be on and 2) outside of the "timeout" check. I changed this so that a local response actually materializes everything upfront (similarly to what we do today), solving the problems above. This is not perfect and I'm sure we'll improve on this in the future, but that feels like a good enough option initially.
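          To sketch the idea (with hypothetical names: ReadGroup stands in for OpOrder.Group and the row iterator for the actual sstable/memtable read), the local result is copied out while the group is open, and the group is closed before anything of indeterminate duration, like waiting on remote replies, can happen:

          import java.util.ArrayList;
          import java.util.Iterator;
          import java.util.List;

          public class EagerLocalRead
          {
              // Stand-in for OpOrder.Group; assumed here to be AutoCloseable-like.
              interface ReadGroup extends AutoCloseable { @Override void close(); }

              static <T> List<T> localRead(ReadGroup group, Iterator<T> rows)
              {
                  try (ReadGroup g = group)
                  {
                      // Copy out everything we need while the group is still open ...
                      List<T> copy = new ArrayList<>();
                      rows.forEachRemaining(copy::add);
                      return copy;
                  }
                  // ... so the group is released here, before any wait on other nodes.
              }
          }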

          Regarding moving OpOrder out of close, the only way to do that I can see is to move it up the stack, in ReadCommandVerbHandler and SP.LocalReadRunnable (as suggested by Branimir above). I'm working on that (I just started and might not have the time to finish today, but it'll be done early Monday for sure).

          Branimir's review remarks

          I've integrated fixes for most of the remarks. I discuss the rest below.

          CompactionIterable 125: I doubt index update belongs here, as side effect of iteration. Ideally index should be collected, not updated.

          Though I don't disagree on principle, this is not different from how it's done currently (it's done the same way in LazilyCompactedRow; it just happens that the old LazilyCompactedRow has been merged into CompactionIterable (now CompactionIterator) because simplifications of the patch made it unnecessary to have separate classes). Happy to look at cleaning this up in a separate ticket however (it probably belongs with cleaning the 2ndary index API in practice).

          CompactionIterable 237: Another side effect of iteration that preferably should be handled by writer.

          Maybe, but it's not that simple. Merging (which is done directly by the CompactionIterator) gets rid of empty partitions and more generally we get rid of them as soon as possible. I think that it's the right thing to do as it's easier for the rest of the code, but that means we have to do invalidation in CompactionIterator. Of course, we could special case CompactionIterator to not remove empty partitions and do cache invalidation externally, but I'm not sure it would be cleaner overall (it would be somewhat more error-prone). Besides, I could argue that cache invalidation is very much a product of compaction and having it in CompactionIterator is not that bad.

          Validation compaction now uses CompactionIterable and thus has side effects (index & cache removal).

          I've fixed that, but I'll note for posterity that as far as I can tell, index removal is done for validation compaction on trunk (and all previous versions) due to the use of LazilyCompactedRow. I've still disabled it (for anything that wasn't a true compaction) because I think that's the right thing to do, but that's a behaviour change introduced by this ticket.

          add that there is never content between two corresponding tombstone markers on any iterator.

          That's mentioned in "Dealing with tombstones and shadowed cells". More precisely, that's what "it's part of the contract of an AtomIterator that it must not shadow its own data" means. But I need to clean up/update the guide, so I'll make sure to clarify further while at it.

          These objects should be Iterable instead. Having that would give clear separation between the iteration process and the entity-level data

          Yes, it would be cleaner from that standpoint. And the use of iterators in the first place is indeed largely carried over from the existing code; I just hadn't really thought of the alternative, tbh. I'll try to check next week how easy such a change is. That said, I'm not sure the use of iterators directly is that confusing either, so if it turns out hairy, I don't think it's worth blocking on this (that is, we can very well change that later).

          Hide
          slebresne Sylvain Lebresne added a comment -

          I've pushed a small semantic-changing suggestion for serialization and merging of RT

          Thanks. I hesitated doing this initially and don't remember why I didn't. But this does clean up things a bit so I'll look at integrating it on Monday unless I remember a good reason not to (which there probably isn't).

          Hide
          benedict Benedict added a comment -

          Yes, it would be cleaner from that standpoint. And the use of iterators in the first place is indeed largely carried from the existing code, I just hadn't really though of the alternative tbh. I'll try to check next week how easily such change is. That said, I'm not sure the use of iterators directly is that confusing either, so if it turns hairy, I don't think it's worth blocking on this (that is, we can very well change that later).

          It does change the semantics quite a bit, since the state needed for iterating must be constructed again each time, and is likely constructed in the caller of .iterator(). This has both advantages and disadvantages. One advantage of an Iterator, though, is that you cannot (easily) iterate over its contents twice. I'm personally not so upset at the use of Iterator, since it's a continuation of the existing approach, and Java 8 makes working with iterators a little easier. We can, for instance, make use of the forEachRemaining() method, or otherwise transform the iterator. I don't think there's any increased ugliness inherent in exposing the higher-level information in the Iterator, though.
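          As a tiny, plain-JDK illustration of that last point (nothing Cassandra-specific): forEachRemaining and a small transforming wrapper already cover much of what we tend to want from iterators.

          import java.util.Arrays;
          import java.util.Iterator;
          import java.util.function.Function;

          public class IteratorTransforms
          {
              // Lazily apply a function to each element of an iterator.
              static <A, B> Iterator<B> transform(Iterator<A> source, Function<A, B> fn)
              {
                  return new Iterator<B>()
                  {
                      @Override public boolean hasNext() { return source.hasNext(); }
                      @Override public B next() { return fn.apply(source.next()); }
                  };
              }

              public static void main(String[] args)
              {
                  Iterator<Integer> lengths = transform(Arrays.asList("a", "bb", "ccc").iterator(), String::length);
                  lengths.forEachRemaining(System.out::println); // prints 1, 2, 3
              }
          }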

          I believe Aleksey Yeschenko is working on a way to integrate the Java Streams API at some point in the future, which may lead to other benefits that Iterable cannot deliver.

          Either way, I think getting this ticket in sooner than later is better, and we can focus on how we might make the Iterator abstraction a little nicer in a follow up.

          Hide
          blambov Branimir Lambov added a comment -

          The changes look good, I accept that clarifying the side effects and switching to Iterable is not something that needs to be taken care of as part of this ticket.

          On the range tombstones doc, I'm sorry, I totally failed to explain what I had in mind. I was looking for a statement in ### Dealing with tombstones and shadowed cells that says an open marker is immediately followed by the corresponding close marker. This is a simple and easy to check statement which is equivalent to having both "does not shadow its own data" (pt 1) and "there is at most one active tombstone at any point" (pt 4).

          However, to clarify this I went and looked further into how tombstones work in the new code and after looking at it I do not think the merging code could work correctly. To make certain I wrote a partition merge test. The test results are very broken, for example:

          Merging
          6<=[34] <8 8 19<=[99] <=36 40 47 50<[56] <66 67 68 72<=[26] <=73 89<=[97] <97 
          2<=[66] <19 19 34<=[58] <=35 36 40<[94] <41 42<[26] <=48 55<=[35] <=57 58 83 88 
          5 8 19 31<=[49] <=31 37<=[70] <=44 46<[79] <55 65 72<[57] <85 92<[45] <93 93 
          results in
          2<=[66] <19 19<=[99] <=36 37<=[70] 40<[94] 41<=[70] 41<=[70] 44<=[26] 44<=[26] 46<[79] 55<=[56] 55<=[56] <66 67 68 72<=[26] 72<[57] <85 88 89<=[97] <97 
          java.lang.AssertionError: 40<[94] follows another open marker 37<=[70]
          java.lang.AssertionError: Duplicate marker 41<=[70]
          java.lang.AssertionError: Deletion time mismatch for position 44 expected:<deletedAt=70, localDeletion=70> but was:<deletedAt=26, localDeletion=26>
          
          Merging
          4 6 13<=[62] <=26 34<=[89] <=34 47 48<[99] <52 54<=[37] <=57 78<[6] <=83 85 88<=[34] <91 
          0 20 33<=[58] <=33 37<=[84] <40 52 57 77 77 88<[14] <91 92<[15] <96 
          1 8 31 41<[17] <43 62<=[25] <=67 85 87 88 92 97 
          results in
          0 1 4 6 8 13<=[62] <=26 31 33<=[58] <=33 34<=[89] <=34 37<=[84] <40 41<[17] <43 47 48<[99] <52 52 54<=[37] <=57 62<=[25] <=67 77 77 78<[6] <=83 87 88<=[34] <91 <91 92 92<[15] <96 97 
          java.lang.AssertionError: <91 should be preceded by open marker, found <91
          java.lang.AssertionError: Deletion time mismatch for open 88<=[34] and close <91 expected:<deletedAt=34, localDeletion=34> but was:<deletedAt=14, localDeletion=14>
          

          (where x<[z] or x<=[z] means a tombstone open marker at x with time z and <y or <=y stands for a close marker at y).

          This is also broken with Benedict's implicit close markers, and his patch is not sufficient to fix it. The test does not even go far enough, as it does not include rows that are within the span of a tombstone but newer (as far as I can tell such a row should expand into three Unfiltered's: a close marker, row with deletion time, and open marker).

          Am I somehow completely misunderstanding or missing some assumptions about the way tombstone ranges are supposed to work? Is there something very wrong in the way I'm doing this test?

          Hide
          slebresne Sylvain Lebresne added a comment -

          I'll look more closely at your test and fix any brokenness: it does seem the results are not what they are supposed to be.

          For the record however, I'll note that it's not true that "an open marker is immediately followed by the corresponding close marker", there can be some rows between an open and a close marker. However, the guarantee that iterators should provide is that those rows between an open and close marker are not deleted by the range tombstone (this doesn't make the test results above any more right, but I wanted to clarify).
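          Since that contract is easy to state but easy to get wrong, here is a sketch of the kind of check a test can run over a stream (hypothetical, heavily simplified types: a long timestamp per row and a deletion time per marker): while a range is open, every row encountered must be newer than the open deletion.

          import java.util.Arrays;
          import java.util.Iterator;

          public class IteratorContractCheck
          {
              // Simplified stand-ins for Unfiltered: a row carries a timestamp, a marker a deletion time.
              static class Unfiltered { final char kind; final long time; Unfiltered(char kind, long time) { this.kind = kind; this.time = time; } }
              static Unfiltered row(long ts)    { return new Unfiltered('R', ts); }
              static Unfiltered open(long del)  { return new Unfiltered('O', del); }
              static Unfiltered close(long del) { return new Unfiltered('C', del); }

              static void check(Iterator<Unfiltered> stream)
              {
                  Long openDeletion = null;
                  while (stream.hasNext())
                  {
                      Unfiltered u = stream.next();
                      if (u.kind == 'O')
                      {
                          assert openDeletion == null : "open marker while a range is already open";
                          openDeletion = u.time;
                      }
                      else if (u.kind == 'C')
                      {
                          assert openDeletion != null && openDeletion == u.time : "unmatched close marker";
                          openDeletion = null;
                      }
                      else if (openDeletion != null)
                      {
                          // the iterator must not shadow its own data
                          assert u.time > openDeletion : "row deleted by the iterator's own range tombstone";
                      }
                  }
                  assert openDeletion == null : "range left open at end of stream";
              }

              public static void main(String[] args)
              {
                  // run with -ea; this stream satisfies the contract
                  check(Arrays.asList(open(10), row(15), close(10), row(3)).iterator());
              }
          }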

          Hide
          blambov Branimir Lambov added a comment -

          it's not true that "an open marker is immediately followed by the corresponding close marker", there can be some rows between an open and a close marker. However, the guarantee that iterators should provide is that those rows between an open and close marker are not deleted by the range tombstone.

          It would be great to have this clarification in the doc. This is a point that I was missing; I will post a new version of the test that takes this into account.

          Hide
          blambov Branimir Lambov added a comment -

          I pushed the updated test in this branch, adding a test of more than one merge iteration. The branch also includes Benedict's implicit close markers patch and a small fix that makes it pass most of the validation. One problem left is that the merge produces duplicate open markers, which with the right interpretation could be acceptable after a single step, but will cause problems after multiple compaction iterations.

          Hide
          slebresne Sylvain Lebresne added a comment -

          As shown by Branimir's test, range tombstone merging was indeed broken and I've pushed the fix for that. I've included the old version of the test in question (with some ugly modifications so it tests the reverse case), but I'll look at updating it for the more generic version unless you want to have a shot at it.

          It would be great to have this clarification in the doc.

          Yes, sorry about that. I've tried to clarify and add to the comment in the code to start with. I'll update the "guide" to make that clearer when I have a bit more time.

          I've pushed a small semantic-changing suggestion for serialization and merging of RT

          I hesitated doing this initially and don't remember why I didn't

          I remember now.

          The problem is reverse queries: we need to be able to merge iterators in reverse order. And if we re-use a "start" marker as an "update" of the deletion, then when reversed and we see a "start" marker, we won't know if that's a real start or an update. Plus we do need both deletion times at a range boundary (the one of the range we left for reverse queries, and the one of the range we enter for forward ones). In all fairness though, I had forgotten to actually handle the reverse case in the merger, so I've fixed that, but said fix is trivial in the current approach.

          Now, there is obviously possible alternatives for dealing with RTs, but I feel that the current approach has a bunch of good properties:

          1. it's a simple model: every RT has a beginning followed by an end and that's it (no overlapping, no inclusion of ranges, very predictable).
          2. it works in both forward and reverse order, and in an obvious way.
          3. it makes the purging of gcable RTs trivial (you just blindly collect any gcable marker). This is something that was broken by the alternative patch in particular and would require some care.
          4. it reuses slice bounds without adding a new concept, which is nice I think.
            So while other options are up for discussion, I think there are enough parameters to consider that I'd prefer such potential discussion to happen in a separate ticket.

          I'll note that the "re-imaging" of markers at the beginning of index blocks is actually not necessary and something I forgot to remove. We now store in each index block whether there is an open marker at the end of that block (primarily so that we can decide, just from the index, whether an sstable has any data for a given slice), making this redundant. So I've removed it.

          Lastly, I fully agree that UnfilteredRowIterators.MergedUnfiltered is ugly and I meant it as a simple temporary solution: I'm sure we can find cleaner alternatives (maybe through some modification of the MergeIterator API so it's easy to produce more than one result for a given key).

          Hide
          benedict Benedict added a comment -

          I did had forgotten to actually handle the reverse case in the merger

          This is something that was broken by the alternative patch in particular and would require some care.

          Do we have test coverage of these? I didn't notice them breaking.

          From a nomenclature standpoint, I would like to suggest we relabel the marker types to (something like) UPPER/LOWER, so that internally we can refer to either as an open marker without confusion. Right now, the concept is blurred, because we treat close markers as open markers when operating in reverse, and IMO this hinders clarity.

          On the matter of the method for dealing with RTs, I agree with the majority of your points. However to improve the ugliness of MergedUnfiltered and remove the double persistence, why not introduce a special kind of RTM, of type BOUNDARY, which just has two timestamps. That's basically what the MergedUnfiltered is, so why not save space and improve clarity by promoting it to a first class concept? Unpicking them at merge when performing GC shouldn't be onerous.

          I'm dubious about introducing more method calls to be invoked on every Cell, to permit the rare case of two atoms after one merge result. That's a code complexity and execution cost incurred for the uncommon case, but paid by all.

          Hide
          slebresne Sylvain Lebresne added a comment -

          why not introduce a special kind of RTM, of type BOUNDARY

          Because that doesn't really fit the part about reusing "slice bounds without adding a new concept". A RT is a slice of deletion, and I think that's rather clean. I strongly suspect that either not reusing slice for RTs, adding a BOUNDARY concept to Slice.BOUND (which doesn't really make sense for slices per-se), or some other hack to work around this will make things more confusing/complex, and is thus not worth a minor optimization of a probably pretty rare situation in practice. Feel free to give the changes a shot though if you're convinced otherwise, and we can have a more informed discussion on the result.

          Do we have test coverage of these?

          We have generally poor coverage of range tombstone usage (though I've modified Branimir's test to test the reverse case too, so we have at least one test for that now). We have some basic tests, but nothing fancy enough. I've created CASSANDRA-9617 to improve this.

          I would like to suggest we relabel the marker types to (something like) UPPER/LOWER

          We can, though due to the point above that means also renaming it for slices in general. Which is not crazy per se, in that we do scan slices from end to start for reverse queries so they are equivalent in that respect; it's just that it's a departure from the existing nomenclature (which I don't personally mind).

          Hide
          blambov Branimir Lambov added a comment - - edited

          Pushed updated test. The code is much better now; the equivalence test passes even for larger tests. There's one more problem to fix, though: it generates invalid empty ranges violating the ordering of the iterator, for example:

          Seed 10
          Merging
          14<=[8] <15[8] 23 46 56<=[93] <69[93] 80 81 88<=[70] <90[70] 90 93<=[72] <=95[72] 97 
          10 12 33 37<=[71] <39[71] 39<=[8] <=44[8] 55<[14] <=57[14] 68 92<=[45] <98[45] 98<=[80] <=98[80] 99 
          2 8 24 33<=[7] <45[7] 51 62 63 63 79<=[71] <=79[71] 81<=[16] <=81[16] 
          results in
          2 8 10 12 14<=[8] <15[8] 23 24 33<=[7] 33 <37[7] 37<=[71] <39[71] 39<=[7] <39[7] 39<=[8] <=44[8] 44<[7] <45[7] 46 51 55<[14] <56[14] 56<=[93] 68 <69[93] 79<=[71] <=79[71] 80 81<=[16] <=81[16] 88<=[70] <90[70] 90 92<=[45] <93[45] 93<=[72] <=95[72] 95<[45] <98[45] 98<=[80] <=98[80] 99 
          java.lang.AssertionError: 39<=[7] not ordered before <39[7]
          
          Hide
          blambov Branimir Lambov added a comment -

          Thinking about this a bit more, I see that this is very difficult to fix. When the reducer issues a pair, one of the markers is out of its place in the stream; thus we would need to delay the stream to be able to place it correctly. This would have a non-trivial performance impact.

          Instead, I think we should officially permit this kind of disorder (e.g. <39[71] 39<=[7] <39[7] 39<=[8] from above where 39<=[7] <39[7] is invalid and covered by the outer pair of markers) in the unfiltered stream and remove the invalid ranges in the compaction writer. The merge algorithm is able to deal with such ranges correctly and filtering just removes them. We have to document it well and make sure the relevant code is tested with examples of this.

          Even without removal in the compaction writer, the only kind of trouble I can see such a range introducing is seeking to the wrong marker in a binary / indexed search, but this should be ok as the correct marker is certain to follow before any live data.

          Hide
          slebresne Sylvain Lebresne added a comment -

          You're right, it's not easy to fix at all. However, allowing such disorder "officially" crosses the boundaries of my comfort zone, even if it could work with the current code.

          So I'm starting to warm up to the idea of introducing new "boundary" types of marker (of which we'll need two: excl_close-incl_open and incl_close-excl_open). Provided we make sure iterators use those boundaries when appropriate, I think we dodge that problem. It will complicate things somewhat, but given the other benefits discussed above, maybe that's the better solution.
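          To make that concrete, here is a sketch of what such a boundary marker could carry (illustrative names and bare longs for deletion times, not the final types): it fuses a close and the matching open at the same position and keeps both deletion times, which is what keeps forward and reverse merging unambiguous.

          public final class BoundaryMarker
          {
              enum Kind { EXCL_CLOSE_INCL_OPEN, INCL_CLOSE_EXCL_OPEN }

              final Kind kind;
              final long closeDeletionTime; // deletion time of the range ending at this position
              final long openDeletionTime;  // deletion time of the range starting at this position

              BoundaryMarker(Kind kind, long closeDeletionTime, long openDeletionTime)
              {
                  this.kind = kind;
                  this.closeDeletionTime = closeDeletionTime;
                  this.openDeletionTime = openDeletionTime;
              }
          }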

          Hide
          blambov Branimir Lambov added a comment -

          Continuing my review with the db.filter package, apart from ColumnFilter.Expression.Serializer.serializedSize which appears to be missing a byte when the messaging version is 3.0, I see no correctness problems with the code. More unit testing would definitely help, though. Below are some suggestions that should not be treated as blocking commit.

          PartitionFilter:

          • The name implies it is something that filters partitions; in fact it applies to rows within a partition. I'd rename this to RowFilter or, if there's a reason to keep partition in its name, PartitionContentFilter.
          • forPaging javadoc is contradictory: on one hand it says strictly after, on the other it could be inclusive. Please clarify.
          • filter(UnfilteredRowIterator) is risky, people will call the wrong filter overload by mistake. Rename?
          • filter(SliceableUnfilteredRowIterator) does not seem to apply queriedColumns in either NamesPartitionFilter or SlicePartitionFilter. Is this on purpose? If it is, the method above is not its counterpart. Please document.

          NamesPartitionFilter:

          • clusterings should be NavigableSet to enable reverse traversal without duplication (TreeSet implements that too); see the small sketch after this comment. This simplifies other code in the file.
          • toString misses a closing bracket.

          SliceableUnfilteredRowIterator:

          • JavaDoc: what is 'seekTo'? 'slice'?

          ColumnSubselection:

          • ELT_SELECTION is not easily decipherable. Why not just ELEMENT instead? And 'element' elsewhere? (elt also)

          ColumnSelection:

          • includes/newTester: It appears these should only be called if the columns filter is already applied. This is not obvious; I'd state it in JavaDoc and/or assert that columns.contains(cell.column()).

          DataLimits:

          • DISTINCT_NONE is not the same as CQLLimits.distinct(MAX_VALUE). Would you add a comment to say why it is so?
          • countCells should be countsCells.
          • CQLLimits.isDistinct does not appear to be used for anything.

          Throughout, there are lots of easy-to-fix (just add <?>) raw type warnings. I would also use a virtual method instead of an instance field for kind everywhere, as done in ColumnSubselection.

          Here I've uploaded a branch that includes fixes to a few of the above.
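          As a tiny illustration of the NavigableSet point from the list above (plain JDK types only), the same sorted set serves both forward and reverse traversal without keeping a second, reversed copy:

          import java.util.NavigableSet;
          import java.util.TreeSet;

          public class ReverseTraversal
          {
              public static void main(String[] args)
              {
                  NavigableSet<String> clusterings = new TreeSet<>();
                  clusterings.add("a");
                  clusterings.add("b");
                  clusterings.add("c");

                  for (String c : clusterings)                 // forward order: a, b, c
                      System.out.println("forward: " + c);

                  for (String c : clusterings.descendingSet()) // reverse order: c, b, a
                      System.out.println("reverse: " + c);
              }
          }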

          Hide
          benedict Benedict added a comment -

          clusterings should be NavigableSet to enable reverse traversal without duplication (TreeSet implements that too). This simplifies other code in file.

          fyi, this change is included in CASSANDRA-9471, along with the necessary propagation of the type change outwards. No harm in doing it in advance of that, but it can also be left until then.

          benedict Benedict added a comment - - edited

          Sylvain Lebresne: what's the state of play with the refactor work? Is it being done in the near future? Trying to figure out if/when I should start making pull requests for the new memtable hierarchy.

          (If it isn't in progress, I'll see about starting the refactor myself and having you vet it instead)

          slebresne Sylvain Lebresne added a comment -

          I've started it and plan on focusing on it more exclusively this week. I'll add that I'm quite keen on giving it this first shot myself.

          benedict Benedict added a comment -

          That's all I needed to hear. Thanks!

          slebresne Sylvain Lebresne added a comment -

          I've force-pushed a rebased version of the branch (still here). Since my last update, on top of a number of fixes, I've finished moving the OpOrder out of the iterators' close and I've updated the range tombstone code to use specific boundary markers as discussed above (I've also included Branimir's branch with its "nits" and fixed most of the others). I haven't had the time to update Branimir's test, however, so for the sake of compilation I've removed it for now. If you could have a look at rebasing your test, Branimir Lambov, that would be greatly appreciated as you're more familiar with it.

          There is still a fair amount of work to be done on this ticket, but the bulk of it is reasonably stable, and outside of some of the backward compatibility code the branch is generally functional. We're also starting to have tickets that are based on this and are ready (or almost ready), tickets that won't be impacted too much by the remaining parts of this work (which include the refactoring of the flyweight-based implementation that I'm going to focus on now, the wire backward compatibility code Tyler is working on, and some general testing/bug fixing).

          So, based on some offline discussion, I suggest committing the current branch to trunk. I won't close this ticket just yet and will continue fixing the remaining things, but committing will allow other tickets to synchronize on this and will generally help get more eyes on it.

          I'm planning to commit this tomorrow-ish (my European tomorrow), so if you have a strong objection (again, we're not closing the ticket, and committing it doesn't mean it can't change), please speak up quickly.

          slebresne Sylvain Lebresne added a comment -

          As mentioned yesterday, I've merged the current branch onto trunk, so further development will happen there.

          PartitionFilter:

          • The name implies it is something that filters partitions, in fact it applies to rows within partition. I'd rename this to RowFilter or, if there's a reason to keep partition in its name, PartitionContentFilter.

          I agree that the naming isn't perfect here. It's actually a bit messier than that, in the sense that the current ColumnFilter really also filters rows and should be called RowFilter; the difference with PartitionFilter is that the latter is the part of the filtering that can be handled by the
          "clustering index". Anyway, I've cleaned this up through the following renames:

          1. PartitionFilter to ClusteringIndexFilter: I'm sure there are other options here, and if someone has a strong preference for an alternative I'm happy to discuss it, but I think it captures the point of the class pretty well.
          2. ColumnFilter to RowFilter, as this is the closest thing we have to a general filter on rows.
          3. ColumnSelection to ColumnFilter, since after all, that is our true filter on columns.
            I think this is a lot more consistent and precise.
          blambov Branimir Lambov added a comment -

          The rebased test is here. The test does not flag the disorder/duplication problem, but I'm afraid we still have it (see testDuplicateRangeCase).

          We need a bit more work to fully fix it: make complementing open/close/boundary markers compare equal rather than ordered, and make sure only one of them is present in each stream. I intend to take care of this within the next couple of days.
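
          To illustrate the intended change (a simplified sketch with hypothetical types, not the actual Cassandra marker comparator): complementing markers at the same clustering position compare equal rather than one before the other, so a merge sees a single spot and can keep exactly one marker per stream.

          {code:java}
          import java.util.Comparator;

          class MarkerOrderSketch
          {
              enum BoundKind { OPEN, CLOSE }

              // A hypothetical marker: a clustering position plus whether it opens or closes a range.
              static final class Marker
              {
                  final int position;
                  final BoundKind kind;

                  Marker(int position, BoundKind kind)
                  {
                      this.position = position;
                      this.kind = kind;
                  }
              }

              // Only the position takes part in the ordering, so a CLOSE marker and the OPEN marker
              // that complements it at the same position compare as equal; a merge then sees them at
              // a single spot and can collapse them into one boundary marker per stream.
              static final Comparator<Marker> BY_POSITION_ONLY = Comparator.comparingInt((Marker m) -> m.position);
          }
          {code}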

          slebresne Sylvain Lebresne added a comment -

          making complementing open/close/boundary markers equal rather than ordered and making sure there's only one present in each stream

          I was already partly doing that, but for some reason I hadn't made all the bounds that should be equal actually equal. Changing the last bits turned out to be reasonably simple, so I took the liberty of taking a quick shot at it (hopefully you haven't started working on it yourself yet). The result is here and it actually simplifies the merging somewhat. I've included your test on the branch and re-enabled the assertion that was failing; testDuplicateRangeCase passes, but I noticed 2 things with that test:

          1. it actually generates somewhat invalid inputs, in that it can produce duplicate rows. That is, the first "order violation" check of verifyValid should really use a strict inequality, but this currently breaks due to the duplicate rows (it doesn't invalidate the testing of range tombstones, but it would be nice to fix since the test is more of a general merge test).
          2. testInvalidRangeInput fails, but that seems expected to me. I suppose the test is meant to check that verifyValid does what it should?
          blambov Branimir Lambov added a comment -

          Excellent, the problem is now solved. An update of the test that removes the no-longer-necessary testInvalidRangeInput, avoids duplicate rows, and checks strict inequality is uploaded here, rebased on trunk.

          slebresne Sylvain Lebresne added a comment -

          Branimir Lambov I was about to commit the patch with your test, but it seems the last commit (the one with the test) includes a number of unrelated changes (probably from the rebase), and I'm not entirely comfortable committing those parts.

          blambov Branimir Lambov added a comment -

          Sorry, these changes were indeed wrong. Uploaded fix to the same branch.

          slebresne Sylvain Lebresne added a comment -

          Thanks for the update, Branimir Lambov. Since we're good on that, I've committed it to trunk, but from now on I'll stick to opening subtasks for things related to this ticket (though I'll keep this one open until we're done with all the subtasks).

          tzach tzach added a comment -

          It looks like this change undoes the fix for CASSANDRA-4858.
          Is this correct? Intentional?

          slebresne Sylvain Lebresne added a comment -

          It is certainly not the intention to undo CASSANDRA-4858, but I could use some more details on why you think it has been undone. A priori the code from that ticket, which basically merges ranges that can be merged together, is still there in StorageProxy.RangeMerger. So did you experimentally notice that this wasn't working as intended anymore? And if so, can you share your experiment?
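
          For readers following the reference, the idea from CASSANDRA-4858 that RangeMerger keeps alive can be sketched roughly like this (hypothetical types, a simplification rather than the actual StorageProxy code): consecutive sub-ranges served by the same replicas are folded into a single request.

          {code:java}
          import java.util.ArrayList;
          import java.util.List;
          import java.util.Set;

          class RangeMergeSketch
          {
              // A hypothetical sub-range of the ring together with the replicas responsible for it.
              static final class SubRange
              {
                  final long left, right;
                  final Set<String> replicas;

                  SubRange(long left, long right, Set<String> replicas)
                  {
                      this.left = left;
                      this.right = right;
                      this.replicas = replicas;
                  }
              }

              // Consecutive sub-ranges served by the same replicas are folded into one,
              // so fewer range requests need to be sent out.
              static List<SubRange> merge(List<SubRange> ranges)
              {
                  List<SubRange> merged = new ArrayList<>();
                  for (SubRange r : ranges)
                  {
                      SubRange last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
                      if (last != null && last.right == r.left && last.replicas.equals(r.replicas))
                          merged.set(merged.size() - 1, new SubRange(last.left, r.right, last.replicas));
                      else
                          merged.add(r);
                  }
                  return merged;
              }
          }
          {code}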

          tzach tzach added a comment -

          Thanks Sylvain, I missed that.

          rustyrazorblade Jon Haddad added a comment - - edited

          Is there a spec, written in a single location, that describes the layout of the new sstable structure? Currently it feels like you have to jump between quite a few files to determine how everything is laid out.

          I've seen the guide here: https://github.com/apache/cassandra/blob/trunk/guide_8099.md#storage-format-on-disk-and-on-wire but it doesn't address at a byte level how things are stored. It doesn't have to be as intense as http://nmap.org/book/images/hdr/MJB-TCP-Header-800x564.png but I think to allow other people to make meaningful contributions it would be nice to have a doc describing things.

          pierz Pierre N. added a comment -

          Yes, it would be great to have a full specification now that Cassandra has a new sstable format.

          jbellis Jonathan Ellis added a comment -

          That's on Sylvain's list when he gets back in two weeks.

          iamaleksey Aleksey Yeschenko added a comment -

          The code's been in for a while now, and many of the issues have been resolved as separate tickets. Some issues and optimizations have been filed as separate tickets but are not resolved yet.

          As far as this parent ticket goes, with CASSANDRA-9704 inclusion, +1 from me.

          rajath26 Rajath Subramanyam added a comment - - edited

          Is there any specification online with a detailed description of the new SSTable format, byte by byte? It would be really useful. Right now, it seems like one has to go through multiple files to completely understand the new format.

          slebresne Sylvain Lebresne added a comment -

          Is there any specification online with a detailed description of the new SSTable format byte by byte ?

          There isn't, and as much as I would be the first to love seeing that exist, I doubt I'll find the time to provide it myself any time soon. Of course, it's open source, so anyone with some free time and willingness could use the code to kick-start such documentation on the wiki, which would be very much appreciated.


            People

            • Assignee:
              slebresne Sylvain Lebresne
              Reporter:
              slebresne Sylvain Lebresne
              Reviewer:
              Aleksey Yeschenko
            • Votes:
              3
              Watchers:
              76
