[CASSANDRA-10306] Splitting SSTables in time, deleting and archiving SSTables - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Normal
Resolution: Duplicate
Fix Version/s: None
Component/s: None
Labels:
- dtcs

Description

This issue is a continuation of CASSANDRA-10195 and describes the need to split SSTables by time, as also discussed in CASSANDRA-8361. The data model is explained briefly, followed by the practical issues of running Cassandra with time series data and the need for splitting capabilities.

Data model (snippet from CASSANDRA-9644):
The data is time series data. It is stored so that one row contains a certain time span of data for a given metric (20 days in this case). The row key contains the start time of the time span and the metric name. The column name gives the offset from the beginning of the time span. The column timestamp is set to the sum of the timestamp in the row key and the offset (that is, the actual timestamp of the data point). The data model is analogous to the KairosDB implementation.
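
To make the addressing scheme concrete, here is a minimal sketch of how a data point could be mapped to a row key and column offset. The names `CellAddress`, `address_for` and `write_timestamp_ms` are hypothetical, and aligning the 20-day spans to the Unix epoch is an assumption; the description only states that the row key carries the span start and the metric name.

```python
from dataclasses import dataclass

# 20-day row span, as stated in the description (milliseconds).
SPAN_MS = 20 * 24 * 60 * 60 * 1000


@dataclass(frozen=True)
class CellAddress:
    row_start_ms: int  # start of the time span (part of the row key)
    metric: str        # metric name (part of the row key)
    offset_ms: int     # column name: offset from the start of the span


def address_for(metric: str, timestamp_ms: int) -> CellAddress:
    """Map a data point to its row and column (assumes epoch-aligned spans)."""
    row_start = timestamp_ms - (timestamp_ms % SPAN_MS)
    return CellAddress(row_start, metric, timestamp_ms - row_start)


def write_timestamp_ms(addr: CellAddress) -> int:
    """Column write timestamp: row key start plus column offset."""
    return addr.row_start_ms + addr.offset_ms
```

With this layout the write timestamp of every cell equals the actual time of the data point, which is what lets a time-based compaction strategy place the data by age.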

In the practical application, data is added to the column family in real time. When converting from a legacy system, old data is pre-loaded in chronological order by faking the column timestamps before real-time data collection starts. However, there is an intermittent need to insert older data as well, either because it was not available in real time or because additional time series are fed in afterward due to unforeseeable needs.

Adding old data simultaneously with real-time data leads to SSTables that contain data from a time period exceeding the length of the compaction window (TWCS and DTCS). As a result, these SSTables do not behave predictably in the compaction process.
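
The effect can be illustrated with a simplified model, assuming fixed-size windows aligned to the epoch. This is not Cassandra's actual bucketing code, and `window_start` and `windows_spanned` are hypothetical helper names.

```python
DAY_MS = 24 * 60 * 60 * 1000


def window_start(timestamp_ms: int, window_ms: int) -> int:
    """Start of the fixed-size window that contains the timestamp."""
    return timestamp_ms - (timestamp_ms % window_ms)


def windows_spanned(min_ts_ms: int, max_ts_ms: int, window_ms: int) -> int:
    """Number of windows covered by an SSTable's [min, max] timestamp range."""
    first = window_start(min_ts_ms, window_ms)
    last = window_start(max_ts_ms, window_ms)
    return (last - first) // window_ms + 1


# A flush that mixes real-time data (day 100) with backfilled data (day 40)
# produces one SSTable whose timestamp range covers 61 one-day windows.
mixed = windows_spanned(40 * DAY_MS + 5, 100 * DAY_MS + 7, DAY_MS)
```

An SSTable written only from real-time data would span a single window under the same model.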

Tombstones mask the data from queries, but releasing disk space requires that the SSTables containing the tombstones be compacted together with the SSTables holding the original data. When using TWCS or DTCS and writing tombstones with a timestamp corresponding to real time, the SSTables containing the original data will not end up being compacted with the SSTables holding the tombstones. Even when tombstones are written with faked timestamps, the resulting SSTable should be written separately from the ongoing real-time data. Otherwise the SSTables have to be split (see later).

TTL is a working method for deleting data from a column family and releasing disk space in a predictable manner. However, setting the correct TTL is not trivial. The required TTL might change, e.g. due to legislation or because the customer would like a longer lifetime for the data.

The other factor affecting disk space consumption is variability in the rate at which data is fed to the column family. In certain troubleshooting cases the sample rate can be increased tenfold for a large portion of the collected time series. This leads to rapid consumption of disk space, and old data has to be deleted or archived in such a way that disk space is released quickly and predictably.

Losing one or more nodes from the cluster without having spare hardware also leads to a situation where the data from the lost node has to be replicated again to the remaining nodes. This increases disk space consumption per node and probably requires cleaning older data out of the active column family.

All of the above issues could of course be handled simply by adding more disk space or nodes to the cluster. In a cloud environment that would be a feasible option. For an application running on physical hardware in an isolated environment it is not feasible, for practical reasons or due to cost. Getting new hardware on site might take a long time, e.g. due to customs regulations.

In this application domain (time series data collection) the data is not modified after it is inserted into the column family. There will only be read operations and deletion or archiving of old data based on the TTL or operator actions.

The above reasoning leads to the following conclusions and proposals.

TWCS and DTCS (with certain modifications) lead to well-structured SSTables that are organized by time, giving opportunities to manage the available disk capacity on nodes. Recovering from repairs also works (compacting the flood of small SSTables with larger ones).
Being able to effectively split SSTables along a given timeline would produce SSTable sets on all nodes that allow SSTables to be deleted or archived. What would be the mechanism for deactivating SSTables during deletion or archiving so that nodes do not start streaming “missing” data between each other (repairs)?
Being able to split existing SSTables along multiple timelines determined by TWCS would allow older data to be inserted into the column family and eventually compacted in the desired manner in the correct time window. The original SSTable would be streamed into several SSTables according to the time windows. In the end, empty SSTables would be discarded.
The splitting action would be a tool executed through the nodetool command when needed.
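
The proposed split can be sketched as follows. This is only an illustration of the idea on in-memory `(timestamp, value)` pairs, not an existing Cassandra or nodetool feature; `split_by_window` is a hypothetical name and epoch-aligned fixed-size windows are an assumption.

```python
from collections import defaultdict


def split_by_window(cells, window_ms):
    """Group (timestamp_ms, value) cells by the time window they fall into.

    Returns a dict mapping each window start to its cells, ordered by window.
    Windows that receive no cells never appear in the result, which mirrors
    discarding empty output SSTables.
    """
    buckets = defaultdict(list)
    for timestamp_ms, value in cells:
        start = timestamp_ms - (timestamp_ms % window_ms)
        buckets[start].append((timestamp_ms, value))
    return dict(sorted(buckets.items()))


# One input "SSTable" holding cells from three different 10 ms windows.
parts = split_by_window([(5, "a"), (15, "b"), (7, "c"), (31, "d")], window_ms=10)
```

Each resulting group corresponds to one output SSTable that fits entirely inside a single compaction window, so it could later be compacted, deleted or archived together with the other SSTables of that window.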

Issue Links

duplicates

CASSANDRA-10496 Make DTCS/TWCS split partitions based on time during compaction

Open
