Details
-
Improvement
-
Status: Resolved
-
Normal
-
Resolution: Fixed
-
None
-
Performance
-
Low Hanging Fruit
-
All
-
None
-
Description
During bulk writes, Cassandra Analytics calculates the MD5 checksum of every SSTable it produces. When it uploads an SSTable to Cassandra Sidecar, it includes the checksum in the content-md5 header of the upload request. Cassandra Sidecar uses this value to validate the integrity of the uploaded SSTable, guarding against bit flips and corrupted SSTables.
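For illustration only, here is a minimal sketch of how a content-md5 value could be computed on the client and re-checked on the receiving side. It is not the Cassandra Analytics or Sidecar code: the class name ContentMd5Sketch and the methods contentMd5 and matches are hypothetical, and the Base64 encoding of the digest is an assumption based on the usual Content-MD5 convention (RFC 1864).

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hypothetical helper, not part of Cassandra Analytics or Cassandra Sidecar.
public class ContentMd5Sketch {

    // Streams the input through MD5 and returns the Base64-encoded digest.
    // Assumption: the header carries the Base64 form of the raw 16-byte digest.
    static String contentMd5(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            md5.update(buffer, 0, read);
        }
        return Base64.getEncoder().encodeToString(md5.digest());
    }

    // Receiver-side check: recompute the digest over the received bytes
    // and compare it with the value sent in the header.
    static boolean matches(InputStream received, String expectedHeaderValue)
            throws IOException, NoSuchAlgorithmException {
        return contentMd5(received).equals(expectedHeaderValue);
    }
}
```

In this sketch the uploader would call contentMd5 on the SSTable component stream before sending it, and the receiver would call matches on the bytes it stored. A single flipped bit in transit makes matches return false.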
Recently, Cassandra Sidecar introduced support for additional checksum validations during SSTable upload. Notably, support for the XXHash32 digest was added, which offers more performant checksum calculations. This allows Cassandra Analytics to use a more efficient digest algorithm that is lighter on the CPU usage of both Sidecar and Spark resources.