Details
-
Bug
-
Status: Resolved
-
Low
-
Resolution: Fixed
-
None
-
Low
Description
In Scrubber, the instantiation of SSTableIdentityIterator hardcodes checkData to true. This should be made configurable when running scrub via JMX or StandaloneScrubber.
Since inbound data is not validated, Scrub without this option will throw away data that is not corrupt, but "misrepresented" (e.g. an int is stored but validator = LongType), while Cassandra and application clients will happily continue to read and write data with this misrepresentation (although some care may need to be taken on the application side). Scrub will throw these rows out leading to a large amount of data loss.
In these applications it is desirable for scrub to check for row/file corruption but not validate the column values (which can result in a large percentage of data being thrown away). This would be made possible by adding such a flag to disable validation in the SSTableIdentityIterator