CASSANDRA-9264

Cassandra should not persist files without checksums


Details

    • Type: Improvement
    • Status: Open
    • Priority: Normal
    • Resolution: Unresolved
    • Fix Version/s: 5.x
    • Component/s: Legacy/Core
    • Labels: None

    Description

      Even if checksums aren't validated on every read, it is helpful to persist files with checksums so that, when a corrupted file is encountered, you can at least confirm that the issue is corruption and not an application-level error that generated a corrupt file.

      We should standardize on conventions for how to checksum a file and on which checksums to use, so that we get the best possible performance.

      For a small checksum, I think we should use CRC32 because hardware support for it appears quite good.
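
      As a rough sketch of what that could look like on the write path, the JDK's java.util.zip.CRC32 (which HotSpot can intrinsify to hardware CRC instructions on supporting CPUs) can be driven over a file's bytes. The class name and buffer size below are illustrative only:

      import java.io.IOException;
      import java.io.InputStream;
      import java.nio.file.Files;
      import java.nio.file.Path;
      import java.util.zip.CRC32;

      public final class Crc32Example
      {
          // Computes a CRC32 over the full contents of a file, reading in 64 KiB chunks.
          public static long crc32Of(Path file) throws IOException
          {
              CRC32 crc = new CRC32();
              byte[] buffer = new byte[64 * 1024];
              try (InputStream in = Files.newInputStream(file))
              {
                  int read;
                  while ((read = in.read(buffer)) != -1)
                      crc.update(buffer, 0, read);
              }
              return crc.getValue(); // 32-bit checksum in the low bits of the long
          }
      }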

      For cases where a 4-byte checksum is not enough, I think we can look at either xxhash64 or MurmurHash3.

      The problem with xxhash64 is that the output is only 8 bytes. The problem with MurmurHash3 is that the Java implementation is slow. If we can live with 8 bytes and make it easy to switch hash implementations, I think xxhash64 is a good choice because we already ship a good implementation with LZ4.
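
      Since lz4-java is already on the classpath, its net.jpountz.xxhash package is one place such an implementation could come from. A minimal sketch, assuming a streaming hash fed as bytes are written; the seed of 0 and the method name are illustrative:

      import net.jpountz.xxhash.StreamingXXHash64;
      import net.jpountz.xxhash.XXHashFactory;

      public final class XxHash64Example
      {
          private static final long SEED = 0; // illustrative; a real on-disk format would pin this constant

          // Feeds data to a streaming xxhash64, the way bytes would arrive while writing a file,
          // and returns the 8-byte digest as a long.
          public static long xxhash64(byte[]... chunks)
          {
              XXHashFactory factory = XXHashFactory.fastestInstance();
              StreamingXXHash64 hash = factory.newStreamingHash64(SEED);
              for (byte[] chunk : chunks)
                  hash.update(chunk, 0, chunk.length);
              return hash.getValue();
          }
      }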

      I would also like to see hashes always prefixed by a type so that we can swap hash implementations without the pain of trying to figure out which one is present. I would also like to avoid making assumptions about the number of bytes in a hash field where possible, keeping compatibility and space issues in mind.
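
      To illustrate the type-prefix idea, a hash field could carry an algorithm identifier and a length ahead of the digest bytes, so readers can recognize, or at least skip over, digests from implementations they don't know. The identifiers below are hypothetical, not an existing Cassandra format:

      import java.nio.ByteBuffer;

      public final class HashFieldExample
      {
          // Hypothetical algorithm identifiers.
          public static final byte TYPE_CRC32 = 1;
          public static final byte TYPE_XXHASH64 = 2;

          // Encodes [type][digest length][digest bytes] so the hash implementation can be
          // swapped later without readers having to guess what is on disk.
          public static ByteBuffer encode(byte type, byte[] digest)
          {
              ByteBuffer out = ByteBuffer.allocate(2 + digest.length);
              out.put(type);
              out.put((byte) digest.length);
              out.put(digest);
              out.flip();
              return out;
          }
      }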

      Hashing after compression is also preferable to hashing before compression.
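
      The ordering matters because the checksum should cover the bytes that actually hit disk, so corruption can be detected without decompressing anything. A sketch of that ordering, using lz4-java's compressor and CRC32 purely for illustration:

      import java.util.zip.CRC32;

      import net.jpountz.lz4.LZ4Compressor;
      import net.jpountz.lz4.LZ4Factory;

      public final class ChecksumAfterCompressionExample
      {
          // Compresses a block, then checksums the compressed output, i.e. the bytes
          // that will actually be written to disk.
          public static long compressThenChecksum(byte[] uncompressed)
          {
              LZ4Compressor compressor = LZ4Factory.fastestInstance().fastCompressor();
              byte[] compressed = compressor.compress(uncompressed);

              CRC32 crc = new CRC32();
              crc.update(compressed, 0, compressed.length);
              return crc.getValue();
          }
      }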

            People

              Assignee: Unassigned
              Reporter: Ariel Weisberg (aweisberg)
              Votes: 0
              Watchers: 12
