Uploaded image for project: 'Kafka'
  1. Kafka
  2. KAFKA-10650

Use Murmur3 hashing instead of MD5 in SkimpyOffsetMap

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Patch Available
    • Major
    • Resolution: Unresolved
    • None
    • None
    • core

    Description

      The usage of MD5 has been uncovered during testing Kafka for FIPS (Federal Information Processing Standards) verification.

      While MD5 isn't a FIPS incompatibility here as it isn't used for cryptographic purposes, I spent some time with this as it isn't ideal either. MD5 is a relatively fast crypto hashing algo but there are much better performing algorithms for hash tables as it's used in SkimpyOffsetMap.

      By applying Murmur3 (that is implemented in Streams) I could achieve a 3x faster put operation and the overall segment cleaning sped up by 30% while preserving the same collision rate (both performed within 0.0015 - 0.007, mostly with 0.004 median).

      The usage of Murmur3 was decided as research paper [1] shows Murmur2 is relatively a good choice for hash tables. Based on this Since Murmur3 is available in the project I used that.

      [1]
      https://www.researchgate.net/publication/235663569_Performance_of_the_most_common_non-cryptographic_hash_functions

      Benchmark evidence (the smaller the better as this is average time):

      The benchmark can be reproduced by running ./jmh.sh LogCleanerBenchmark from https://github.com/viktorsomogyi/kafka/tree/KAFKA-3987-hash-algorithm-murmur in the jmh-benchmark folder.

      Attachments

        1. benchmark-evidence.png
          331 kB
          Viktor Somogyi-Vass
        2. benchmark-run-output
          660 kB
          Viktor Somogyi-Vass

        Issue Links

          Activity

            People

              viktorsomogyi Viktor Somogyi-Vass
              viktorsomogyi Viktor Somogyi-Vass
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated: