Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 1.2.0 beta 1
    • Component/s: Core
    • Labels: None

      Description

      MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution. Let's see how much overhead we can save by using Murmur3 instead.
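
      For context, here is what the two hashes look like side by side in Java: MD5 via the JDK's MessageDigest, Murmur3 via a third-party library such as Guava (which the first patches below use). An illustrative sketch only:

      import java.security.MessageDigest;
      import com.google.common.hash.Hashing;

      byte[] key = "some row key".getBytes("UTF-8");

      // MD5: a cryptographic digest, 128-bit output
      byte[] md5 = MessageDigest.getInstance("MD5").digest(key);

      // Murmur3: non-cryptographic, also 128-bit, designed purely for speed and distribution
      byte[] murmur3 = Hashing.murmur3_128().hashBytes(key).asBytes();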

      1. try_murmur3.diff
        6 kB
        Dave Brosius
      2. try_murmur3_2.diff
        5 kB
        Dave Brosius
      3. hashed_partitioner.diff
        18 kB
        Dave Brosius
      4. MumPartitionerTest.docx
        126 kB
        Vijay
      5. hashed_partitioner_3.diff
        18 kB
        Dave Brosius
      6. 0001-CASSANDRA-3772-Test.patch
        3 kB
        Vijay
      7. 0001-CASSANDRA-3772.patch
        19 kB
        Vijay
      8. CASSANDRA-3772-v2.patch
        18 kB
        Pavel Yaskevich
      9. CASSANDRA-3772-v3.patch
        2 kB
        Pavel Yaskevich
      10. CASSANDRA-3772-v4.patch
        26 kB
        Pavel Yaskevich


          Activity

          Drew Kutcharian added a comment -

          It probably wouldn't be such a bad idea to also evaluate CityHash, which is expected to be even faster than Murmur.

          http://code.google.com/p/cityhash/
          http://en.wikipedia.org/wiki/CityHash

          Jonathan Ellis added a comment -

          I don't think there is a Java CityHash implementation yet.

          Drew Kutcharian added a comment -

          True. Not sure a Java implementation would be as fast, since the C++ code uses the CPU's CRC instructions to speed things up; I don't know if that can be done in Java alone. If anything, this should probably be used via JNA.

          Dave Brosius added a comment -

          Add an option
          hash_algorithm: murmur3|md5

          in the yaml to specify the hash function.

          This requires upgrading Guava to version 11.0.1; I changed build.xml but didn't include the jar in this patch.
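
          A sketch of what such a config-driven switch might look like with Guava 11 (hashFunctionFor is a made-up helper name; the actual change is in the attached diff):

          import com.google.common.hash.HashFunction;
          import com.google.common.hash.Hashing;

          // returns the function selected by the yaml's hash_algorithm option
          static HashFunction hashFunctionFor(String hashAlgorithm)
          {
              return "murmur3".equals(hashAlgorithm) ? Hashing.murmur3_128() : Hashing.md5();
          }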

          Jonathan Ellis added a comment -

          Any preliminary results on performance improvements? (Even single node numbers would be interesting.)

          Dave Brosius added a comment - edited

          Doing 1000 inserts of a 5-column CF on a single-node cluster on a really lousy machine seems to show that the Guava Murmur hash is significantly slower than MD5. 5x, perhaps. Then again, perhaps it's just the Guava implementation, as opposed to the Murmur3 algorithm.

          Dave Brosius added a comment -

          Interestingly, timing just the hashing function itself shows very little difference (statistically insignificant).

          Jonathan Ellis added a comment -

          Hmm. You probably need to run at least 10k inserts through it first to make sure the JIT is warm before timing things. Otherwise I would expect the (calling out to C) MD5 code to do better than the (interpreted) Murmur3.
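
          Something along these lines, for example (a hypothetical harness, not part of any attached patch):

          interface HashFn { byte[] hash(byte[] key); }

          static long timeNanos(HashFn fn, byte[] key)
          {
              for (int i = 0; i < 10000; i++)   // warm-up: give the JIT a chance to compile the hot path
                  fn.hash(key);
              long start = System.nanoTime();
              for (int i = 0; i < 100000; i++)  // measured portion only
                  fn.hash(key);
              return System.nanoTime() - start;
          }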

          Dave Brosius added a comment -

          With 10,000 inserts I'm seeing the same ratios, which I'm having a hard time explaining, since again the hash function itself takes about the same time.

          Jonathan Ellis added a comment -

          Yuki, can you take a look?

          Yuki Morishita added a comment -

          Dave,

          The patch needs a rebase, but looking at it, I noticed the following:

          private static byte[] hashMurmur3(ByteBuffer... data)
          {
              HashFunction hashFunction = murmur3HF.get();
              Hasher hasher = hashFunction.newHasher();
              // snip
          }
          

          Isn't that slow if you instantiate it every time? I looked through the Guava source code but saw no way to "reset", so I guess the above is the only thing you can do...

          I also note that CASSANDRA-2975 will implement MurmurHash3, so I think it is better not to introduce an external library. What do you think?
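
          For reference, the Guava 11 pattern being discussed looks like this (a sketch of the API usage, not the patch itself): the HashFunction is stateless and shareable, but each call still allocates a fresh Hasher.

          import com.google.common.hash.HashFunction;
          import com.google.common.hash.Hashing;

          private static final HashFunction MURMUR3 = Hashing.murmur3_128(); // safe to share

          static byte[] hash(byte[] key)
          {
              // a new Hasher is allocated on every call; Guava exposes no reset()
              return MURMUR3.newHasher().putBytes(key).hash().asBytes();
          }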

          Jonathan Ellis added a comment -

          I looked through the Guava source code but saw no way to "reset", so I guess the above is the only thing you can do

          It looks like you're right: http://code.google.com/p/guava-libraries/source/browse/guava/src/com/google/common/hash/MessageDigestHashFunction.java

          So using the standalone MH3 library is probably the way to go.

          Dave Brosius added a comment -

          I'd be glad to rework it, but I didn't find this mysterious standalone MH3 library when I first looked; I only found the Guava version. Can you post a link to where I should get it?

          Vijay added a comment - edited

          If CASSANDRA-2975 gets committed you should be able to use that.

          Edit: you can use MurmurHash.hash3_x64_128 function from 2975.
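
          Assuming the signature from CASSANDRA-2975, usage would look roughly like:

          import java.nio.ByteBuffer;
          import org.apache.cassandra.utils.MurmurHash;

          ByteBuffer key = ByteBuffer.wrap("some row key".getBytes());
          // returns the 128-bit result as two longs (seed 0)
          long[] hash = MurmurHash.hash3_x64_128(key, key.position(), key.remaining(), 0);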

          Jonathan Ellis added a comment -

          (2975 did get committed.)

          Dave Brosius added a comment - edited

          New patch against trunk using MurmurHash.hash3_x64_128.

          Preliminary testing shows the Murmur3 hash to be marginally, though not significantly, faster than MD5 (this is on very pedestrian hardware, though, which might mask differences). Running longer tests now to see if the JIT has had a fair chance to do its magic.

          Vijay added a comment -

          +1, running the test now; will commit it once successful (without the spaces in Config).

          Dave Brosius added a comment - edited

          This patch was only meant for evaluation... I think we would need to extend it to save the setting in the system table before committing. Hmm, then again, it's a chicken-and-egg thing: you can't read the setting if the setting is wrong in the yaml, which is kind of bad. Perhaps system table settings should always be hashed with MD5, and this setting should only apply to user keyspaces.

          Vijay added a comment -

          Well, if we are worried about users misconfiguring this:
          Option 1) We should probably gossip this setting and fail the server at startup.
          Option 2) We should make this a keyspace setting and treat it like a comparator (cannot be changed once the KS is created).

          Jonathan Ellis added a comment -

          Why not just make a separate IPartitioner class so we don't need to add any special cases for the config?

          Vijay added a comment -

          But implementing an IPartitioner will have the same problem as the original option of leaving it as a config option: the user has to make sure it changes on all the servers at once (and shouldn't be changing it in an existing cluster which already has some data in it). I am ok with that too...

          Jonathan Ellis added a comment -

          Right, "upgradeable" isn't a design goal here. Neither is "make partitioner changes without downtime."

          Vijay added a comment -

          Hi Dave, mind writing a partitioner?

          Dave Brosius added a comment -

          Sure... I will push the current RandomPartitioner code into an AbstractRandomPartitioner class and then subclass a new RandomPartitioner that uses MD5 and a second subclass that does Murmur3.

          Jeremy Hanna added a comment -

          It will be nice to see more benchmarks as people try this out. Changing the partitioner, as has been mentioned, is a significant change, so it will be nice to find out whether it is worth switching or whether it's just good for greenfield applications.

          Dave Brosius added a comment -

          Moved the existing RandomPartitioner code to AbstractHashedPartitioner and added an abstract hash method. Created two subclasses, MD5Partitioner and Murmur3Partitioner, that extend and implement this method. Added back in a deprecated RandomPartitioner class that identity-derives from MD5Partitioner for backwards compatibility.

          Patch against trunk.
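
          A structural sketch of that layout (class names from the comment above; bodies are assumptions, see the attached hashed_partitioner.diff for the real thing):

          import java.nio.ByteBuffer;
          import java.security.MessageDigest;

          abstract class AbstractHashedPartitioner // implements IPartitioner in the actual patch
          {
              // the single point of variation; everything else is the old RandomPartitioner code
              protected abstract byte[] hash(ByteBuffer key);
          }

          class MD5Partitioner extends AbstractHashedPartitioner
          {
              protected byte[] hash(ByteBuffer key)
              {
                  try
                  {
                      MessageDigest md5 = MessageDigest.getInstance("MD5");
                      md5.update(key.duplicate());
                      return md5.digest();
                  }
                  catch (Exception e)
                  {
                      throw new RuntimeException(e);
                  }
              }
          }

          @Deprecated // identity-derivation keeps existing configs working
          class RandomPartitioner extends MD5Partitioner {}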

          Vijay added a comment -

          >>> Added back in a deprecated RandomPartitioner class that identity-derives from MD5Partitioner for backwards compatibility.
          Not sure about this:
          Maybe we should call the MD5Partitioner a random partitioner and not deprecate it (because we might not be able to remove this class for a long, long time... similar to OldNetworkTopologyStrategy)?

          BTW: Jeremy, I was not able to see much of a difference using the stress tool.

          Jonathan Ellis added a comment -

          Agreed, I'd rather leave the RandomPartitioner class name alone since people have been using it for 2+ years now; not worth the headache of renaming things at this point, IMO.

          Jonathan Ellis added a comment -

          Seriously bummed that we're not seeing stress wins. Didn't someone post profiler results fingering MD5 as a bottleneck? Maybe I am misremembering.

          Sylvain Lebresne added a comment -

          Did you guys test on secondary index reads (with reasonably high cardinality)? That's where MD5 was a bottleneck, because the column comparator ends up redecorating keys over and over again. The normal read/write path probably does only a handful of MD5 computations, so I wouldn't be surprised if this doesn't make much difference.

          Vijay added a comment -

          >>> Didn't someone post profiler results fingering MD5 as a bottleneck?
          I do have an earlier profile where MD5 was a bottleneck, but after changing the bloom filter I am not sure.

          Hi Sylvain, please find the attachment; it is a 3-node setup and I am doing very basic things (including an index scan). I can let the secondary index test run longer if you want.

          Sylvain Lebresne added a comment -

          I suppose we'd want to tweak the stress params so that each indexed_slice call returns more results (so that each call ends up doing more hash computation). However, if we have to tune too much to get even a slight improvement, I wonder if it's worth bothering with that.

          Btw, did we micro-benchmark our version of Murmur3 against MD5? How much faster is it?

          Dave Brosius added a comment -

          Rework to remove MD5Partitioner and just have Random and Murmur3, as the comments suggested. Reworked Murmur3 to remove as many allocations as possible. Also just flipped the top bit (rather than taking abs) to avoid object allocations. Different from doing abs, but shouldn't affect distribution.

          Still not a significant difference from MD5, though.
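
          The sign-bit idea, sketched (masking in place keeps the BigInteger non-negative without the extra object that .abs() would allocate; the committed version is in the attached patch):

          import java.math.BigInteger;

          byte[] hash = new byte[16];              // would hold the 16-byte Murmur3 output
          hash[0] &= 0x7f;                         // clear the sign bit in place
          BigInteger token = new BigInteger(hash); // guaranteed non-negative, no .abs() allocation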

          Vijay added a comment -

          The micro-benchmark shows much better performance:

          testing size of: 200000
          Test MD5
          MD5 test completed @ 1506
          Test Murmur3
          Murmur3 test completed @ 781

          Hi Dave, while reviewing the patch, it looks like in Murmur3Partitioner.hash

          hashBytes[1] = (byte) (bufferLong >> 48);
          ...

          is kind of redundant to

          case 15: k2 ^= ((long) key.get(offset+14)) << 48;
          ...

          though I don't think it is going to cause any additional latency.

          Jonathan Ellis added a comment -

          Did you guys test on secondary index reads (with reasonably high cardinality)? That's where MD5 was a bottleneck, because the column comparator ends up redecorating keys over and over again.

          Is that something we can address in the index code then, instead of creating a new partitioner?

          Sylvain Lebresne added a comment -

          Is that something we can address in the index code then, instead of creating a new partitioner?

          We could change the index code to write the token along with the key in the index rows, but it's unclear whether that would be a straight win, since it means we'd do more I/O. It would also only work for new indexes, so we would have to keep compatibility with the old code; that's not hassle-free either. IMHO, if we can find cases where MD5 is a bottleneck, transitioning to a new Murmur3 partitioner over time is cleaner.

          Jonathan Ellis added a comment -

          Pavel, can you see if you can reproduce md5 bottlenecks in 2I reads, and see if this partitioner makes a meaningful difference?

          Radim Kolar added a comment -

          Did you measure CPU time used during the MD5 vs. Murmur3 tests, not only wall-clock time?

          Vijay added a comment -

          Made a minor format change, added comments to the yaml, and modified Murmur3Partitioner to add

          public static void writeLong(long src, byte[] dest, int offset)

          I do see a bigger gain with faster disks.
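
          A plausible big-endian body for that writeLong helper (a sketch; see the attached patch for the committed version):

          public static void writeLong(long src, byte[] dest, int offset)
          {
              for (int i = 0; i < 8; i++)
                  dest[offset + i] = (byte) (src >>> (56 - 8 * i)); // most significant byte first
          }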

          Pavel Yaskevich added a comment -

          My tests show that Murmur3Partitioner is actually worse than MD5 with high-cardinality indexes; here is what I did (kernel 3.0.0-19, 2.2 GHz quad-core Opteron, 2 GB RAM):

          For each test:

          • wiped all of the data directories and re-compiled with 'clean'
          • ran stress with -c 50 -C 500 -S 512 -n 50000 (where -c is the number of columns, -C the value cardinality, and -S the value size in bytes) 4 times (to make it hot)

          RandomPartitioner: average op rate is 845.
          Murmur3Partitioner: average op rate is 721.

          Pavel Yaskevich added a comment -

          I have removed the ThreadLocal declaration from M3P (and cleaned up whitespace errors), which was the bottleneck; after re-running the tests with that modification, M3P beats RP 903 to 847.

          Radim Kolar added a comment -

          Is MD5 implemented using a native call in Java?

          Pavel Yaskevich added a comment -

          The Java Cryptography Architecture doesn't disclose that, but from the tests it looks like it is.

          Vijay added a comment -

          +1

          Jonathan Ellis added a comment -

          That sounds worth it to me. Any downsides if we make MP the default for new clusters?

          Pavel Yaskevich added a comment -

          I don't see any, as it has both good collision resistance and distribution.

          Jonathan Ellis added a comment -

          Let's do it.

          Pavel Yaskevich added a comment -

          Committed with M3P made the default.

          Jonathan Ellis added a comment -

          Sorry, didn't look at the code until commit...

          Can you test making it hash to a Long or an 8-byte ByteBuffer? A 16-byte BigInteger is overkill; all we need is a reasonable distribution (now that Tokens don't need to be unique), and 64 or even 32 bits is plenty for that.

          Pavel Yaskevich added a comment -

          Sure

          Pavel Yaskevich added a comment -

          Attached a patch to use the first part of hash3_x64_128 (no copies into a byte array), which shows better results than hash2_64. This approach is ~18 op points better than the previous one.
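
          Sketched, again assuming the CASSANDRA-2975 signature; taking hash[0] skips the byte[] packing and unpacking round trip entirely:

          long[] hash = MurmurHash.hash3_x64_128(key, key.position(), key.remaining(), 0);
          long token = hash[0]; // first 64 bits of the 128-bit result; hash[1] is discarded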

          Vijay added a comment -

          +1 for this too!

          Jonathan Ellis added a comment -

          Why not just use a Long instead of BigInteger?

          Pavel Yaskevich added a comment -

          Wouldn't we have to re-implement all the IPartitioner methods and make M3P truly separate from RandomPartitioner, as we share the BigInteger interface with it?

          Jonathan Ellis added a comment -

          Looks like it's worth prototyping; LongPartitioner should be very straightforward.

          If there's no performance advantage, though, we can leave it out.

          Pavel Yaskevich added a comment -

          What is the point of doing that? Making a BigInteger from a long is a very straightforward thing and doesn't add more overhead than converting to a Long.

          Jonathan Ellis added a comment -

          Have a look at the BigInteger source: it's about 3x as large as a Long, and compareTo is also more complex.

          Pavel Yaskevich added a comment -

          v4 includes the following changes:

          • LongToken is added to o.a.c.dht
          • M3P is changed to use LongToken
          • AbstractHashedPartitioner is removed as no longer needed
          • RP was reverted to its previous code state (before the split into AHP and M3P)
          • NEWS.txt is updated to reflect the change in default partitioner
          • added Murmur3PartitionerTest

          The change to use longs in M3P buys us another ~6-7 op/s compared to BigIntegerToken.
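
          A self-contained sketch of the LongToken idea (the committed class extends Cassandra's Token<T> base in o.a.c.dht; this just shows why a long payload is cheaper than a BigInteger):

          class LongToken implements Comparable<LongToken>
          {
              final long token; // 8 bytes of payload, no internal int[] as in BigInteger

              LongToken(long token)
              {
                  this.token = token;
              }

              public int compareTo(LongToken o)
              {
                  return Long.compare(token, o.token); // a single primitive comparison
              }
          }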

          Jonathan Ellis added a comment -

          We might be able to share a describeOwnership implementation between M3P and RP that deals with Token<Number>, but +1 from me.

          Pavel Yaskevich added a comment -

          Committed.

          Radim Kolar added a comment -

          Are there plans to turn the optimized Murmur3 hash into a standalone library?

          Jonathan Ellis added a comment -

          I'm not sure what you mean.

          Lex Lythius added a comment -

          Sorry to bring a skeleton back with bad news from the day after it was buried.

          Apparently, finding collisions for the Murmur3 hash would be easy enough to attempt to choke Cassandra:
          http://emboss.github.io/blog/2012/12/14/breaking-murmur-hash-flooding-dos-reloaded/

          Just wanted to make sure you are aware of this.

          Brandon Williams added a comment -

          It doesn't really matter if you can create M3 collisions; we only use the hash to find the node that stores the data, and the key sent and stored is still the actual key, not the hash. If you knew a token in the cluster, the worst you could do is cause some (probably imperceptible) imbalance.

          Lex Lythius added a comment -

          @Brandon Yes, I understand that we're talking about map bucket (hence node) collisions here, not row collisions. Still, I wanted to bring this to your attention, if nothing else than to have it dismissed as a non-issue.

          Would feeding masses of data to a particular bucket actually be imperceptible? I figured that could saturate a node, cause row/key cache issues, etc.

          Jonathan Ellis added a comment -

          If you're exposing Cassandra directly to attackers there are easier ways to DOS it.

          Lex Lythius added a comment -

          Right. I was thinking of some web scenarios using tables with non-UUID keys and not much screening of input.
          Anyway...


            People

            • Assignee: Pavel Yaskevich
            • Reporter: Jonathan Ellis
            • Reviewer: Vijay
            • Votes: 2
            • Watchers: 13
