CASSANDRA-4150: Allow larger cache capacities than 2GB

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 1.1.1
    • Component/s: Core
    • Labels: None

      Description

      The problem is that the capacity is an Integer, which can hold a maximum of 2 GB.
      I will post a fix to CLHM; in the meantime we might want to remove the maximumWeightedCapacity code path (at least for the serializing cache) and implement it in our own code.
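
      To make the 2 GB ceiling concrete, here is a minimal, self-contained illustration (not Cassandra code, values are arbitrary): an int-typed capacity cannot represent more than Integer.MAX_VALUE bytes, which is why the capacity needs to become a long.

      public class CapacityLimit
      {
          public static void main(String[] args)
          {
              long maxIntCapacity = Integer.MAX_VALUE;        // 2,147,483,647 bytes, just under 2 GiB
              long desiredCapacity = 8L * 1024 * 1024 * 1024; // e.g. an 8 GiB cache
              System.out.println(desiredCapacity > maxIntCapacity); // true: not expressible as an int
          }
      }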

        Activity

        Vijay added a comment - issue: http://code.google.com/p/concurrentlinkedhashmap/issues/detail?id=33 created.
        Vijay added a comment - edited

        For the record, the comments from the issue in the CLHM library:

        I think a long capacity is fine, but I'm not actively working on a next release to roll this into soon. If this is critical then it could be a patch release. You are of course welcome to fork if neither of those options is okay.

        I helped my former colleagues at Google with Guava's CacheBuilder (formerly MapMaker), which could be considered the successor to this project. There the maximum weight is a long.

        IRC: Guava doesn't support descendingKeySetWithLimit.
        Possibly fork the CLHM code into the Cassandra code base, or drop the hot-keys method and use Guava (that's the only limitation I see for now).

        Vijay added a comment -

        CLHM (Ben) fixed the issue:

        Fixed in v1.3. I plan on releasing this tonight.

        Also introduced EntryWeigher<K, V> to allow key/value weighing. We fixed this oversight in Guava's CacheBuilder from the get-go.

        I believe Cassandra wanted entry weighers too, but it wasn't high priority (no bug filed). Please consider adopting it when you upgrade the library.
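
        For context, a short sketch of the difference the new interface makes, assuming the CLHM 1.3 interfaces described above (a Weigher sees only the value, an EntryWeigher also sees the key); the key and value types here are arbitrary, not Cassandra's:

        import com.googlecode.concurrentlinkedhashmap.EntryWeigher;
        import com.googlecode.concurrentlinkedhashmap.Weigher;

        public class WeigherComparison
        {
            // Old interface: only the value participates in the weight.
            static final Weigher<byte[]> VALUE_ONLY = new Weigher<byte[]>()
            {
                public int weightOf(byte[] value)
                {
                    return value.length;
                }
            };

            // New in CLHM 1.3: the key can be charged as well.
            static final EntryWeigher<String, byte[]> KEY_AND_VALUE = new EntryWeigher<String, byte[]>()
            {
                public int weightOf(String key, byte[] value)
                {
                    return key.length() + value.length;
                }
            };
        }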

        Jonathan Ellis added a comment -

        Using MemoryMeter on a hot path like cache updates scares me a bit – it's pretty slow. Might want to switch to using the dataSize * liveRatio measurement that memtables use.

        Aside #1: So basically current 1.1 CLHM is totally broken since it has a capacity of X MB but weighs each item as one byte?

        Aside #2: I don't see any use of IRCP with useMemoryWeigher=false; looks like we can remove that parameter.

        Jonathan Ellis added a comment -

        Aside #3: would prefer to assert value.size() < Integer.MAX_VALUE in SC.Weigher, instead of having a Math.min call in there. Strongly doubt the rest of the code would support serialized values larger than that anyway.
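
        A minimal sketch of the suggested change, assuming a cache value type exposing a long size(); the SizedValue type below is illustrative, not the actual SerializingCache code:

        final class SizedValue
        {
            private final long size;
            SizedValue(long size) { this.size = size; }
            long size() { return size; }
        }

        final class WeigherSketch
        {
            // Current approach: silently clamp oversized values with Math.min.
            static int weightWithClamp(SizedValue value)
            {
                return (int) Math.min(value.size(), Integer.MAX_VALUE);
            }

            // Suggested approach: fail fast, since values of 2 GB or more are not
            // expected to work anywhere else in the code anyway.
            static int weightWithAssert(SizedValue value)
            {
                assert value.size() < Integer.MAX_VALUE : "serialized value too large: " + value.size();
                return (int) value.size();
            }
        }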

        Jonathan Ellis added a comment -

        So basically current 1.1 CLHM is totally broken since it has a capacity of X MB but weighs each item as one byte?

        (If that's the case, we should target this for 1.1.1.)

        Vijay added a comment - edited

        >>> Using MemoryMeter on a hot path like cache updates scares me a bit – it's pretty slow. Might want to switch to using the dataSize * liveRatio measurement that memtables use.
        Will do.

        >>> So basically current 1.1 CLHM is totally broken since it has a capacity of X MB but weighs each item as one byte?
        No, we assume each key cache entry to have an average size of 48 bytes (CacheService.AVERAGE_KEY_CACHE_ROW_SIZE)...
        basically, cache capacity = keyCacheInMemoryCapacity / AVERAGE_KEY_CACHE_ROW_SIZE,
        hence each entry is weighed as one unit. (Kind of hacky; maybe because we didn't have EntryWeigher, I guess. Now that we have it we can get rid of that logic; a worked sketch follows this comment.)

        >>> I don't see any use of IRCP with useMemoryWeigher=false, looks like we can remove that parameter
        We still need it if we don't have Jamm enabled.

        >>> would prefer to assert value.size() < Integer.MAX_VALUE in SC.Weigher
        Will do. Please let me know if everything else is fine...

        >>> If that's the case, we should target this for 1.1.1
        Either way we have to target 1.1.1, because the serialized cache is broken.
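
        A minimal sketch (illustrative numbers, not Cassandra code) of the pre-EntryWeigher capacity arithmetic described above: the configured byte capacity is converted into an entry count, and every entry is then weighed as 1.

        public class KeyCacheCapacitySketch
        {
            public static void main(String[] args)
            {
                long averageKeyCacheRowSize = 48;                   // CacheService.AVERAGE_KEY_CACHE_ROW_SIZE
                long keyCacheInMemoryCapacity = 100L * 1024 * 1024; // e.g. a 100 MiB key cache

                long capacityInEntries = keyCacheInMemoryCapacity / averageKeyCacheRowSize;
                System.out.println(capacityInEntries); // ~2.18 million entries, each weighing 1
            }
        }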

        Vijay added a comment -

        The problem with the liveRatio is that it is variable, and we might leak capacity if we use the ratio...

        • Let's say when we add the entry the ratio is 1.2, and when the entry is removed the ratio is 1.1; then we will leak some of the cache's capacity accounting (worked numbers in the sketch below). So it will be better if this number doesn't change.

        0001 removes MemoryMeter and makes the changes to fix the issue.
        0002 adds the entry weigher and a small refactor to change the Provider interface as suggested.

        Thanks!
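
        The leak described above in worked numbers (illustrative only, not Cassandra code): if the cache subtracts a freshly computed weight on removal instead of the weight it charged at insertion, part of the capacity stays accounted for forever.

        public class LiveRatioLeakSketch
        {
            public static void main(String[] args)
            {
                long serializedSize = 100;

                long weightAtPut    = Math.round(serializedSize * 1.2); // ratio was 1.2 when the entry went in
                long weightAtRemove = Math.round(serializedSize * 1.1); // ratio is 1.1 when it comes out

                System.out.println(weightAtPut - weightAtRemove); // 10 units of capacity never freed
            }
        }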

        Jonathan Ellis added a comment -
        EntryWeigher<KeyCacheKey, RowIndexEntry> weigher = new EntryWeigher<KeyCacheKey, RowIndexEntry>()
        {
            public int weightOf(KeyCacheKey key, RowIndexEntry value)
            {
                return key.serializedSize() + value.serializedSize();
            }
        };
        ICache<KeyCacheKey, RowIndexEntry> kc = ConcurrentLinkedHashCache.create(keyCacheInMemoryCapacity, weigher);

        Using the serialized size for a non-serialized cache looks fishy to me.

        Vijay added a comment -

        Partially committed: 0001 has been committed to 1.1.1 and trunk, which removes the 2 GB limitation this ticket was originally meant to fix.
        For 0002 I am working on calculating the object overhead + size without using reflection. Thanks!
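
        One possible shape of such a non-reflective ("native") size calculation, charging a fixed per-object overhead plus known field sizes; the constants are illustrative placeholders (real values depend on the JVM and its settings), and this is not the code that was eventually committed:

        public class NativeSizeSketch
        {
            private static final long OBJECT_OVERHEAD = 16; // assumed object header size
            private static final long ARRAY_OVERHEAD  = 20; // assumed array header + length field
            private static final long REFERENCE_SIZE  = 8;  // assumed size of an object reference

            // Estimate the footprint of a wrapper object holding a reference to a byte[] payload.
            static long estimateSize(byte[] payload)
            {
                long wrapper = OBJECT_OVERHEAD + REFERENCE_SIZE;
                long array   = ARRAY_OVERHEAD + payload.length;
                return align(wrapper) + align(array);
            }

            // Objects are typically padded to an 8-byte boundary.
            private static long align(long size)
            {
                return (size + 7) & ~7L;
            }
        }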

        Vijay added a comment -

        Before I go ahead with the complicated implementation (touching most of the cache values), I did a smoke test and thought I would share the results.

        lgmac-vparthsarathy:cass vparthasarathy$ java -javaagent:/Users/vparthasarathy/Documents/workspace/cassandraT11/lib/jamm-0.2.5.jar -jar ~/Desktop/TestJamm.jar 100000000
        Using reflection took: 25954
        Using NativeCalculation took: 178
        Using MemoryMeter took: 992
        lgmac-vparthsarathy:cass vparthasarathy$

        I used https://github.com/twitter/commons/blob/master/src/java/com/twitter/common/objectsize/ObjectSizeCalculator.java for reflection test.
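
        For reference, the rough shape of such a smoke test (the actual TestJamm source is not attached here, so the class below is an assumption): time N deep-size measurements per strategy; the reflection-based variant via the Twitter ObjectSizeCalculator linked above and the native calculation would be timed the same way.

        import org.github.jamm.MemoryMeter;

        public class SizingBenchmarkSketch
        {
            public static void main(String[] args)
            {
                int iterations = Integer.parseInt(args[0]);
                Object sample = new long[]{ 1L, 2L, 3L };

                MemoryMeter meter = new MemoryMeter(); // jamm; needs -javaagent:jamm.jar to instrument
                long start = System.currentTimeMillis();
                for (int i = 0; i < iterations; i++)
                    meter.measureDeep(sample);
                System.out.println("Using MemoryMeter took: " + (System.currentTimeMillis() - start));
            }
        }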

        Jonathan Ellis added a comment -

        Let's say when we add the entry the ratio is 1.2, and when the entry is removed the ratio is 1.1; then we will leak some of the cache's capacity accounting. So it will be better if this number doesn't change

        I did some code diving into CLHM – it looks like it annotates the cache entry with the weight-at-put-time, and uses that in remove or replace. If so, I think MemoryMeter might be fine after all.

        OTOH if the cache isn't churning maybe MemoryMeter is just fine the way it is.

        I'm inclined to close this and just open a new ticket for updating to CLHM 1.3 in 1.2 and including keys in our weights. Thoughts?

        Vijay added a comment -

        just open a new ticket for updating to CLHM 1.3 in 1.2 and including keys in our weights. Thoughts?

        Done! Opened CASSANDRA-4315... Thanks!

        Chris Burroughs added a comment -

        I'm having trouble following what happened between this ticket and CASSANDRA-4315 ("just open a new ticket for updating to CLHM 1.3 in 1.2..."). Isn't 02672936f635c93e84ed6625bb994e1628da5a9b in the 1.1 branch, and didn't we already upgrade to CLHM 1.3?

        https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=commit;h=02672936f635c93e84ed6625bb994e1628da5a9b

        Vijay added a comment -

        Yes, we did upgrade to CLHM 1.3 in both Cassandra 1.1 and Cassandra 1.2 to fix the original issue described in this ticket.

        Ben (the author of CLHM) also added EntryWeigher<K, V> to CLHM 1.3 at our request, which is the 0002 part of the patch originally submitted.
        To use EntryWeigher we needed to calculate the memory sizes of the key and the value (we were trying out multiple options: MemoryMeter or other mechanisms); we were working on that in this ticket.

        Hence CASSANDRA-4315 was opened to continue the work of moving from CLHM's Weigher<V> to EntryWeigher<K, V>.

        Shawn Smith added a comment -

        It looks like the Maven dependencies in build.xml weren't updated to reference CLHM 1.3:

        ...
        <dependency groupId="com.googlecode.concurrentlinkedhashmap" artifactId="concurrentlinkedhashmap-lru" version="1.2"/>
        ...
        

        As a result, the Cassandra Maven Plugin blows up if you try to make it use Cassandra 1.1.1.

        ...
        [ERROR] 10:59:07,698 Exception encountered during startup
        [INFO] java.lang.NoSuchMethodError: com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Builder.maximumWeightedCapacity(J)Lcom/googlecode/concurrentlinkedhashmap/ConcurrentLinkedHashMap$Builder;
        [INFO]     at org.apache.cassandra.cache.ConcurrentLinkedHashCache.create(ConcurrentLinkedHashCache.java:70)
        [INFO]     at org.apache.cassandra.cache.ConcurrentLinkedHashCache.create(ConcurrentLinkedHashCache.java:70)
        ...
        
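        Presumably the fix is just bumping the declared version in that build.xml entry to match the CLHM 1.3 jar that Cassandra 1.1.1 actually bundles, i.e.:

        <dependency groupId="com.googlecode.concurrentlinkedhashmap" artifactId="concurrentlinkedhashmap-lru" version="1.3"/>
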
        Vijay added a comment -

        Thanks! committed to 1.1 and trunk.

        David Tootill added a comment -

        The Maven plugin issue still exists with Cassandra 1.1.5

        Jonathan Ellis added a comment -

        Feel free to attach a patch.


          People

          • Assignee: Vijay
          • Reporter: Vijay
          • Reviewer: Jonathan Ellis
          • Votes: 0
          • Watchers: 6
