CASSANDRA-3997

Make SerializingCache Memory Pluggable

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Fix Version/s: 2.0 beta 1
    • Component/s: Core

      Description

      SerializingCache currently uses native malloc and free directly; by making the memory allocator pluggable, users will have a choice of glibc malloc, TCMalloc, or JEMalloc as needed.
      Initial tests show less fragmentation with JEMalloc, but the one issue is that both TCMalloc and JEMalloc appear to be effectively single-threaded (at least, they crash in my tests otherwise).
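
      For illustration, the pluggable seam described above might look like the following minimal Java sketch. The names (IAllocator, HeapAllocator, AllocatorDemo) are assumptions, and an on-heap stub stands in for the JNA-backed native malloc so the example is self-contained and runnable:

```java
// Hypothetical sketch of a pluggable allocator seam. The real patch would
// bind native malloc/free via JNA; here an on-heap stand-in is used so the
// wiring can be demonstrated without a native library.
import java.util.HashMap;
import java.util.Map;

interface IAllocator {
    long allocate(long size);
    void free(long peer);
}

// Stand-in implementation: hands out fake "addresses" backed by byte arrays.
class HeapAllocator implements IAllocator {
    private final Map<Long, byte[]> regions = new HashMap<>();
    private long nextPeer = 1;

    public synchronized long allocate(long size) {
        long peer = nextPeer++;
        regions.put(peer, new byte[(int) size]);
        return peer;
    }

    public synchronized void free(long peer) {
        if (regions.remove(peer) == null)
            throw new IllegalStateException("double free of " + peer);
    }
}

public class AllocatorDemo {
    // The allocator class could be chosen from configuration and
    // instantiated reflectively, so no recompilation is needed to swap it.
    static IAllocator load(String className) throws Exception {
        return (IAllocator) Class.forName(className).getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        IAllocator alloc = load("HeapAllocator");
        long peer = alloc.allocate(4096);
        alloc.free(peer);
        System.out.println("allocated and freed peer " + peer);
    }
}
```

      Loading the implementation by class name is what makes the choice of gcc malloc, TCMalloc, or JEMalloc a configuration decision rather than a code change.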

      1. jna.zip
        6 kB
        Vijay
      2. 0001-CASSANDRA-3997.patch
        23 kB
        Vijay
      3. 0001-CASSANDRA-3997-v2.patch
        23 kB
        Vijay
      4. 0001-CASSANDRA-3997-v3.patch
        24 kB
        Vijay
      5. 0001-CASSANDRA-3997-v4.patch
        13 kB
        Vijay

        Activity

        Vijay added a comment (edited)

        Attached are the test classes used for the test.

        Results on CentOS:

        [vijay_tcasstest@vijay_tcass-i-a91ee8cd ~]$ /etc/alternatives/jre_1.7.0/bin/java -Djava.library.path=/usr/local/lib/ -cp jna.jar:/apps/nfcassandra_server/lib/*:. com.sun.jna.MallocAllocator 50000 2000000
                     total       used       free     shared    buffers     cached
        Mem:      71688220   26049380   45638840          0     169116     996172
        -/+ buffers/cache:   24884092   46804128
        Swap:            0          0          0
        **** Starting Test! ****
        Total bytes read: 101422934016
        Time taken: 25407
                     total       used       free     shared    buffers     cached
        Mem:      71688220   31981924   39706296          0     169116     996312
        -/+ buffers/cache:   30816496   40871724
        Swap:            0          0          0
        **** ending Test!**** 
        [vijay_tcasstest@vijay_tcass-i-a91ee8cd ~]$ export LD_LIBRARY_PATH=/usr/local/lib/
        [vijay_tcasstest@vijay_tcass-i-a91ee8cd ~]$ /etc/alternatives/jre_1.7.0/bin/java -Djava.library.path=/usr/local/lib/ -cp jna.jar:/apps/nfcassandra_server/lib/*:. com.sun.jna.TCMallocAllocator 50000 2000000
                     total       used       free     shared    buffers     cached
        Mem:      71688220   26054620   45633600          0     169128     996228
        -/+ buffers/cache:   24889264   46798956
        Swap:            0          0          0
        **** Starting Test! ****
        Total bytes read: 101304894464
        Time taken: 46387
                     total       used       free     shared    buffers     cached
        Mem:      71688220   28535136   43153084          0     169128     996436
        -/+ buffers/cache:   27369572   44318648
        Swap:            0          0          0
        **** ending Test!**** 
        [vijay_tcasstest@vijay_tcass-i-a91ee8cd ~]$ export LD_LIBRARY_PATH=~/jemalloc-2.2.5/lib/ 
        [vijay_tcasstest@vijay_tcass-i-a91ee8cd ~]$ /etc/alternatives/jre_1.7.0/bin/java -Djava.library.path=~/jemalloc-2.2.5/lib/ -cp jna.jar:/apps/nfcassandra_server/lib/*:. com.sun.jna.JEMallocAllocator 50000 2000000
                     total       used       free     shared    buffers     cached
        Mem:      71688220   26060604   45627616          0     169128     996300
        -/+ buffers/cache:   24895176   46793044
        Swap:            0          0          0
        **** Starting Test! ****
        Total bytes read: 101321734144
        Time taken: 29937
                     total       used       free     shared    buffers     cached
        Mem:      71688220   28472436   43215784          0     169128     996440
        -/+ buffers/cache:   27306868   44381352
        Swap:            0          0          0
        **** ending Test!**** 
        [vijay_tcasstest@vijay_tcass-i-a91ee8cd ~]$ 
        
        

        The test shows around 4 GB of savings. Each run read roughly 101 GB (e.g. 101321734144 bytes). The test uses a ConcurrentLinkedHashMap (CLHM) to hold on to the objects and releases them when the capacity (5K entries) is reached.

        Jonathan Ellis added a comment -

        I don't understand the goal here. What is the value in allowing the use of malloc implementations that cause segfaults?

        Jonathan Ellis added a comment -

        Does this preserve the ability to use Unsafe instead of a JNA-backed malloc?

        Vijay added a comment -

        Hi Jonathan, the good thing is that it saves us from the memory fragmentation we have seen with the native malloc after it runs for a prolonged period of time. If the user wants to use a different implementation, they can. JEMalloc hasn't segfaulted in any of my tests, so I think it is the better one to use, but we still need to do more tests for sure.

        >>> Does this preserve the ability to use Unsafe instead of a JNA-backed malloc?
        No, the ticket just makes it pluggable so any other implementation is possible. We don't need to build the *.so/.dll files for every environment we come across, and Unsafe can be the default.

        Jonathan Ellis added a comment -

        JEMalloc hasn't segfaulted in any of my tests

        What did you mean then by "both TCMalloc and JEMalloc are kind of single threaded (at-least they crash in my test otherwise)?"

        Vijay added a comment (edited)

        Ohhh, sorry for the confusion.
        JEMalloc's case: malloc/free should be done by only one thread at a time. The test had 100 threads doing malloc/free, but only one would actually malloc/free at a time, and the "Time taken" shows the raw speed.
        TCMalloc's case: only one thread should be doing malloc and free. (Even after this it was crashing randomly because of illegal memory access, hence I said JEMalloc hasn't crashed.)

        The test code does exactly the above. The implementation should deal with this and avoid contending for malloc and free with multiple threads. Once we deal with that, it works well.

        Jonathan Ellis added a comment -

        malloc/free should be done by only one thread at a time

        So should we just synchronize the JEmalloc allocator methods instead of leaving it to the caller?

        I'm fine with turning this into a real patch adding a cache allocator option, although from your test results it doesn't look like it's worth shipping TCMalloc – more fragmentation than JEMalloc and substantially slower.
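
        The synchronization suggested above could be a thin delegating wrapper, sketched below. The IAllocator interface and class names are assumptions; a toy counter stands in for the JNA-backed native allocator so the multi-threaded behavior can be demonstrated:

```java
// Hypothetical sketch: wrap an allocator whose native malloc/free must not
// be called concurrently, so callers need not coordinate themselves.
interface IAllocator {
    long allocate(long size);
    void free(long peer);
}

class SynchronizedAllocator implements IAllocator {
    private final IAllocator delegate;

    SynchronizedAllocator(IAllocator delegate) { this.delegate = delegate; }

    // One monitor serializes all allocate/free calls across threads.
    public synchronized long allocate(long size) { return delegate.allocate(size); }
    public synchronized void free(long peer)     { delegate.free(peer); }
}

public class SynchronizedAllocatorDemo {
    public static void main(String[] args) throws InterruptedException {
        // Toy delegate: just counts outstanding allocations.
        final long[] outstanding = {0};
        IAllocator raw = new IAllocator() {
            public long allocate(long size) { return ++outstanding[0]; }
            public void free(long peer)     { --outstanding[0]; }
        };
        IAllocator safe = new SynchronizedAllocator(raw);

        // 100 threads hammering allocate/free, mirroring the test above.
        Thread[] threads = new Thread[100];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++)
                    safe.free(safe.allocate(64));
            });
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        System.out.println("outstanding = " + outstanding[0]);
    }
}
```

        The obvious trade-off is that the single lock makes allocation a serial bottleneck, which is exactly the contention concern discussed in this thread.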

        Vijay added a comment -

        Agree and will do.

        Vijay added a comment -

        The attached patch makes the off-heap allocation pluggable and includes a JEMallocAllocator.

        To test JEMalloc, please set:

        #export LD_LIBRARY_PATH=/xxx/jemalloc-2.2.5/lib/
        JVM Property: -Djava.library.path=/xxx/jemalloc-2.2.5/lib/libjemalloc.so

        Jonathan Ellis added a comment -

        To update from IRC:

        Vijay said he tried jemalloc via LD_PRELOAD and hit corruption problems related to multi-threaded use. Which is odd, because http://www.canonware.com/jemalloc/, http://www.facebook.com/notes/facebook-engineering/scalable-memory-allocation-using-jemalloc/480222803919, and others claim jemalloc is designed for multithreaded use cases.

        (If there ARE thread safety problems w/ jemalloc, shouldn't the patch synchronize somewhere?)

        I did find http://comments.gmane.org/gmane.comp.db.redis.general/7736 which corroborates "weird problems w/ jemalloc and the jvm," although throwing redis into the mix definitely doesn't simplify things.

        I think this is worth following up on for two reasons:

        1. I'd much rather let people customize malloc via preload, than having to write Java + JNA stubs for each
        2. If there are corruption problems with jemalloc, it could affect our cache too

        Can you follow up on http://www.canonware.com/mailman/listinfo/jemalloc-discuss and see if they can clear anything up?

        Vijay added a comment (edited)

        Have an update:

        Jason Evans says: "LD_PRELOAD'ing jemalloc should be okay as long as the JVM doesn't statically link a different malloc implementation. I expect that if it isn't safe, you'll experience crashes quite early on, so give it a try and see what happens."

        I have also confirmed that Unsafe isn't statically linked to the native malloc by adding a printf in the malloc C code that counts the number of times it is called. It looks like PRELOAD is the better option. I am running a long-running test and will close this ticket once it is successful. Thanks!

        Pavel Yaskevich added a comment -

        Vijay, can you please also test Hoard Memory Allocator (http://www.hoard.org/) as a comparison to jemalloc?

        Vijay added a comment -

        >>> Vijay, can you please also test Hoard Memory Allocator (http://www.hoard.org/) as a comparison to jemalloc?
        Sure, but it looks like LD_PRELOAD causes crashes for various reasons and doesn't sound like a good solution. I will ask about this on the mailing list.

        [vijay_tcasstest@vijay_tcass-i-a91ee8cd crash]$ grep -A2 Problematic *
        hs_err_pid1309.log:# Problematic frame:
        hs_err_pid1309.log-# C [libjemalloc.so+0xb79a]
        hs_err_pid1309.log-#

        hs_err_pid1622.log:# Problematic frame:
        hs_err_pid1622.log-# C [libjemalloc.so+0x57f4] free+0x54
        hs_err_pid1622.log-#

        hs_err_pid16902.log:# Problematic frame:
        hs_err_pid16902.log-# C [libjemalloc.so+0xb79a]
        hs_err_pid16902.log-[error occurred during error reporting (printing problematic frame), id 0xb]

        hs_err_pid16902.log:# Problematic frame:
        hs_err_pid16902.log-# C [libjemalloc.so+0xb79a]
        hs_err_pid16902.log-[error occurred during error reporting (printing problematic frame), id 0xb]

        hs_err_pid29892.log:# Problematic frame:
        hs_err_pid29892.log-# C [libjemalloc.so+0xb79a]
        hs_err_pid29892.log-#

        hs_err_pid30273.log:# Problematic frame:
        hs_err_pid30273.log-# C [libjemalloc.so+0xb79a]
        hs_err_pid30273.log-#

        hs_err_pid30645.log:# Problematic frame:
        hs_err_pid30645.log-# C [libjemalloc.so+0xb79a]
        hs_err_pid30645.log-#

        hs_err_pid4037.log:# Problematic frame:
        hs_err_pid4037.log-# C [libjemalloc.so+0xb79a]
        hs_err_pid4037.log-[error occurred during error reporting (printing problematic frame), id 0xb]

        hs_err_pid7733.log:# Problematic frame:
        hs_err_pid7733.log-# C [libc.so.6+0x618a2]
        hs_err_pid7733.log-[error occurred during error reporting (printing problematic frame), id 0xb]

        hs_err_pid7733.log:# Problematic frame:
        hs_err_pid7733.log-# C [libc.so.6+0x618a2]
        hs_err_pid7733.log-[error occurred during error reporting (printing problematic frame), id 0xb]

        Vijay added a comment -

        Hi Pavel,
        Looks like the Hoard allocator can work seamlessly with LD_PRELOAD, but JEMalloc doesn't work well with LD_PRELOAD.

        Jason says: The crash in free() is the only one that tells me anything at all, and my only guesses are 1) mixed allocator usage or 2) application error, e.g. double free(). I really don't know anything about how the JVM is structured internally, how it interacts with malloc, how it uses/abuses dlopen(), etc., so I'm not going to be of much help without a lot more background information.

        On the other hand, the attached patch avoids the crashes.

        Comparison of the mallocs:

        [vijay_tcasstest@vijay_tcass-i-08e1f16c java]$ java -cp /apps/nfcassandra_server/lib/concurrentlinkedhashmap-lru-1.2.jar:/apps/nfcassandra_server/lib/jna-3.3.0.jar:/apps/nfcassandra_server/lib/apache-cassandra-1.1.0-beta2-SNAPSHOT.jar:. com.sun.jna.MallocAllocator 50000 2000000
                     total       used       free     shared    buffers     cached
        Mem:      71688220   10569792   61118428          0     146360    1864972
        -/+ buffers/cache:    8558460   63129760
        Swap:            0          0          0
        **** Starting Test! ****
        Total bytes read: 101423216640
        Time taken: 28587
                     total       used       free     shared    buffers     cached
        Mem:      71688220   15950408   55737812          0     146360    1865184
        -/+ buffers/cache:   13938864   57749356
        Swap:            0          0          0
        **** ending Test!**** 
        
        [vijay_tcasstest@vijay_tcass-i-08e1f16c java]$ export LD_LIBRARY_PATH=/home/vijay_tcasstest/howard/
        [vijay_tcasstest@vijay_tcass-i-08e1f16c java]$ java -Djava.library.path=/home/vijay_tcasstest/howard/ -cp /apps/nfcassandra_server/lib/concurrentlinkedhashmap-lru-1.2.jar:/apps/nfcassandra_server/lib/jna-3.3.0.jar:/apps/nfcassandra_server/lib/apache-cassandra-1.1.0-beta2-SNAPSHOT.jar:. com.sun.jna.HowardMallocAllocator 50000 2000000
                     total       used       free     shared    buffers     cached
        Mem:      71688220   10573476   61114744          0     146320    1864972
        -/+ buffers/cache:    8562184   63126036
        Swap:            0          0          0
        **** Starting Test! ****
        Total bytes read: 101366196224
        Time taken: 33959
                     total       used       free     shared    buffers     cached
        Mem:      71688220   16292664   55395556          0     146320    1865184
        -/+ buffers/cache:   14281160   57407060
        Swap:            0          0          0
        **** ending Test!**** 
        
        [vijay_tcasstest@vijay_tcass-i-08e1f16c java]$ export LD_LIBRARY_PATH=/home/vijay_tcasstest/jemalloc/lib/
        [vijay_tcasstest@vijay_tcass-i-08e1f16c java]$ java -Djava.library.path=/home/vijay_tcasstest/jemalloc/lib/ -cp /apps/nfcassandra_server/lib/concurrentlinkedhashmap-lru-1.2.jar:/apps/nfcassandra_server/lib/jna-3.3.0.jar:/apps/nfcassandra_server/lib/apache-cassandra-1.1.0-beta2-SNAPSHOT.jar:. com.sun.jna.JEMallocAllocator 50000 2000000
                     total       used       free     shared    buffers     cached
        Mem:      71688220   10572896   61115324          0     146332    1864972
        -/+ buffers/cache:    8561592   63126628
        Swap:            0          0          0
        **** Starting Test! ****
        Total bytes read: 101360272384
        Time taken: 29310
                     total       used       free     shared    buffers     cached
        Mem:      71688220   13243604   58444616          0     146340    1865184
        -/+ buffers/cache:   11232080   60456140
        Swap:            0          0          0
        **** ending Test!**** 
        
        Pavel Yaskevich added a comment (edited)

        The Hoard allocator uses even more memory (~300 MB more) than the standard allocator, but jemalloc buys us ~2.5 GB, which is pretty good. The last thing here would be to investigate what causes the free() segfaults with jemalloc, so that different memory allocators could be used without any structural changes to the code...

        It would be helpful if you could describe the situation in which that segfault happens.

        Vijay added a comment (edited)

        Segfaults happen in multiple places (opening a file, accessing malloc, while calling free, and in a lot of unrelated cases)...
        Unless we open the JDK source code and figure out how it is structured, it is hard to say exactly when it can fail (let me know if you want to take a look at the hs_err*.log files).

        On the bright side, at least we can isolate this by calling via JNI, and we don't see the issue when loading JEMalloc via LD_LIBRARY_PATH. In v2 I removed the synchronization; I have also attached it here (please note the yaml setting is not included, just to hide it for now). Thanks!
        Note: the jemalloc 2.2.5 release works fine, and so does the git/dev branch.

        paul cannon added a comment -

        This patch doesn't apply cleanly to trunk (anymore?). Rebase?

        Vijay added a comment -

        Hi Paul, sorry, I somehow missed the update on the ticket... Rebased in v3.

        FYI: To enable JEMalloc we need to update
        cassandra-env.sh with
        export LD_LIBRARY_PATH=~/jemalloc/lib/
        JVM_OPTS="-Djava.library.path=~/jemalloc/lib/"

        cassandra.yaml with
        memory_allocator: org.apache.cassandra.io.util.JEMallocAllocator

        Jonathan Ellis added a comment -

        So to summarize:

        • We don't need JNI
        • LD_PRELOAD makes things segfault but LD_LIBRARY_PATH works fine

        Right?

        It looks to me like we can make Allocator just return a long. Then the Memory hierarchy doesn't need to change at all.

        Messier but easy: just add a static Allocator to Memory.

        More refactoring but cleaner: move allocation outside of Memory and replace it with a constructor (long reference, long bytes); add allocate(long bytes) and allocateRefCounted(long bytes) factory methods to Allocator. RefCountedMemory would need to wrap Memory instead of subclassing it.
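
        The refactor described above could be sketched roughly as follows. All names here (RawAllocator, Allocator, the Memory/RefCountedMemory shapes) are illustrative assumptions, not the committed code:

```java
// Hypothetical sketch: allocation moves out of Memory into an Allocator
// factory that works in plain long addresses; RefCountedMemory wraps Memory
// instead of subclassing it, and frees through the same allocator.
interface RawAllocator {
    long allocate(long bytes);
    void free(long peer);
}

class Memory {
    final long peer;   // address returned by the allocator
    final long size;
    Memory(long peer, long size) { this.peer = peer; this.size = size; }
}

class RefCountedMemory {
    private final Memory memory;
    private final RawAllocator allocator; // free() must go to the same allocator
    private int refs = 1;

    RefCountedMemory(Memory memory, RawAllocator allocator) {
        this.memory = memory;
        this.allocator = allocator;
    }

    synchronized void reference()   { refs++; }
    synchronized void unreference() {
        if (--refs == 0)
            allocator.free(memory.peer);
    }
}

class Allocator { // factory methods as suggested above
    private final RawAllocator raw;
    Allocator(RawAllocator raw) { this.raw = raw; }

    Memory allocate(long bytes) { return new Memory(raw.allocate(bytes), bytes); }
    RefCountedMemory allocateRefCounted(long bytes) {
        return new RefCountedMemory(new Memory(raw.allocate(bytes), bytes), raw);
    }
}

public class RefactorDemo {
    public static void main(String[] args) {
        // Fake backend that tracks live regions so freeing can be observed.
        final java.util.Set<Long> live = new java.util.HashSet<>();
        RawAllocator raw = new RawAllocator() {
            long next = 1;
            public long allocate(long b) { live.add(next); return next++; }
            public void free(long p)     { live.remove(p); }
        };
        Allocator alloc = new Allocator(raw);
        RefCountedMemory m = alloc.allocateRefCounted(128);
        m.reference();
        m.unreference();  // refs 2 -> 1, not freed yet
        m.unreference();  // refs 1 -> 0, freed through the same allocator
        System.out.println("live regions = " + live.size());
    }
}
```

        Note how RefCountedMemory keeps a reference to the allocator that produced its address, which is the "free should be called by the same allocator" constraint raised later in this thread.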

        I also suggest adding a commented-out example to cassandra.yaml and cassandra-env.sh to illustrate how to enable this for those brave enough to try it. (This will go into 1.3 so plenty of time to test.)

        Vijay added a comment (edited)

        Hi Jonathan,

        We don't need JNI
        LD_PRELOAD makes things segfault but LD_LIBRARY_PATH works fine

        Right?

        We don't need any additional JNI, but we do use JNA to load the library.

        More refactoring but cleaner?

        The problem is that free should be called by the same allocator that did the malloc, hence the attached patch doesn't have that refactor.

        Rest done, thanks!

        Jonathan Ellis added a comment -

        Good point on free.

        Nits:

        • cassandra.yaml has comments but no actual memory_allocator option
        • should rename to IAllocator to follow convention
        • INSTANCE should not be capitalized

        Rest LGTM, ship it!

        Vijay added a comment -

        Committed with the nits fixed, thanks!

        Vladimir Rodionov added a comment -

        A little bit late, but better late than never...

        "JEMalloc but the only issue with it is that (both TCMalloc and JEMalloc) are kind of single threaded (at-least they crash in my test otherwise)."

        JEMalloc must be configured with:

        --disable-lazy-lock
            Disable code that wraps pthread_create() to detect when an application
            switches from single-threaded to multi-threaded mode, so that it can avoid
            mutex locking/unlocking operations while in single-threaded mode.  In
            practice, this feature usually has little impact on performance unless
            thread-specific caching is disabled.
        

        jemalloc, by default, wraps the pthread API and tries to detect when an application switches from single-threaded to multi-threaded mode. This trick does not work inside the JVM, of course.

        Jonathan Ellis added a comment -

        Thanks!


          People

          • Assignee:
            Vijay
            Reporter:
            Vijay
            Reviewer:
            Jonathan Ellis
