HBase / HBASE-5516

GZip leading to memory leak in 0.90. Fix similar to HBASE-5387 needed for 0.90.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.90.5
    • Fix Version/s: 0.90.7
    • Component/s: None
    • Labels: None

      Description

      Usage of GZip is leading to a resident memory leak in 0.90.
      We need something similar to HBASE-5387 in 0.90.

      1. HBASE-5516_3_0.90.patch (12 kB, ramkrishna.s.vasudevan)
      2. HBASE-5516_2_0.90.patch (12 kB, Laxman)

        Activity

        karthikp added a comment -

        ramkrishna.s.vasudevan Ted Yu
        I hit a similar memory leak in the regionserver process: VM and RSS memory keep growing continuously in ~64 MB steps.
        I'm using CDH4 HBase 0.94.6, where the fix above is already available, and MALLOC_ARENA_MAX=4 is set in hbase-env.sh.
        We have a heavy write load with frequent minor compactions, and we use GZip HFile compression.
        The maximum regionserver Java heap size is 32 GB.

        top - 14:28:30 up 201 days, 21:06,  3 users,  load average: 5.67, 3.72, 3.31
        Tasks: 803 total,   1 running, 802 sleeping,   0 stopped,   0 zombie
        Cpu(s):  8.3%us,  2.1%sy,  0.0%ni, 85.9%id,  3.5%wa,  0.0%hi,  0.1%si,  0.0%st
        Mem:  65932340k total, 63961912k used,  1970428k free,  2394528k buffers
        Swap: 29659132k total,    63532k used, 29595600k free,  1095268k cached
        
         PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
         57335 hbase     20   0 46.4g  44g 9296 S 98.2 70.9  13319:10 java
        
         [hbase@pb1hsl5-hslave ~]$  pmap -x 57335 | sort -k 3 -nr | more
        total kB        48695984 46765792 46756512
        00007ff312460000 33171072 33169464 33169464 rwx--    [ anon ]
        000000004010a000 1448552 1448552 1448552 rwx--    [ anon ]
        00007ff2d1810000  612120  603124  603124 rwx--    [ anon ]
        00007ff2fadff000  383364  383364  383364 rwx--    [ anon ]
        00007ff0e8000000  131072  131072  131072 rwx--    [ anon ]
        00007ff218000000  131048  131048  131048 rwx--    [ anon ]
        00007ff128000000  131068  131048  131048 rwx--    [ anon ]
        00007ff230000000  131044  131044  131044 rwx--    [ anon ]
        00007ff000000000  131036  131036  131036 rwx--    [ anon ]
        00007fefe0000000  131060  131036  131036 rwx--    [ anon ]
        00007ff23c000000   65536   65536   65536 rwx--    [ anon ]
        00007ff0a4000000   65536   65536   65536 rwx--    [ anon ]
        00007ff054000000   65536   65536   65536 rwx--    [ anon ]
        00007ff01c000000   65536   65536   65536 rwx--    [ anon ]
        00007fefb4000000   65536   65536   65536 rwx--    [ anon ]
        00007ff22c000000   65532   65532   65532 rwx--    [ anon ]
        00007ff110000000   65532   65532   65532 rwx--    [ anon ]
        00007ff10c000000   65532   65532   65532 rwx--    [ anon ]
        00007ff0b8000000   65532   65532   65532 rwx--    [ anon ]
        00007ff09c000000   65532   65532   65532 rwx--    [ anon ]
        00007feff8000000   65532   65532   65532 rwx--    [ anon ]
        00007ff250000000   65528   65528   65528 rwx--    [ anon ]
        --More--
        
        $pmap -x 57335 | awk '{print $3}' | awk '{ if($i<65536 && $i>64000) print $i}' | wc -l
        146
        

        Will MALLOC_ARENA_MAX create arenas of exactly 65536 KB?
        The regionserver process has many anon mappings varying in size from 64000 to 65536 KB, as shown above.

        Any help you can provide is appreciated!

        ramkrishna.s.vasudevan added a comment -

        0.90.x versions are obsolete. Resolving as 'won't fix'.

        ramkrishna.s.vasudevan added a comment -

        @Jon
        I'm not currently working on 0.90, so I may not find time for this, but I would suggest taking a look at the patch. In our 0.90 cluster we frequently hit a memory leak while using GZip compression, and it was caused by the GZip streams.
        Thanks Jon.
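
        As an illustration only (this is not the attached patch), the usual shape of such a leak, assuming Hadoop's CodecPool/GzipCodec API, is a GZip Compressor that is borrowed but never returned, so its native zlib buffers are never released. A minimal sketch of the pattern and the corresponding cleanup:

        import java.io.ByteArrayOutputStream;
        import java.io.IOException;

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.io.compress.CodecPool;
        import org.apache.hadoop.io.compress.CompressionCodec;
        import org.apache.hadoop.io.compress.CompressionOutputStream;
        import org.apache.hadoop.io.compress.Compressor;
        import org.apache.hadoop.io.compress.GzipCodec;
        import org.apache.hadoop.util.ReflectionUtils;

        // Hypothetical helper, not part of the patch: compresses one block and
        // always returns the compressor to the pool so its native buffers are
        // reclaimed instead of leaking on every block written.
        public class GzipLeakSketch {
          public static byte[] compress(byte[] data) throws IOException {
            CompressionCodec codec =
                ReflectionUtils.newInstance(GzipCodec.class, new Configuration());
            Compressor compressor = CodecPool.getCompressor(codec);
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            try {
              CompressionOutputStream out = codec.createOutputStream(baos, compressor);
              out.write(data);
              out.finish();
              out.close();
            } finally {
              // Omitting this return is the kind of mistake that leaks native
              // memory over time.
              CodecPool.returnCompressor(compressor);
            }
            return baos.toByteArray();
          }
        }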

        Jonathan Hsieh added a comment -

        Hmm, no tests; going to bump this to 0.90.8 unless action is taken.

        ramkrishna.s.vasudevan added a comment -

        Please review this patch. I will commit it tomorrow if it looks fine.

        Ted Yu added a comment -

        +1 on patch v3.

        ramkrishna.s.vasudevan added a comment - edited

        What if blockBegin is > 0 but less than HEADER_SIZE?

        This will not happen, Ted. A new block is created only after the previous block is completed, so when blockBegin is > 0 it will not be less than HEADER_SIZE. Correct me if I'm wrong.

        ramkrishna.s.vasudevan added a comment -

        Updated patch addressing comments.

        ramkrishna.s.vasudevan added a comment - edited
        top - 08:41:24 up 28 days, 17:46,  4 users,  load average: 3.23, 2.73, 2.52
        Tasks: 308 total,   1 running, 307 sleeping,   0 stopped,   0 zombie
        Cpu(s): 10.3%us,  2.4%sy,  0.0%ni, 78.8%id,  7.8%wa,  0.0%hi,  0.7%si,  0.0%st
        Mem:     48264M total,    48125M used,      139M free,     4370M buffers
        Swap:    51199M total,       53M used,    51146M free,    20926M cached
        
          PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                              
        30934 root      20   0 23.2g  20g  12m S  144 42.6 314:25.74 java                                                                                                                  
        30717 root      20   0 4859m 200m  10m S   50  0.4  66:09.54 java                                                                                                                  
         7351 root      20   0     0    0    0 D    2  0.0  14:30.49 kjournald                                                                                                             
           38 root      20   0     0    0    0 S    1  0.0   0:36.31 events/3                                                                                                              
          127 root      20   0     0    0    0 S    1  0.0  48:30.54 kswapd0                                                                                                               
          128 root      20   0     0    0    0 S    1  0.0  47:48.75 kswapd1                                                                                                               
         4877 root      20   0  8968  400  260 S    1  0.0   8:48.69 irqbalance                                                                                                            
        12644 root      20   0  8900 1356  852 R    1  0.0   0:00.91 top                  
        
        ====================================================
        
        

        The reports above were taken before the patch. You can see the native memory going beyond 20 GB.

        top - 13:41:13 up 28 days, 22:46,  4 users,  load average: 0.00, 0.02, 0.00
        Tasks: 298 total,   1 running, 297 sleeping,   0 stopped,   0 zombie
        Cpu(s):  0.0%us,  0.0%sy,  0.0%ni, 99.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
        Mem:     48264M total,    27960M used,    20304M free,     4861M buffers
        Swap:    51199M total,       53M used,    51146M free,    12556M cached
        
          PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                              
         6822 root      20   0  382m 7916 4116 S    0  0.0  31:22.40 nautilus                                                                                                              
        13197 root      20   0 4828m 189m  10m S    0  0.4  60:49.90 java                                                                                                                  
        13424 root      20   0 8901m 8.1g  12m S    0 17.2 407:56.80 java                                                                                                                  
        27929 root      20   0  3620 1072  600 S    0  0.0   0:05.99 nmon                                                                                                                  
        28225 root      20   0  8900 1344  856 R    0  0.0   0:00.04 top                                                                                                                   
            1 root      20   0 10376  340  304 S    0  0.0   0:13.82 init                                                                                                                  
            2 root      20   0     0    0    0 S    0  0.0   0:00.25 kthreadd                                                                                                              
            3 root      RT   0     0    0    0 S    0  0.0   0:01.22 migration/0                                                                                                           
            4 root      20   0     0    0    0 S    0  0.0   2:45.12 ksoftirqd/0                                                                                                           
            5 root      RT   0     0    0    0 S    0  0.0   0:00.66 migration/1                                                                                                           
            6 root      20   0     0    0    0 S    0  0.0   0:22.16 ksoftirqd/1 
        

        In the output above, the process at 8.1 GB resident is the RS process, and we can see there is no increase.

        The test ran over a period of 3 hours under write load.

        ramkrishna.s.vasudevan added a comment -

        We are facing the same problem in 0.90 as well when using GZIP. Below is a snapshot of the 'top' output:

        13236 root      20   0 21.9g  21g  12m S    1 68.4 450:56.37 /opt/nn/jdk1.6.0_22
        13236 root      20   0 21.9g  21g  12m S    1 68.4 450:56.71 /opt/nn/jdk1.6.0_22
        13236 root      20   0 21.9g  21g  12m S    1 68.4 450:57.06 /opt/nn/jdk1.6.0_22
        

        These values were seen before the patch was created. After the patch, no leak was found. I will update with precise reports tomorrow, as our test clusters are busy with other tasks today.
        The configured heap for the RS is 4 GB.

        Ted Yu added a comment -

        Can test results be described here?

        +      if (this.compressAlgo.equals(Compression.Algorithm.GZ) && blockBegin > 0) {
        +        blockBegin -= HEADER_SIZE;
        +      }
        

        What if blockBegin is > 0 but less than HEADER_SIZE?

        +      if (compressionBos == null) {
        +        if (this.compressAlgo.equals(Compression.Algorithm.GZ)) {
        +          createCompressionStream();
        +        }
        +      }
        

        The nested if statements can be condensed into one if statement.
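
        For illustration, the quoted hunk can be collapsed as suggested; a minimal sketch, reusing the names from the hunk above (the surrounding class is assumed):

        // Condensed form of the nested null check and algorithm check.
        if (compressionBos == null && this.compressAlgo.equals(Compression.Algorithm.GZ)) {
          createCompressionStream();
        }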

        Laxman added a comment -

        Please review the patch and share your comments.

        ramkrishna.s.vasudevan added a comment -

        Test cases are running. I will upload the patch after that.

        ramkrishna.s.vasudevan added a comment -

        We have an internal patch, but it requires some more formatting while writing the header. I will upload it once it is done and tested.


          People

          • Assignee: ramkrishna.s.vasudevan
          • Reporter: ramkrishna.s.vasudevan
          • Votes: 0
          • Watchers: 4
