Kafka / KAFKA-10396

Overall memory of container keeps growing due to Kafka Streams / RocksDB and is OOM-killed once the limit is reached

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.3.1, 2.5.0
    • Fix Version/s: None
    • Component/s: streams
    • Labels: None

    Description

      We are observing that the overall memory of our container keeps growing and never comes down.
      After analysis, we found that rocksdbjni.so keeps allocating 64M chunks of off-heap memory and never releases them. This causes an OOM kill once memory reaches the configured limit.

      We use Kafka Streams and GlobalKTable for many of our Kafka topics.

      Below is our environment:

      • Kubernetes cluster
      • openjdk 11.0.7 2020-04-14 LTS
      • OpenJDK Runtime Environment Zulu11.39+16-SA (build 11.0.7+10-LTS)
      • OpenJDK 64-Bit Server VM Zulu11.39+16-SA (build 11.0.7+10-LTS, mixed mode)
      • Springboot 2.3
      • spring-kafka-2.5.0
      • kafka-streams-2.5.0
      • kafka-streams-avro-serde-5.4.0
      • rocksdbjni-5.18.3

      We observed the same result with Kafka 2.3.

      Below is a snippet of our analysis. From the pmap output, we took the addresses of these 64M allocations (RSS):

      Address Kbytes RSS Dirty Mode Mapping
      00007f3ce8000000 65536 65532 65532 rw--- [ anon ]
      00007f3cf4000000 65536 65536 65536 rw--- [ anon ]
      00007f3d64000000 65536 65536 65536 rw--- [ anon ]
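      The selection of these regions can be scripted. Below is a minimal Java sketch. It assumes `pmap -x <pid>` output with the column layout shown above, and the class and method names are hypothetical. It keeps only anonymous mappings of exactly 65536 KB (64 MiB) and sums their RSS.

      As an aside, 64 MiB is also the maximum heap size of a glibc malloc arena on 64-bit Linux. That is one possible explanation for the chunk size, but this analysis does not confirm it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: find anonymous 64 MiB mappings in `pmap -x <pid>` output.
// Assumes the column layout "Address Kbytes RSS Dirty Mode Mapping".
final class PmapArenaScan {
    static final long REGION_KB = 64L * 1024; // 64 MiB expressed in KB

    // One parsed pmap data row (hypothetical helper type).
    static final class Region {
        final long start;  // start address
        final long kbytes; // mapping size in KB
        final long rssKb;  // resident size in KB

        Region(long start, long kbytes, long rssKb) {
            this.start = start;
            this.kbytes = kbytes;
            this.rssKb = rssKb;
        }
    }

    // Returns anonymous mappings of exactly 64 MiB; non-data lines are skipped.
    static List<Region> anon64MiB(List<String> pmapLines) {
        List<Region> out = new ArrayList<>();
        for (String line : pmapLines) {
            String[] f = line.trim().split("\\s+");
            if (f.length < 6) continue; // blank, separator or truncated line
            String mapping = String.join(" ", Arrays.copyOfRange(f, 5, f.length));
            try {
                long start = Long.parseUnsignedLong(f[0], 16);
                long kb = Long.parseLong(f[1]);
                long rss = Long.parseLong(f[2]);
                if (kb == REGION_KB && mapping.equals("[ anon ]")) {
                    out.add(new Region(start, kb, rss));
                }
            } catch (NumberFormatException e) {
                // header or totals row: not a mapping, ignore it
            }
        }
        return out;
    }

    // Sums the resident size of the given regions, in KB.
    static long totalRssKb(List<Region> regions) {
        long sum = 0;
        for (Region r : regions) sum += r.rssKb;
        return sum;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "Address Kbytes RSS Dirty Mode Mapping",
                "00007f3ce8000000 65536 65532 65532 rw--- [ anon ]",
                "00007f3cf4000000 65536 65536 65536 rw--- [ anon ]",
                "00007f3d64000000 65536 65536 65536 rw--- [ anon ]");
        List<Region> regions = anon64MiB(sample);
        System.out.println(regions.size() + " regions, " + totalRssKb(regions) + " KB resident");
    }
}
```

      On the three sample rows above, it reports 3 regions and 196604 KB resident.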

      With the help of the Azul Systems team, we enabled memory allocation logging and tried to match these addresses against the logged allocations:

      @ /tmp/librocksdbjni6564497922441568920.so:_Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741] - 0x7f3ce8ff7ca0
      @ /tmp/librocksdbjni6564497922441568920.so:_ZN7rocksdb15BlockBasedTable3GetERKNS_11ReadOptionsERKNS_5SliceEPNS_10GetContextEPKNS_14SliceTransformEb+0x894)[0x7f3e1c898fd4] - 0x7f3ce8ff9780
      @ /tmp/librocksdbjni6564497922441568920.so:_Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0xfa)[0x7f3e1c65d5da] - 0x7f3ce8ff9750
      @ /tmp/librocksdbjni6564497922441568920.so:_Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741] - 0x7f3ce8ff97c0
      @ /tmp/librocksdbjni6564497922441568920.so:_Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0xfa)[0x7f3e1c65d5da] - 0x7f3ce8ffccf0
      @ /tmp/librocksdbjni6564497922441568920.so:_Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741] - 0x7f3ce8ffcd10
      @ /tmp/librocksdbjni6564497922441568920.so:_Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0xfa)[0x7f3e1c65d5da] - 0x7f3ce8ffccf0
      @ /tmp/librocksdbjni6564497922441568920.so:_Z18rocksdb_get_helperP7JNIEnv_PN7rocksdb2DBERKNS1_11ReadOptionsEPNS1_18ColumnFamilyHandleEP11_jbyteArrayii+0x261)[0x7f3e1c65d741] - 0x7f3ce8ffcd10
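      Matching the log against the pmap regions comes down to an address-range check. The sketch below rests on three assumptions: each log line ends with ` - 0x<address>` holding the allocated pointer, the mangled symbol sits between `.so:` and `+0x<offset>`, and every region is 64 MiB long, as in the pmap output. The names are hypothetical. With the three region starts above, every logged address falls inside the region that starts at 0x7f3ce8000000.

```java
import java.util.List;
import java.util.OptionalLong;

// Sketch: attribute allocation-log entries to 64 MiB pmap regions.
// Assumes each log line ends with " - 0x<address>" (the allocated pointer)
// and that the mangled symbol sits between ".so:" and "+0x<offset>".
final class AllocationMatcher {
    static final long REGION_BYTES = 64L * 1024 * 1024;

    // Parses the allocated address after the last " - 0x" in the line.
    static long allocatedAddress(String logLine) {
        int i = logLine.lastIndexOf(" - 0x");
        if (i < 0) throw new IllegalArgumentException("no address in: " + logLine);
        return Long.parseUnsignedLong(logLine.substring(i + 5).trim(), 16);
    }

    // Extracts the mangled symbol name, or "?" if the line has another shape.
    static String symbol(String logLine) {
        int s = logLine.indexOf(".so:");
        int e = s < 0 ? -1 : logLine.indexOf("+0x", s);
        return e < 0 ? "?" : logLine.substring(s + 4, e);
    }

    // Returns the start of the region that contains addr, if any.
    static OptionalLong containingRegion(long addr, List<Long> regionStarts) {
        for (long start : regionStarts) {
            if (Long.compareUnsigned(addr, start) >= 0
                    && Long.compareUnsigned(addr, start + REGION_BYTES) < 0) {
                return OptionalLong.of(start);
            }
        }
        return OptionalLong.empty();
    }

    public static void main(String[] args) {
        List<Long> starts = List.of(0x7f3ce8000000L, 0x7f3cf4000000L, 0x7f3d64000000L);
        String line = "@ /tmp/librocksdbjni6564497922441568920.so:"
                + "_ZN7rocksdb15BlockBasedTable3GetERKNS_11ReadOptionsERKNS_5SliceEPNS_10GetContext"
                + "EPKNS_14SliceTransformEb+0x894)[0x7f3e1c898fd4] - 0x7f3ce8ff9780";
        long addr = allocatedAddress(line);
        OptionalLong region = containingRegion(addr, starts);
        System.out.printf("%s -> 0x%x in region %s%n", symbol(line), addr,
                region.isPresent() ? "0x" + Long.toHexString(region.getAsLong()) : "none");
    }
}
```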

      We also found that the content of these 64M regions is all zeros; no data is present in them.
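      A check like this can be made repeatable. The minimal sketch below assumes the region has already been written to a file, for example with gdb's `dump memory <file> <start> <end>`. That dumping step is an assumption about tooling, not necessarily how the original analysis was done, and the class name is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: verify that a dumped memory region holds only zero bytes.
// Assumes the region was already written to a file (for example with gdb's
// "dump memory" command); that dumping step is an assumption, not part of the report.
final class ZeroRegionCheck {
    // Index of the first non-zero byte in buf[0..len), or -1 if there is none.
    static int indexOfNonZero(byte[] buf, int len) {
        for (int i = 0; i < len; i++) {
            if (buf[i] != 0) return i;
        }
        return -1;
    }

    // Offset of the first non-zero byte in the file, or -1 if it is all zeros.
    static long firstNonZero(Path dump) throws IOException {
        byte[] buf = new byte[1 << 16];
        long offset = 0;
        try (InputStream in = Files.newInputStream(dump)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                int i = indexOfNonZero(buf, n);
                if (i >= 0) return offset + i;
                offset += n;
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ZeroRegionCheck <dump-file>");
            return;
        }
        long pos = firstNonZero(Path.of(args[0]));
        System.out.println(pos < 0 ? "all zeros" : "first non-zero byte at offset " + pos);
    }
}
```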

      I tried to tune the RocksDB configuration as described in the following guide, but it did not help: https://docs.confluent.io/current/streams/developer-guide/config-streams.html#streams-developer-guide-rocksdb-config
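      Each state store is backed by its own RocksDB instance, and that includes the store behind every GlobalKTable. As a result, configured off-heap memory grows with the number of stores unless the config setter shares one cache between them. The arithmetic below is a rough sketch that assumes Kafka Streams 2.x per-store defaults: a 50 MiB block cache and three 16 MiB memtables. Verify these values against your version and CustomRocksDBConfig. The estimate ignores index/filter blocks and iterator-pinned memory, so real usage can be higher.

```java
// Rough estimate of the RocksDB memory that Kafka Streams configures when
// every store keeps a private block cache and private memtables.
// The three constants are ASSUMED Kafka Streams 2.x per-store defaults; check
// them against your version and any RocksDBConfigSetter you install.
// Index/filter blocks and iterator-pinned blocks are not counted.
final class RocksDbMemoryEstimate {
    static final long MIB = 1024L * 1024;
    static final long BLOCK_CACHE_BYTES = 50 * MIB;  // assumed block cache per store
    static final long WRITE_BUFFER_BYTES = 16 * MIB; // assumed memtable size
    static final int MAX_WRITE_BUFFERS = 3;          // assumed memtables per store

    // Configured block cache plus memtables for a single RocksDB instance.
    static long perStoreBytes(long blockCache, long writeBuffer, int maxWriteBuffers) {
        return blockCache + writeBuffer * maxWriteBuffers;
    }

    // Total when `stores` instances each have their own caches (nothing shared).
    static long unsharedBytes(int stores) {
        return stores * perStoreBytes(BLOCK_CACHE_BYTES, WRITE_BUFFER_BYTES, MAX_WRITE_BUFFERS);
    }

    public static void main(String[] args) {
        for (int stores : new int[] {1, 10, 50}) {
            System.out.printf("%d stores -> ~%d MiB of block cache and memtables%n",
                    stores, unsharedBytes(stores) / MIB);
        }
    }
}
```

      Under these assumptions, one store accounts for about 98 MiB and ten stores for about 980 MiB, before index/filter blocks are counted. Sharing a single RocksDB LRUCache across all stores, with a WriteBufferManager charging memtable memory to that cache, caps this portion at the cache size rather than letting it scale with the store count.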


      Please let me know if you need any more details.

      Attachments

        1. MyStreamProcessor.java
          4 kB
          Vagesh Mathapati
        2. kafkaStreamConfig.java
          3 kB
          Vagesh Mathapati
        3. CustomRocksDBConfig.java
          3 kB
          Vagesh Mathapati


          People

            Assignee: Rohan Desai (rohanpd)
            Reporter: Vagesh Mathapati (vmathapati)
            Votes: 0
            Watchers: 6
