Kudu / KUDU-2836

Memory pressure detection may use the wrong memory size

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.10.0
    • Fix Version/s: 1.11.0
    • Component/s: tserver
    • Labels: None

    Description

      One of my tservers has 128G of memory in total and runs with these gflags:

      -memory_limit_hard_bytes=107374182475 (100G)  -memory_limit_soft_percentage=85 -memory_pressure_percentage=80

      Memory usage is about 95%. The "top" output looks like this:

      PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
      8359 work 20 0 0.326t 0.116t 81780 S 727.9 94.6 230228:10 kudu_tablet_ser
      

      That is, the kudu_tablet_server process uses about 116G of memory.

      On the mem-trackers page, the "Total consumption" value is about 65G, much lower than 116G.

      I then logged in to the server and read the code to check whether the maintenance manager (MM) operations that free memory were working correctly. Unfortunately, the memory pressure detection function (process_memory::UnderMemoryPressure) does not report that the process is under pressure, because the tcmalloc function GetNumericProperty(const char* property, size_t* value), called with the property "generic.current_allocated_bytes", does not return the memory use reported by the OS.
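
      The gap described above can be checked with simple arithmetic. The following is a minimal sketch, not Kudu's actual code: the helper names are hypothetical, and it assumes the pressure threshold is memory_pressure_percentage of memory_limit_hard_bytes and that "G" in the report means GiB. With the reported flags, the roughly 65G figure stays below the 80% threshold, while the roughly 116G seen by the OS already exceeds the hard limit.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; these are not Kudu functions.
constexpr int64_t kGiB = int64_t{1} << 30;

// Threshold in bytes for a given percentage of the hard limit.
// Dividing before multiplying keeps the intermediate value small.
constexpr int64_t PercentOfLimit(int64_t hard_limit_bytes, int percentage) {
  return hard_limit_bytes / 100 * percentage;
}

// True when the measured consumption reaches the pressure threshold.
constexpr bool UnderPressure(int64_t consumption_bytes,
                             int64_t hard_limit_bytes,
                             int pressure_percentage) {
  return consumption_bytes >=
         PercentOfLimit(hard_limit_bytes, pressure_percentage);
}

// Values taken from the report (sizes are approximate).
constexpr int64_t kHardLimit = 107374182475;    // -memory_limit_hard_bytes
constexpr int kPressurePct = 80;                // -memory_pressure_percentage
constexpr int64_t kTrackedBytes = 65 * kGiB;    // "Total consumption" on mem-trackers
constexpr int64_t kResidentBytes = 116 * kGiB;  // RES column in "top"
```

      Under these assumptions, UnderPressure(kTrackedBytes, kHardLimit, kPressurePct) is false and UnderPressure(kResidentBytes, kHardLimit, kPressurePct) is true, so the outcome depends entirely on which of the two measurements is fed into the check.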

      https://gperftools.github.io/gperftools/tcmalloc.html

      generic.current_allocated_bytes Number of bytes used by the application. This will not typically match the memory use reported by the OS, because it does not include TCMalloc overhead or memory fragmentation.
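
      To compare the allocator's figure with what the OS reports, the resident set size can be read directly on Linux. This is a rough sketch under stated assumptions: the function names are hypothetical, it relies on the Linux /proc/self/statm layout (second field is resident pages), and the tcmalloc call is left out so that the example needs only the standard library.

```cpp
#include <cstdint>
#include <fstream>
#include <sstream>
#include <string>
#include <unistd.h>

// Extracts the resident page count (second field) from a statm line.
// Returns -1 if the line cannot be parsed.
int64_t ParseStatmResidentPages(const std::string& statm_line) {
  std::istringstream in(statm_line);
  int64_t total_pages = 0;
  int64_t resident_pages = 0;
  if (!(in >> total_pages >> resident_pages)) {
    return -1;
  }
  return resident_pages;
}

// Returns this process's resident set size in bytes as seen by the OS,
// or -1 if /proc/self/statm is unavailable (for example, on non-Linux systems).
int64_t ResidentSetBytes() {
  std::ifstream statm("/proc/self/statm");
  std::string line;
  if (!std::getline(statm, line)) {
    return -1;
  }
  const int64_t pages = ParseStatmResidentPages(line);
  if (pages < 0) {
    return -1;
  }
  return pages * static_cast<int64_t>(sysconf(_SC_PAGESIZE));
}
```

      Logging ResidentSetBytes() next to the value returned for "generic.current_allocated_bytes" would show how much of the process footprint the allocator property leaves out.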

      As a result, the ops that would free memory may not be scheduled promptly, OS memory may be exhausted, and the tserver may then be killed by the OOM killer.

      Attachments

        1. 选区_313.jpg
          69 kB
          Yingchun Lai

          People

            Assignee: Yingchun Lai (acelyc111)
            Reporter: Yingchun Lai (acelyc111)
            Votes: 0
            Watchers: 6