SLIDER-1055

hbase-daemon started by Slider is excluded from NodeManager container monitoring

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: Slider 0.81
    • Fix Version/s: Slider 1.0.0
    • Component/s: application/hbase
    • Labels: None

    Description

      Here is the NodeManager log from a host where an HBASE_REGIONSERVER component is running:

      2016-01-12 14:11:49,237 DEBUG monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(361)) - Current ProcessTree list : [ 9801 ]
      2016-01-12 14:11:49,237 DEBUG monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(436)) - Constructing ProcessTree for : PID = 9801 ContainerId = container_e07_1451897008090_0009_01_000003
      2016-01-12 14:11:49,262 DEBUG util.ProcfsBasedProcessTree (ProcfsBasedProcessTree.java:updateProcessTree(274)) - [ 9801 9806 ]
      2016-01-12 14:11:49,262 INFO  monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(458)) - Memory usage of ProcessTree 9801 for container-id container_e07_1451897008090_0009_01_000003: 14.2 MB of 1 GB physical memory used; 517.1 MB of 2.1 GB virtual memory used
      

      The memory usage reported for the container is lower than I expected,
      because the monitored PIDs (9801 and 9806) are both slider-agent processes. The regionserver process was excluded from monitoring.

      Here is the output of "ps axjf":

       9798  9801  9801  9801 ?           -1 Ss     500   0:00      \_ /bin/bash -c python ./infra/agent/slider-agent/agent/main.py --label container_e07_1451897008090_0009_01_000003___HBASE_REGIONSERVER --zk-quorum 
       9801  9806  9801  9801 ?           -1 Sl     500   0:01          \_ python ./infra/agent/slider-agent/agent/main.py --label container_e07_1451897008090_0009_01_000003___HBASE_REGIONSERVER --zk-quorum 
          1  9979  9801  9801 ?           -1 S      500   0:00 bash /volume/nodemanager/usercache/yarn/appcache/application_1451897008090_0009/container_e07_1451897008090_0009_01_000003/app/install/hbase-0.98.13-hadoop2/bin/hbase-daemon.sh --config /volume/nodemanager/usercache/yarn/appcache/application_1451897008090_0009/container_e07_1451897008090_0009_01_000003/app/install/hbase-0.98.13-hadoop2/conf foreground_start regionserver
       9979  9994  9801  9801 ?           -1 Sl     500   0:10  \_ /package/jdk-1.7.0_45/bin/java -Dproc_regionserver -XX:OnOutOfMemoryError=kill -9 %p -Xmx1000m -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/logs/application_1451897008090_0009/container_e07_1451897008090_0009_01_000003/hs_err_pid%p.log -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/logs/application_1451897008090_0009/container_e07_1451897008090_0009_01_000003/gc.log-201601121408 -Xmn200m -XX:CMSInitiatingOccupancyFraction=70 -Xms1024m -Xmx1024m -Dhbase.log.dir=/var/logs/application_1451897008090_0009/container_e07_1451897008090_0009_01_000003 -Dhbase.log.file=hbase-yarn-regionserver.log -Dhbase.home.dir=/volume/nodemanager/usercache/yarn/appcache/application_1451897008090_0009/container_e07_1451897008090_0009_01_000003/app/install/hbase-0.98.13-hadoop2/bin/.. -Dhbase.id.str=yarn -Dhbase.root.logger=INFO,RFA -Djava.library.path=/package/hadoop-yarn-2.7.1-arch-centos6-x86_64/lib/native -Dhbase.security.logger=INFO,RFAS org.apache.hadoop.hbase.regionserver.HRegionServer start
      

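      With ProcfsBasedProcessTree (the default), the process tree is built from the parent-child relationships between processes.
      As a result, a daemonized process (ppid=1) cannot be included in the process tree.
      In the output above, hbase-daemon.sh (PID 9979) has ppid 1, so it and its child JVM (PID 9994) are not reachable from PID 9801.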
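The effect can be reproduced with the four processes from the "ps axjf" output. The following is a minimal sketch, not Hadoop's implementation: `tree_by_ppid` walks parent-to-child links the way a ppid-based tree would, and `group_by_session` is a hypothetical alternative that groups by session ID (the fourth column of the ps output). The function names and the short process labels are assumptions made for illustration.

```python
from collections import namedtuple

# PPID, PID, PGID and SID are copied from the "ps axjf" output above;
# the label is a shortened description, not the real command line.
Proc = namedtuple("Proc", "ppid pid pgid sid label")

PROCS = [
    Proc(9798, 9801, 9801, 9801, "bash wrapper for slider-agent"),
    Proc(9801, 9806, 9801, 9801, "slider-agent main.py"),
    Proc(1,    9979, 9801, 9801, "hbase-daemon.sh"),
    Proc(9979, 9994, 9801, 9801, "regionserver JVM"),
]


def tree_by_ppid(procs, root_pid):
    """Return root_pid plus every PID reachable through parent->child links."""
    children = {}
    for p in procs:
        children.setdefault(p.ppid, []).append(p.pid)
    seen, stack = set(), [root_pid]
    while stack:
        pid = stack.pop()
        if pid in seen:
            continue
        seen.add(pid)
        stack.extend(children.get(pid, []))
    return seen


def group_by_session(procs, sid):
    """Return every PID whose session ID equals sid (hypothetical alternative)."""
    return {p.pid for p in procs if p.sid == sid}


print(sorted(tree_by_ppid(PROCS, 9801)))      # [9801, 9806]
print(sorted(group_by_session(PROCS, 9801)))  # [9801, 9806, 9979, 9994]
```

The ppid walk matches the "[ 9801 9806 ]" list in the NodeManager log, while the session-based grouping also picks up 9979 and 9994. Session grouping would still miss a daemon that calls setsid, so it is only one possible direction.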

      I don't know whether this can be fixed in Slider.
      Would it require implementing another ResourceCalculatorProcessTree to replace ProcfsBasedProcessTree?
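As a rough illustration of what an alternative calculator could look at, the session ID is available in the same `/proc/<pid>/stat` line that carries the ppid. The sketch below only parses one such line; it does not implement Hadoop's ResourceCalculatorProcessTree API, and `parse_proc_stat` and the sample line are hypothetical (the sample is assembled from the ps values for PID 9994).

```python
def parse_proc_stat(line):
    """Return (pid, comm, ppid, pgrp, session) from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the LAST ')' instead of splitting on whitespace.
    head, _, tail = line.rpartition(")")
    pid_text, _, comm = head.partition("(")
    fields = tail.split()
    # Fields after comm: state, ppid, pgrp, session, tty_nr, tpgid, ...
    return int(pid_text), comm, int(fields[1]), int(fields[2]), int(fields[3])


# Hypothetical sample line built from the ps output for the regionserver JVM.
sample = "9994 (java) S 9979 9801 9801 0 -1"
pid, comm, ppid, pgrp, session = parse_proc_stat(sample)
print(pid, comm, ppid, pgrp, session)
```

For this sample the ppid (9979) leads away from the container root, but the session field (9801) still matches the container's first process.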

      Attachments

        1. SLIDER-1055.001.patch
          18 kB
          Tao Jie

          People

            Assignee: Unassigned
            Reporter: kyungwan nam
