Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- 3.0.0
- None
This is the yarn-site.xml for 3.0.
<configuration>
<property>
<name>hadoop.registry.dns.bind-port</name>
<value>5353</value>
</property><property>
<name>hadoop.registry.dns.domain-name</name>
<value>hwx.site</value>
</property><property>
<name>hadoop.registry.dns.enabled</name>
<value>true</value>
</property><property>
<name>hadoop.registry.dns.zone-mask</name>
<value>255.255.255.0</value>
</property><property>
<name>hadoop.registry.dns.zone-subnet</name>
<value>172.17.0.0</value>
</property><property>
<name>manage.include.files</name>
<value>false</value>
</property><property>
<name>yarn.acl.enable</name>
<value>false</value>
</property><property>
<name>yarn.admin.acl</name>
<value>yarn</value>
</property><property>
<name>yarn.client.nodemanager-connect.max-wait-ms</name>
<value>60000</value>
</property><property>
<name>yarn.client.nodemanager-connect.retry-interval-ms</name>
<value>10000</value>
</property><property>
<name>yarn.http.policy</name>
<value>HTTP_ONLY</value>
</property><property>
<name>yarn.log-aggregation-enable</name>
<value>false</value>
</property><property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>2592000</value>
</property><property>
<name>yarn.log.server.url</name>
<value>http://xxxxxx:19888/jobhistory/logs</value>
</property><property>
<name>yarn.log.server.web-service.url</name>
<value>http://xxxxxx:8188/ws/v1/applicationhistory</value>
</property><property>
<name>yarn.node-labels.enabled</name>
<value>false</value>
</property><property>
<name>yarn.node-labels.fs-store.retry-policy-spec</name>
<value>2000, 500</value>
</property><property>
<name>yarn.node-labels.fs-store.root-dir</name>
<value>/system/yarn/node-labels</value>
</property><property>
<name>yarn.nodemanager.address</name>
<value>0.0.0.0:45454</value>
</property><property>
<name>yarn.nodemanager.admin-env</name>
<value>MALLOC_ARENA_MAX=$MALLOC_ARENA_MAX</value>
</property><property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle,spark2_shuffle,timeline_collector</value>
</property><property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property><property>
<name>yarn.nodemanager.aux-services.spark2_shuffle.class</name>
<value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property><property>
<name>yarn.nodemanager.aux-services.spark2_shuffle.classpath</name>
<value>/usr/spark2/aux/*</value>
</property><property>
<name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
<value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property><property>
<name>yarn.nodemanager.aux-services.timeline_collector.class</name>
<value>org.apache.hadoop.yarn.server.timelineservice.collector.PerNodeTimelineCollectorsAuxService</value>
</property><property>
<name>yarn.nodemanager.bind-host</name>
<value>0.0.0.0</value>
</property><property>
<name>yarn.nodemanager.container-executor.class</name>
<value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property><property>
<name>yarn.nodemanager.container-metrics.unregister-delay-ms</name>
<value>60000</value>
</property><property>
<name>yarn.nodemanager.container-monitor.interval-ms</name>
<value>3000</value>
</property><property>
<name>yarn.nodemanager.delete.debug-delay-sec</name>
<value>0</value>
</property><property>
<name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
<value>90</value>
</property><property>
<name>yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb</name>
<value>1000</value>
</property><property>
<name>yarn.nodemanager.disk-health-checker.min-healthy-disks</name>
<value>0.25</value>
</property><property>
<name>yarn.nodemanager.health-checker.interval-ms</name>
<value>135000</value>
</property><property>
<name>yarn.nodemanager.health-checker.script.timeout-ms</name>
<value>60000</value>
</property><property>
<name>yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage</name>
<value>false</value>
</property><property>
<name>yarn.nodemanager.linux-container-executor.group</name>
<value>hadoop</value>
</property><property>
<name>yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users</name>
<value>false</value>
</property><property>
<name>yarn.nodemanager.local-dirs</name>
<value>/hadoop/yarn/local</value>
</property><property>
<name>yarn.nodemanager.log-aggregation.compression-type</name>
<value>gz</value>
</property><property>
<name>yarn.nodemanager.log-aggregation.debug-enabled</name>
<value>false</value>
</property><property>
<name>yarn.nodemanager.log-aggregation.num-log-files-per-app</name>
<value>30</value>
</property><property>
<name>yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds</name>
<value>3600</value>
</property><property>
<name>yarn.nodemanager.log-dirs</name>
<value>/hadoop/yarn/log</value>
</property><property>
<name>yarn.nodemanager.log.retain-seconds</name>
<value>604800</value>
</property><property>
<name>yarn.nodemanager.pmem-check-enabled</name>
<value>false</value>
</property><property>
<name>yarn.nodemanager.recovery.dir</name>
<value>/var/log/hadoop-yarn/nodemanager/recovery-state</value>
</property><property>
<name>yarn.nodemanager.recovery.enabled</name>
<value>true</value>
</property><property>
<name>yarn.nodemanager.recovery.supervised</name>
<value>true</value>
</property><property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/app-logs</value>
</property><property>
<name>yarn.nodemanager.remote-app-log-dir-suffix</name>
<value>logs</value>
</property><property>
<name>yarn.nodemanager.resource-plugins</name>
<value></value>
</property><property>
<name>yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices</name>
<value>auto</value>
</property><property>
<name>yarn.nodemanager.resource-plugins.gpu.docker-plugin</name>
<value>nvidia-docker-v1</value>
</property><property>
<name>yarn.nodemanager.resource-plugins.gpu.docker-plugin.nvidiadocker-v1.endpoint</name>
<value>http://localhost:3476/v1.0/docker/cli</value>
</property><property>
<name>yarn.nodemanager.resource-plugins.gpu.path-to-discovery-executables</name>
<value></value>
</property><property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>6</value>
</property><property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>12288</value>
</property><property>
<name>yarn.nodemanager.resource.percentage-physical-cpu-limit</name>
<value>80</value>
</property><property>
<name>yarn.nodemanager.runtime.linux.allowed-runtimes</name>
<value>default,docker</value>
</property><property>
<name>yarn.nodemanager.runtime.linux.docker.allowed-container-networks</name>
<value>host,none,bridge</value>
</property><property>
<name>yarn.nodemanager.runtime.linux.docker.capabilities</name>
<value>
CHOWN,DAC_OVERRIDE,FSETID,FOWNER,MKNOD,NET_RAW,SETGID,SETUID,SETFCAP,
SETPCAP,NET_BIND_SERVICE,SYS_CHROOT,KILL,AUDIT_WRITE</value>
</property><property>
<name>yarn.nodemanager.runtime.linux.docker.default-container-network</name>
<value>host</value>
</property><property>
<name>yarn.nodemanager.runtime.linux.docker.privileged-containers.acl</name>
<value></value>
</property><property>
<name>yarn.nodemanager.runtime.linux.docker.privileged-containers.allowed</name>
<value>false</value>
</property><property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property><property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>2.1</value>
</property><property>
<name>yarn.nodemanager.webapp.cross-origin.enabled</name>
<value>true</value>
</property><property>
<name>yarn.resourcemanager.address</name>
<value>xxxxxxx:8050</value>
</property><property>
<name>yarn.resourcemanager.admin.address</name>
<value>xxxxxx:8141</value>
</property><property>
<name>yarn.resourcemanager.am.max-attempts</name>
<value>2</value>
</property><property>
<name>yarn.resourcemanager.bind-host</name>
<value>0.0.0.0</value>
</property><property>
<name>yarn.resourcemanager.connect.max-wait.ms</name>
<value>900000</value>
</property><property>
<name>yarn.resourcemanager.connect.retry-interval.ms</name>
<value>30000</value>
</property><property>
<name>yarn.resourcemanager.fs.state-store.retry-policy-spec</name>
<value>2000, 500</value>
</property><property>
<name>yarn.resourcemanager.fs.state-store.uri</name>
<value> </value>
</property><property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>false</value>
</property><property>
<name>yarn.resourcemanager.hostname</name>
<value>xxxxxxxx</value>
</property><property>
<name>yarn.resourcemanager.monitor.capacity.preemption.monitoring_interval</name>
<value>15000</value>
</property><property>
<name>yarn.resourcemanager.monitor.capacity.preemption.natural_termination_factor</name>
<value>1</value>
</property><property>
<name>yarn.resourcemanager.monitor.capacity.preemption.total_preemption_per_round</name>
<value>0.25</value>
</property><property>
<name>yarn.resourcemanager.nodes.exclude-path</name>
<value>/etc/hadoop/conf/yarn.exclude</value>
</property><property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property><property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>xxxxxxx:8025</value>
</property><property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>xxxxxxxx:8030</value>
</property><property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property><property>
<name>yarn.resourcemanager.scheduler.monitor.enable</name>
<value>false</value>
</property><property>
<name>yarn.resourcemanager.state-store.max-completed-applications</name>
<value>${yarn.resourcemanager.max-completed-applications}</value>
</property><property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property><property>
<name>yarn.resourcemanager.system-metrics-publisher.dispatcher.pool-size</name>
<value>10</value>
</property><property>
<name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
<value>true</value>
</property><property>
<name>yarn.resourcemanager.webapp.address</name>
<value>xxxxxx:8088</value>
</property><property>
<name>yarn.resourcemanager.webapp.cross-origin.enabled</name>
<value>true</value>
</property><property>
<name>yarn.resourcemanager.webapp.delegation-token-auth-filter.enabled</name>
<value>false</value>
</property><property>
<name>yarn.resourcemanager.webapp.https.address</name>
<value>wxxxxxx:8090</value>
</property><property>
<name>yarn.resourcemanager.work-preserving-recovery.enabled</name>
<value>true</value>
</property><property>
<name>yarn.resourcemanager.work-preserving-recovery.scheduling-wait-ms</name>
<value>10000</value>
</property><property>
<name>yarn.resourcemanager.zk-acl</name>
<value>world:anyone:rwcda</value>
</property><property>
<name>yarn.resourcemanager.zk-address</name>
<value>xxxxxx:2181,xxxxxx:2181,xxxxxx:2181</value>
</property><property>
<name>yarn.resourcemanager.zk-num-retries</name>
<value>1000</value>
</property><property>
<name>yarn.resourcemanager.zk-retry-interval-ms</name>
<value>1000</value>
</property><property>
<name>yarn.resourcemanager.zk-state-store.parent-path</name>
<value>/rmstore</value>
</property><property>
<name>yarn.resourcemanager.zk-timeout-ms</name>
<value>10000</value>
</property><property>
<name>yarn.rm.system-metricspublisher.emit-container-events</name>
<value>true</value>
</property><property>
<name>yarn.scheduler.capacity.ordering-policy.priority-utilization.underutilized-preemption.enabled</name>
<value>false</value>
</property><property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>12288</value>
</property><property>
<name>yarn.scheduler.maximum-allocation-vcores</name>
<value>6</value>
</property><property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>64</value>
</property><property>
<name>yarn.scheduler.minimum-allocation-vcores</name>
<value>1</value>
</property><property>
<name>yarn.service.framework.path</name>
<value>/yarn/service-dep.tar.gz</value>
</property><property>
<name>yarn.system-metricspublisher.enabled</name>
<value>true</value>
</property><property>
<name>yarn.timeline-service.address</name>
<value>xxxxxx:10200</value>
</property><property>
<name>yarn.timeline-service.bind-host</name>
<value>0.0.0.0</value>
</property><property>
<name>yarn.timeline-service.client.max-retries</name>
<value>30</value>
</property><property>
<name>yarn.timeline-service.client.retry-interval-ms</name>
<value>1000</value>
</property><property>
<name>yarn.timeline-service.enabled</name>
<value>true</value>
</property><property>
<name>yarn.timeline-service.entity-group-fs-store.active-dir</name>
<value>/ats/active/</value>
</property><property>
<name>yarn.timeline-service.entity-group-fs-store.app-cache-size</name>
<value>10</value>
</property><property>
<name>yarn.timeline-service.entity-group-fs-store.cleaner-interval-seconds</name>
<value>3600</value>
</property><property>
<name>yarn.timeline-service.entity-group-fs-store.done-dir</name>
<value>/ats/done/</value>
</property><property>
<name>yarn.timeline-service.entity-group-fs-store.group-id-plugin-classes</name>
<value></value>
</property><property>
<name>yarn.timeline-service.entity-group-fs-store.group-id-plugin-classpath</name>
<value></value>
</property><property>
<name>yarn.timeline-service.entity-group-fs-store.retain-seconds</name>
<value>604800</value>
</property><property>
<name>yarn.timeline-service.entity-group-fs-store.scan-interval-seconds</name>
<value>60</value>
</property><property>
<name>yarn.timeline-service.entity-group-fs-store.summary-store</name>
<value>org.apache.hadoop.yarn.server.timeline.RollingLevelDBTimelineStore</value>
</property><property>
<name>yarn.timeline-service.generic-application-history.store-class</name>
<value>org.apache.hadoop.yarn.server.applicationhistoryservice.NullApplicationHistoryStore</value>
</property><property>
<name>yarn.timeline-service.hbase-schema.prefix</name>
<value>prod.</value>
</property><property>
<name>yarn.timeline-service.hbase.configuration.file</name>
<value>file:///etc/yarn-hbase/conf/hbase-site.xml</value>
</property><property>
<name>yarn.timeline-service.hbase.coprocessor.jar.hdfs.location</name>
<value>file:///hadoop-yarn-client/timelineservice/hadoop-yarn-server-timelineservice-hbase-coprocessor.jar</value>
</property><property>
<name>yarn.timeline-service.http-authentication.simple.anonymous.allowed</name>
<value>true</value>
</property><property>
<name>yarn.timeline-service.http-authentication.type</name>
<value>simple</value>
</property><property>
<name>yarn.timeline-service.http-cross-origin.enabled</name>
<value>true</value>
</property><property>
<name>yarn.timeline-service.leveldb-state-store.path</name>
<value>/hadoop/yarn/timeline</value>
</property><property>
<name>yarn.timeline-service.leveldb-timeline-store.path</name>
<value>/hadoop/yarn/timeline</value>
</property><property>
<name>yarn.timeline-service.leveldb-timeline-store.read-cache-size</name>
<value>104857600</value>
</property><property>
<name>yarn.timeline-service.leveldb-timeline-store.start-time-read-cache-size</name>
<value>10000</value>
</property><property>
<name>yarn.timeline-service.leveldb-timeline-store.start-time-write-cache-size</name>
<value>10000</value>
</property><property>
<name>yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms</name>
<value>300000</value>
</property><property>
<name>yarn.timeline-service.reader.webapp.address</name>
<value>xxxxxx:8198</value>
</property><property>
<name>yarn.timeline-service.reader.webapp.https.address</name>
<value>xxxxxx:8199</value>
</property><property>
<name>yarn.timeline-service.recovery.enabled</name>
<value>true</value>
</property><property>
<name>yarn.timeline-service.state-store-class</name>
<value>org.apache.hadoop.yarn.server.timeline.recovery.LeveldbTimelineStateStore</value>
</property><property>
<name>yarn.timeline-service.store-class</name>
<value>org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore</value>
</property><property>
<name>yarn.timeline-service.ttl-enable</name>
<value>true</value>
</property><property>
<name>yarn.timeline-service.ttl-ms</name>
<value>2678400000</value>
</property><property>
<name>yarn.timeline-service.version</name>
<value>2.0</value>
</property><property>
<name>yarn.timeline-service.versions</name>
<value>1.5f,2.0f</value>
</property><property>
<name>yarn.timeline-service.webapp.address</name>
<value>xxxxxx:8188</value>
</property><property>
<name>yarn.timeline-service.webapp.https.address</name>
<value>xxxxxx:8190</value>
</property><property>
<name>yarn.webapp.api-service.enable</name>
<value>true</value>
</property><property>
<name>yarn.webapp.ui2.enable</name>
<value>true</value>
</property></configuration>
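Since the description suspects Timeline Service v2, it can help to pull just the timeline-related settings out of a file this long. The following is a minimal sketch, not part of the original report: the helper name `properties_with_prefix` is hypothetical, and `SAMPLE` is a three-property excerpt copied from the configuration above.

```python
import xml.etree.ElementTree as ET


def properties_with_prefix(xml_text: str, prefix: str) -> dict:
    """Return {name: value} for every <property> whose name starts with prefix."""
    root = ET.fromstring(xml_text)
    result = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name.startswith(prefix):
            result[name] = (prop.findtext("value") or "").strip()
    return result


# Small excerpt copied from the configuration above (illustration only).
SAMPLE = """<configuration>
<property>
<name>yarn.timeline-service.enabled</name>
<value>true</value>
</property><property>
<name>yarn.timeline-service.version</name>
<value>2.0</value>
</property><property>
<name>yarn.webapp.ui2.enable</name>
<value>true</value>
</property></configuration>"""

timeline = properties_with_prefix(SAMPLE, "yarn.timeline-service.")
for name, value in sorted(timeline.items()):
    print(f"{name} = {value}")
```

To run it against a real cluster, read the full yarn-site.xml from wherever it is deployed and pass its text in place of `SAMPLE`. The file location is not given in this report.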
Description
Hi, I am running test cases on YARN 2.6 and YARN 3.0 and found that performance is roughly twice as slow on YARN 3.0, and it gets even slower as we acquire more containers. I compared the NodeManager logs on 2.6 and 3.0; here is what I found.
On 2.6, the life cycle of a specific container, from beginning to end, takes about 8 seconds (9:53:50 to 9:53:58).
On 3.0, the life cycle of a specific container takes 20 seconds to finish the same job (9:51:44 to 9:52:04).
It seems that 3.0 spends an extra 5 seconds in monitor.ContainersMonitorImpl (marked in red), which does not happen on 2.6. Also, after the job is done and the container is exiting, 3.0 takes 5 seconds (9:51:59 to 9:52:04), whereas 2.6 takes less than half of that time (9:53:56 to 9:53:58).
Since we run the same unit test cases and usually acquire more than 4 containers, all these extra seconds add up to a huge performance issue. On 2.6 the unit tests run for 7 hours, while on 3.0 the same unit tests run for 11 hours. I was told this delay might be caused by Hadoop's new monitoring system, Timeline Service v2. Could someone take a look at this? Thanks for any help!
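For reference, the timings quoted above can be turned into per-phase ratios with a few lines of Python. This is only a sketch of the arithmetic: the timestamps and the 7 hour and 11 hour figures come from the description, while the helper name `seconds_between` is hypothetical.

```python
from datetime import datetime


def seconds_between(start: str, end: str) -> int:
    """Elapsed seconds between two same-day H:M:S timestamps."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


# Whole container life cycle, as quoted in the description.
life_26 = seconds_between("9:53:50", "9:53:58")
life_30 = seconds_between("9:51:44", "9:52:04")

# Container exit phase, as quoted in the description.
exit_26 = seconds_between("9:53:56", "9:53:58")
exit_30 = seconds_between("9:51:59", "9:52:04")

print(f"container lifetime: {life_26}s on 2.6, {life_30}s on 3.0 ({life_30 / life_26:.1f}x)")
print(f"container exit:     {exit_26}s on 2.6, {exit_30}s on 3.0 ({exit_30 / exit_26:.1f}x)")
print(f"test suite:         7h on 2.6, 11h on 3.0 ({11 / 7:.2f}x)")
```

Both the single container sampled here and its exit phase come out at 2.5x slower on 3.0, while the full suite is about 1.57x slower.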
Attachments
Issue Links
- is caused by: YARN-5366 Improve handling of the Docker container life cycle (Resolved)